Proofreading Guidelines
Check out the Proofreading Quiz and Tutorial!
Contents
- 1 The Primary Rule
- 2 Summary Guidelines
- 3 About This Document
- 4 Project Comments
- 5 Forum/Discuss This Project
- 6 Fixing Errors on Previous Pages
- 7 Proofreading at the Character Level:
- 7.1 Double Quotes
- 7.2 Single Quotes
- 7.3 Quote Marks on Each Line
- 7.4 End-of-sentence Periods
- 7.5 Punctuation Spacing
- 7.6 Extra Spaces or Tabs Between Words
- 7.7 Trailing Space at End-of-line
- 7.8 Dashes, Hyphens, and Minus Signs
- 7.9 End-of-line Hyphenation and Dashes
- 7.10 End-of-page Hyphenation and Dashes
- 7.11 Period Pause "..." (Ellipsis)
- 7.12 Contractions
- 7.13 Fractions
- 7.14 Accented, Diacritical, and Non-ASCII Characters
- 7.15 Characters from Non-Latin Scripts
- 7.16 Superscripts
- 7.17 Subscripts
- 7.18 Large, Ornate Opening Capital Letter (Drop Cap)
- 7.19 Words in Small Capitals
- 8 Proofreading at the Paragraph Level:
- 9 Proofreading at the Page Level:
- 10 Common Problems:
- 10.1 Formatting
- 10.2 Common OCR Problems
- 10.3 OCR Problems: Scannos
- 10.4 OCR Problems: Is that ° º really a degree sign?
- 10.5 Handwritten Notes in Book
- 10.6 Bad Image
- 10.7 Wrong Image for Text
- 10.8 Previous Proofreader Mistakes
- 10.9 Printer Errors/Misspellings
- 10.10 Factual Errors in Texts
- 10.11 Inserting Special Characters
- 11 Alphabetical Index to the Guidelines
The Primary Rule
"Don't change what the author wrote!"
The final electronic book seen by a reader, possibly many years in the future, should accurately convey the intent of the author. If the author spelled words oddly, we leave them spelled that way. If the author wrote outrageous racist or biased statements, we leave them that way. If the author put commas, superscripts, or footnotes every third word, we keep the commas, superscripts, or footnotes. We are proofreaders, not editors; if something in the text does not match the original page image, you should change the text so that it does match. (See Printer's Errors for proper handling of obvious misprints.)
We do change minor typographical conventions that don't affect the sense of what the author wrote. For example, we rejoin words that were broken at the end of a line (End-of-line Hyphenation). Changes such as these help us produce a consistently formed version of the book. The proofreading rules we follow are designed to achieve this result. Please carefully read the rest of the Proofreading Guidelines with this concept in mind. These guidelines are intended for proofreading only. As a proofreader you are matching the image's content while later the formatters will match the image's look.
To assist the next proofreader, the formatters, and the post-processor, we also preserve line breaks. This allows them to easily compare the lines in the text to the lines in the image.
Summary Guidelines
The Proofreading Summary is a short, 2-page printer-friendly (.pdf) document that summarizes the main points of these Guidelines and gives examples of how to proofread. Beginning proofreaders are encouraged to print out this document and keep it handy while proofreading.
You may need to download and install a .pdf reader. One is available free from Adobe®.
About This Document
This document is written to explain the proofreading rules we use to maintain consistency when proofreading a single book that is distributed among many proofreaders, each of whom is working on different pages. This helps us all do proofreading the same way, which in turn makes it easier for the formatters and for the post-processor who will complete the work on this e-book.
It is not intended as any kind of a general editorial or typesetting rulebook.
We've included in these proofreading guidelines all the items that new users have asked about while proofreading. There is a separate set of Formatting Guidelines. A second group of volunteers will be working on the formatting of the text. If you come across a situation and you do not find a reference in these guidelines, it is likely that it will be handled in the formatting rounds and so is not mentioned here. If you aren't sure, please ask about it in the Project Discussion.
If there are any items missing, or items that you think should be done differently, or if something is vague, please let us know. If you come across an unfamiliar term in these guidelines, see the wiki jargon guide. This document is a work in progress. Help us to improve it by posting your suggested changes in the Documentation Forum.
Project Comments
When you select a project for proofreading, the Project Page is loaded. On this page there is a section called "Project Comments" containing information specific to that project (book). Read these before you start proofreading pages! If the Project Manager wants you to do something in this book differently from the way specified in these Guidelines, that will be noted here. Instructions in the Project Comments override the rules in these Guidelines, so follow them. There may also be instructions in the project comments that apply to the formatting phase, which do not apply during proofreading. Finally, this is also where the Project Manager may give you interesting tidbits of information about the author or the project.
Please also read the Project Thread (discussion): The Project Manager may clarify project-specific guidelines here, and it is often used by proofreaders to alert other proofreaders to recurring issues within the project and how they can best be addressed. (See below.)
On the Project Page, the link 'Images, Pages Proofread, & Differences' allows you to see how other proofreaders have made changes. This forum thread discusses different ways to use this information.
Forum/Discuss This Project
On the Project Page where you start proofreading pages, on the line "Forum", there is a link titled "Discuss this Project" (if the discussion has already started), or "Start a discussion on this Project" (if it hasn't). Clicking on that link will take you to a thread in the projects forum dedicated to this specific project. That is the place to ask questions about this book, inform the Project Manager about problems, etc. Using this project forum thread is the recommended way to communicate with the Project Manager and other proofreaders who are working on this book.
Fixing Errors on Previous Pages
The Project Page contains links to pages from this project that you have recently proofread. (If you haven't proofread any pages yet, no links will be shown.)
Pages listed under either "DONE" or "IN PROGRESS" are available to make proofreading corrections or to finish proofreading. Just click on the link to the page. Thus, if you discover that you made a mistake on a page or marked something incorrectly, you can click on that page here and reopen it to fix the error.
You may also use the "Images, Pages Proofread, & Differences" or "Just My Pages" links on the Project Page. These pages will display an "Edit" link next to the pages you have worked on in the current round that can still be corrected.
For more detailed information, refer to either the Standard Proofreading Interface Help or the Enhanced Proofreading Interface Help, depending on which interface you are using.
Proofreading at the Character Level:
Distributed Proofreaders supports Unicode using UTF-8 encoding. This allows a larger number of characters to be offered for use in our projects. Each of our books (projects) uses one or more sets of Unicode characters known as "character suites". The characters in a project's character suites are included in the character pickers available in that project's proofreading interface.
Double Quotes
Proofread “double quotes” as straight " double quotes rather than as "curly" quotes. Do not change double quotes to single quotes. Leave them as the author wrote them. See Chapter Headings if a double quote is missing at the start of a chapter. Some languages use guillemets «like this» as quotation marks. These are available from the character picker in the proofreading interface.
For quotation marks other than ", «, and », use the same marks that appear in the image if they are available. Remember to remove space between the quotation marks and the quoted text; if needed, it will be added in post-processing. The same applies to languages which use reversed guillemets, »like this«.
If the Project Manager instructs you in the Project Comments to proofread quotation marks differently for a particular book, please follow those instructions but be sure not to apply those directions to other projects.
Single Quotes
Proofread these as the straight ' single quote (apostrophe) rather than as "curly" single quotes. Do not change single quotes to double quotes. Leave them as the author wrote them.
If the Project Manager instructs you in the Project Comments to proofread quotation marks differently for a particular book, please follow those instructions but be sure not to apply those directions to other projects.
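For volunteers who like to pre-check text with a script, the two quote rules above can be sketched in a few lines of Python. This is only an illustration, not a Distributed Proofreaders tool; the name `straighten_quotes` is made up for the example, and it assumes the project has no special quotation-mark instructions in its Project Comments.

```python
# Map typographic (curly) quotes to the straight ASCII forms.
# Guillemets and other quotation marks are deliberately left untouched.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones; every other character passes through."""
    return text.translate(CURLY_TO_STRAIGHT)


print(straighten_quotes("\u201cDon\u2019t go,\u201d she said."))  # "Don't go," she said.
```

Note that the sketch never turns a double quote into a single quote or the reverse, which matches the rule to leave them as the author wrote them.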
Quote Marks on Each Line
Proofread quotation marks at the beginning of each line of a quotation within a single paragraph by removing all of them except for the one at the start of the quotation. (This prevents the quotation marks from ending up in the middle of lines of text when the line-breaks within the paragraph change.) If a quotation like this goes on for multiple paragraphs, leave the quote mark that appears on the first line of each paragraph.
However, in poetry keep the extra quote marks where they appear in the image, since the line breaks will not be changed.
Often there is no closing quotation mark until the very end of the quoted section of text, which may not be on the same page you are proofreading. Leave it that way—do not add closing quotation marks that are not in the page image.
There are some language-specific exceptions. In some languages, for example, dialog within quotations uses a combination of different punctuation to indicate various speakers. If you are not familiar with a particular language, check the Project Comments or leave a message for the Project Manager in the Project Discussion for clarification.
Original Image: |
---|
Clearly he wasn't an academic with a preface like this |
Correctly Proofread Text: |
Clearly he wasn't an academic with a preface like this |
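As a rough sketch of the rule for quote marks repeated on each line, the Python below keeps the mark on the first line of a paragraph and drops it from the following lines. The function name is hypothetical, and the sketch assumes that every quote mark at the start of a continuation line is a repeated marker; a proofreader still has to check the image, and it should not be used on poetry.

```python
def strip_continuation_quotes(paragraph_lines: list[str], quote: str = '"') -> list[str]:
    """Keep the quote mark on the first line of a paragraph; drop it from later lines."""
    result = []
    for index, line in enumerate(paragraph_lines):
        # Only continuation lines (index > 0) lose their leading quote mark.
        if index > 0 and line.startswith(quote):
            line = line[len(quote):].lstrip()
        result.append(line)
    return result
```

Called once per paragraph, this also leaves the opening mark on the first line of each paragraph of a multi-paragraph quotation.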
End-of-sentence Periods
Proofread periods that end sentences with a single space after them.
You do not need to remove extra spaces after periods if they're already in the OCR'd text—we can do that automatically during post-processing.
Punctuation Spacing
Spaces before punctuation sometimes appear because books typeset in the 1700s and 1800s often used partial spaces before punctuation such as a semicolon or colon.
In general, a punctuation mark should have a space after it but no space before it. If the OCR'd text has no space after a punctuation mark, add one; if there is a space before punctuation, remove it. This applies even to languages such as French that normally use spaces before punctuation characters. However, punctuation marks that normally appear in pairs, such as "quotation marks", (parentheses), [brackets], and {braces} normally have a space before the opening mark, which should be retained.
Original Image: |
---|
and so it goes ; ever and ever. |
Correctly Proofread Text: |
and so it goes; ever and ever. |
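The "no space before punctuation" part of this rule could be checked with a small regular expression. The following Python is a minimal sketch under stated assumptions: it handles only the marks `; : ! ? ,`, it works on one line at a time, and it leaves ellipses and paired marks such as quotes and brackets for the proofreader. The function name is invented for the example.

```python
import re

# Assumption: only these marks are handled; periods are skipped so that
# a mid-sentence ellipsis keeps the space in front of it.
_SPACE_BEFORE = re.compile(r"[ \t]+([;:!?,])")


def close_up_punctuation(line: str) -> str:
    """Remove spaces or tabs that come directly before ; : ! ? or , on one line."""
    return _SPACE_BEFORE.sub(r"\1", line)


print(close_up_punctuation("and so it goes ; ever and ever."))  # and so it goes; ever and ever.
```

The sketch does not add a missing space after a mark, since deciding that safely (for example inside numbers or abbreviations) needs a look at the image.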
Extra Spaces or Tabs Between Words
Extra spaces between words are common in OCR output. You don't need to bother removing these—that can be done automatically during post-processing. However, extra spaces around punctuation, em-dashes, quote marks, etc. do need to be removed when they separate the symbol from the word.
For example, in "A horse ;  my kingdom for a horse." the space between the word "horse" and the semicolon should be removed. But the 2 spaces after the semicolon are fine; you don't have to delete one of them.
In addition, if you find any tab characters in the text you should remove them.
Trailing Space at End-of-line
Do not bother inserting spaces at the ends of lines of text; any such spaces will automatically be removed from the text when you save the page. When the text is post-processed, each end-of-line will be converted into a space.
Dashes, Hyphens, and Minus Signs
There are generally four such marks you will see in books:
- Hyphens. These are used to join words together, or sometimes to join prefixes or suffixes to a word.
  Leave these as a single hyphen, with no spaces on either side. Note that there is a common exception to this shown in the second example below.
- En-dashes. These are just a little longer, and are used for a range of numbers, or for a mathematical minus sign.
  Proofread these as a single hyphen, too. Spaces before or after are determined by the way it was done in the book; usually no spaces in number ranges, usually spaces around mathematical minus signs, sometimes both sides, sometimes just before.
- Em-dashes & long dashes. These serve as separators between words—sometimes for emphasis like this—or when a speaker gets a word caught in his throat——!
  Proofread these as two hyphens if the dash is as long as 2-3 letters (an em-dash) and four hyphens if the dash is as long as 4-5 letters (a long dash). Unless you are advised to do otherwise by the Project Comments, don't leave a space before or after, even if it looks like there was a space in the original book image. If you enter an em-dash or en-dash directly into the proofreading interface, the system will advise you when you attempt to save your page that it is an invalid character and must be corrected according to these guidelines before you may save your page as done.
- Deliberately Omitted or Censored Words or Names.
  If represented by a dash in the image, proofread these as two hyphens or four hyphens as described for em-dashes & long dashes. When it represents a word, we leave appropriate space around it like it's really a word. If it's only part of a word, then no spaces; join it with the rest of the word.
See also the guidelines for end-of-line and end-of-page hyphens and dashes.
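The character mapping described in the list above can be summarized in a short Python sketch. It is an illustration only: the function name is hypothetical, and it assumes that a long dash arrives from the OCR as two consecutive em-dash characters. It changes characters but does not touch spacing, because the spacing rules differ between ordinary em-dashes and dashes that stand in for an omitted word.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"
MINUS_SIGN = "\u2212"


def dashes_to_hyphens(text: str) -> str:
    """Convert dash characters to hyphen runs: long dash first, then em, then en and minus."""
    text = text.replace(EM_DASH * 2, "----")  # long dash (as long as 4-5 letters)
    text = text.replace(EM_DASH, "--")        # em-dash (as long as 2-3 letters)
    text = text.replace(EN_DASH, "-")         # en-dash in a number range
    text = text.replace(MINUS_SIGN, "-")      # mathematical minus sign
    return text
```

The long dash is replaced first so that a pair of em-dashes becomes four hyphens in one step rather than being handled as two separate em-dashes.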
Examples—Dashes, Hyphens, and Minus Signs:
End-of-line Hyphenation and Dashes
Where a hyphen appears at the end of a line, join the two halves of the hyphenated word back together. Remove the hyphen when you join it, unless it is really a hyphenated word like well-meaning. See Dashes, Hyphens, and Minus Signs for examples of each kind. Keep the joined word on the top line, and put a line break after it to preserve the line formatting—this makes it easier for volunteers in later rounds. If the word is followed by punctuation, then carry that punctuation onto the top line, too.
Words like to-day and to-morrow that we don't commonly hyphenate now were often hyphenated in the old books we are working on. Leave them hyphenated the way the author did. If you're not sure if the author hyphenated it or not, leave the hyphen, put an * after it, and join the word together like this: to-*day. The asterisk will bring it to the attention of the post-processor, who has access to all the pages and can determine how the author typically wrote this word.
Similarly, if an em-dash appears at the start or end of a line of your OCR'd text, join it with the other line so that there are no spaces or line breaks around it. However, if the author used an em-dash to start or end a paragraph or a line of poetry, you should leave it as it is, without joining it to the next line. See Dashes, Hyphens, and Minus Signs for examples.
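To make the mechanics of rejoining a word concrete, here is a minimal Python sketch for a single pair of lines. The function and its `keep_hyphen` and `unsure` parameters are invented for the example; deciding which one applies is exactly the judgment the proofreader makes from the image. Lines ending in a dash written as two hyphens are left alone.

```python
def rejoin_line_end(top: str, bottom: str, keep_hyphen: bool = False,
                    unsure: bool = False) -> tuple[str, str]:
    """Move the second half of a word broken at a line end up to the top line."""
    if not top.endswith("-") or top.endswith("--"):
        return top, bottom  # no hyphen break (or it is a dash); leave both lines alone
    # The first token of the bottom line is the word fragment plus any
    # punctuation attached to it, so the punctuation travels with the word.
    fragment, _, rest = bottom.lstrip().partition(" ")
    stem = top[:-1]
    if unsure:
        joined = stem + "-*" + fragment  # flag for the post-processor
    elif keep_hyphen:
        joined = stem + "-" + fragment   # a genuinely hyphenated word
    else:
        joined = stem + fragment         # an ordinary word split by the printer
    return joined, rest
```

The two returned strings are still two separate lines, so the line break is preserved as the guidelines require.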
End-of-page Hyphenation and Dashes
Proofread end-of-page hyphens or em-dashes by leaving the hyphen or em-dash at the end of the last line, and mark it with a * after the hyphen or dash. For example:
Original Image: |
---|
something Pat had already become accus- |
Correctly Proofread Text: |
something Pat had already become accus-* |
On pages that start with part of a word from the previous page or an em-dash, place a * before the partial word or em-dash. To continue the above example:
Original Image: |
---|
tomed to from having to do his own family |
Correctly Proofread Text: |
*tomed to from having to do his own family |
These markings indicate to the post-processor that the word must be rejoined when the pages are combined to produce the final e-book. Please do not join the fragments across the pages yourself.
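The page-boundary markings can be sketched in Python as follows. Both function names are hypothetical, and `continues_previous_page` is an assumed flag: a script looking at one page cannot know whether its first word is a fragment, so that decision is passed in by the person checking the image.

```python
def mark_page_end(page_text: str) -> str:
    """Append * when the last line ends with a hyphen or a dash written as hyphens."""
    stripped = page_text.rstrip()
    return stripped + "*" if stripped.endswith("-") else stripped


def mark_page_start(page_text: str, continues_previous_page: bool) -> str:
    """Prefix * when the page opens with a word fragment or with a dash."""
    stripped = page_text.lstrip()
    if stripped.startswith("*"):
        return stripped  # already marked
    if continues_previous_page or stripped.startswith("--"):
        return "*" + stripped
    return stripped
```

Neither function joins anything across pages; the fragments stay where they are for the post-processor.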
Period Pause "..." (Ellipsis)
The guidelines are different for English and Languages Other Than English.
Projects in English: An ellipsis should have three dots. Regarding the spacing, in the middle of a sentence treat the three dots as a single word (i.e., usually a space before the 3 dots and a space after). At the end of a sentence treat the ellipsis as ending punctuation, with no space before it.
Note that there will also be an ending punctuation mark at the end of a sentence, so in the case of a period there will be 4 dots total. Remove extra dots, if any, or add new ones, if necessary, to bring the number to three (or four) as appropriate. A good hint that you're at the end of a sentence is the use of a capital letter at the start of the next word, or the presence of an ending punctuation mark (e.g., a question mark or exclamation point).
Projects in Languages Other Than English: Use the general rule "Follow closely the style used in the printed page." In particular, insert spaces, if there are spaces before or between the periods, and use the same number of periods as appear in the image. Sometimes the printed page is unclear; in that case, insert a [**unclear] to draw the attention of the post-processor. (Note: Post-processors should replace those regular spaces with non-breaking spaces.)
English project examples:
Original Image: | Correctly Proofread Text: |
---|---|
That I know . . . is true. | That I know ... is true. |
This is the end.... | This is the end.... |
The moving finger writes; and. . . The poet surely had a pen though! | The moving finger writes; and.... The poet surely had a pen though! |
Wherefore art thou Romeo. . . ? | Wherefore art thou Romeo...? |
“I went to the store, . . .” said Harry. | "I went to the store, ..." said Harry. |
“... And I did too!” said Sally. | "... And I did too!" said Sally. |
“Really? . . . Oh, Harry!” | "Really?... Oh, Harry!" |
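One mechanical step in the English examples above, closing up the spaces between the dots, could be sketched like this in Python. The function name is made up, and the sketch covers that one step only: it does not decide whether the ellipsis ends a sentence, so removing the space before it, adding a fourth dot, or trimming extra dots remains a manual decision.

```python
import re

# A period followed by one or more " ." groups, e.g. ". . ." or ". . . ."
_SPACED_DOTS = re.compile(r"\.(?: \.)+")


def collapse_spaced_dots(line: str) -> str:
    """Turn '. . .' into '...' by removing only the spaces between the dots."""
    return _SPACED_DOTS.sub(lambda match: match.group(0).replace(" ", ""), line)


print(collapse_spaced_dots("That I know . . . is true."))  # That I know ... is true.
```

For a line such as the Romeo example, the result still has a space before the question mark, which the proofreader then removes by hand.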
Contractions
In English, remove any extra space in contractions. For example, would n't should be proofread as wouldn't and 't is as 'tis.
This was a 19th century printers' convention in which the space was retained to indicate that 'would' and 'not' were originally separate words. It is also sometimes an artifact of the OCR. Remove the extra space in either case.
Some Project Managers may specify in the Project Comments not to remove extra spaces in contractions, particularly in the case of books that contain slang, dialect, or poetry.
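For projects that follow the default rule, the two contractions named above could be closed up with a sketch like the following. It is a hedged illustration in Python: the function name is invented, it assumes straight apostrophes, and it covers only the `n't` and `'t is` patterns mentioned in this section.

```python
import re

_NT = re.compile(r"(\w) n't\b")
# The lookbehind keeps "isn't is" from being read as "'t is".
_T_IS = re.compile(r"(?<!\w)'([tT]) is\b")


def close_contractions(text: str) -> str:
    """Remove the extra space in 'would n't' and "'t is" style contractions."""
    text = _NT.sub(r"\1n't", text)
    return _T_IS.sub(r"'\1is", text)
```

It should be skipped for any project whose Project Comments ask for the spaces to be kept.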
Fractions
Unless otherwise directed in the Project Comments, do not use the actual fraction symbols; please proofread fractions as follows: ¼ becomes 1/4, and 2½ becomes 2-1/2. The hyphen prevents the whole and fractional part from becoming separated when the lines are rewrapped during post-processing.
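The fraction rule is easy to express in code. The Python below is a minimal sketch that covers only the three common fraction symbols; the names are hypothetical, and other fraction characters would need to be added to the table by hand.

```python
import re

# Assumption: only these three fraction symbols are handled.
FRACTIONS = {"\u00bc": "1/4", "\u00bd": "1/2", "\u00be": "3/4"}
_FRACTION = re.compile(r"(\d+)?([" + "".join(FRACTIONS) + "])")


def expand_fractions(text: str) -> str:
    """Write fraction symbols as digits, joining a whole number with a hyphen."""
    def replace(match: re.Match) -> str:
        whole, symbol = match.group(1), match.group(2)
        plain = FRACTIONS[symbol]
        return f"{whole}-{plain}" if whole else plain
    return _FRACTION.sub(replace, text)


print(expand_fractions("2\u00bd miles"))  # 2-1/2 miles
```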
Accented, Diacritical, and Non-ASCII Characters
If the characters are not in that project's character suite(s) as shown by the character picker in the proofreading interface, they are not valid for the project. If they are valid, you may enter them via keyboard if that's an option for you, the character picker in the proofreading interface, or using the method outlined in Inserting Special Characters.
If there are characters that are in the page scans for which there are no Character Picker choices, and the Project Manager has not given specific instructions in the Project Comments, please ask in the Project Discussion or contact the Project Manager by private message.
Characters with Diacritical Marks
In some projects, you will find characters with special marks either above or below the basic Latin A...Z letter. These are called diacritical marks, and indicate a special pronunciation for this character. If these characters are not in that project's character suite(s) as displayed in the character picker in the proofreading interface, please indicate them in the text by using a specific coding, such as ă which is represented as: [)a] for a breve (the u-shaped accent) above an a, or [a)] for a breve below. Be sure to include the square brackets ([ ]). In the rare case in which a diacritic is over two letters, include both letters in the brackets.
When you type the bracket notation for a diacritical character in the proofreading interface, the system will convert your bracket notation into the actual character, provided that diacritical character is available in the character suite for that project. For example, if you type [:o] in the proofreading interface, the entire [:o] will turn into ö when you type the closing square bracket ].
In the final processing of the project, the post-processor will replace any remaining "square bracketed" diacritical characters that remain in the text with whatever symbol works best for the final version of the text submitted to Project Gutenberg.
In the table below, the "x" represents a letter with a diacritical mark. When proofreading, use the actual character from the text, not the x shown in the examples.
Proofreading Symbols for Diacritical Marks
diacritical mark | sample | above | below |
---|---|---|---|
macron (straight line) | ¯ | [=x] | [x=] |
2 dots (dieresis, umlaut) | ¨ | [:x] | [x:] |
1 dot | · | [.x] | [x.] |
grave accent | ` | [`x] | [x`] |
acute accent (aigu) | ´ | ['x] | [x'] |
circumflex | ˆ | [^x] | [x^] |
caron (v-shaped symbol) | ∨ | [vx] | [xv] |
breve (u-shaped symbol) | ∪ | [)x] | [x)] |
inverted breve (inverted u-shaped symbol) | ̑ | [(x] | [x(] |
tilde | ˜ | [~x] | [x~] |
cedilla | ¸ | | [x,] |
ring | ̊ | [*x] | [x*] |
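The pattern in the table (the code goes before the letter for a mark above it, and after the letter for a mark below it) can be captured in a small lookup. This Python sketch is illustrative only; the dictionary keys and the function name are made up for the example, and the codes themselves are copied from the table.

```python
# Bracket codes from the table; the key names are hypothetical labels.
MARK_CODES = {
    "macron": "=",
    "dieresis": ":",
    "dot": ".",
    "grave": "`",
    "acute": "'",
    "circumflex": "^",
    "caron": "v",
    "breve": ")",
    "inverted breve": "(",
    "tilde": "~",
    "ring": "*",
}


def bracket_notation(letters: str, mark: str, below: bool = False) -> str:
    """Build the bracket code for one letter (or two letters sharing a mark)."""
    if mark == "cedilla":
        return f"[{letters},]"  # the table lists a single form for the cedilla
    code = MARK_CODES[mark]
    return f"[{letters}{code}]" if below else f"[{code}{letters}]"
```

For example, `bracket_notation("a", "breve")` gives `[)a]` and `bracket_notation("a", "breve", below=True)` gives `[a)]`, matching the breve example earlier in this section.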
Characters from Non-Latin Scripts
Some projects contain characters from non-Latin scripts such as Greek, Cyrillic, Hebrew, or Arabic. If these are not included in the character picker for that project and the Project Manager has not provided special instructions in the Project Comments, please ask in the Project Discussion or contact the Project Manager by Private Message.
For Greek characters, Project Managers may ask you to transliterate that text. Transliteration involves converting each character of the foreign text into the equivalent Latin letter(s). For proofreaders unfamiliar with the Greek alphabet, a Greek transliteration tool is provided in the proofreading interface.
Press the "Greek Transliterator" button near the bottom of the proofreading interface to open the tool. In the tool, click on the Greek characters that match the word or phrase you are transliterating, and the appropriate basic Latin characters will appear in the text box. When you are done, simply cut and paste this transliterated text into the page you are proofreading. Surround the transliterated text with the Greek markers [Greek: and ]. (To generate a Greek marker, it's easiest to click on the "[Greek:]" button in your proofreading interface.) For example, Βιβλος would become [Greek: Biblos]. ("Book"—so appropriate for DP!)
If you are uncertain about your transliteration, mark it with ** to bring it to the attention of the next proofreader or the post-processor.
For other alphabets that cannot be so easily transliterated, such as Cyrillic, Hebrew, or Arabic, the Project Manager may ask you to replace the non-Latin characters with the appropriate mark: [Cyrillic: **], [Hebrew: **], or [Arabic: **]. Include the ** so the post-processor can address it later.
- Greek: See the Transliterating Greek wiki page, from Project Gutenberg, or the "Greek Transliterator" pop-up tool in the proofreading interface.
- Cyrillic: While a standard transliteration scheme exists for Cyrillic, we only recommend you attempt a transliteration if you are fluent in a language that uses it. Otherwise, just mark it as indicated above.
- Hebrew and Arabic: Not recommended unless you are fluent. There are significant difficulties transliterating these languages and neither Distributed Proofreaders nor Project Gutenberg has yet chosen a standard method.
Superscripts
Older books often abbreviated words as contractions, and printed them as superscripts. Proofread these by inserting a single caret (^) followed by the superscripted text. If the superscript continues for more than one character, then surround the text with curly braces { and } as well. For example:
Original Image: |
---|
Genʳˡ Washington defeated Lᵈ Cornwallis's army. |
Correctly Proofread Text: |
Gen^{rl} Washington defeated L^d Cornwallis's army. |
If the superscript represents a footnote marker, then see the Footnotes section instead.
The Project Manager may specify in the Project Comments that superscripted text be marked differently.
Subscripts
Subscripted text is often found in scientific works, but is not common in other material. Proofread subscripted text by inserting an underline character _ and surrounding the text with curly braces { and }. For example:
Original Image: |
---|
H₂O. |
Correctly Proofread Text: |
H_{2}O. |
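The superscript and subscript markings differ in one detail: braces are optional for a single superscripted character but are always used for subscripts. A minimal Python sketch of both, with hypothetical helper names, assuming the project has no special instructions:

```python
def superscript(text: str) -> str:
    """Caret notation: braces only when more than one character is raised."""
    return f"^{text}" if len(text) == 1 else f"^{{{text}}}"


def subscript(text: str) -> str:
    """Underscore notation: the lowered text is always wrapped in braces."""
    return f"_{{{text}}}"


print("Gen" + superscript("rl"), "L" + superscript("d"), "H" + subscript("2") + "O")
# Gen^{rl} L^d H_{2}O
```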
Large, Ornate Opening Capital Letter (Drop Cap)
Proofread a large and ornate graphic first letter of a chapter, section, or paragraph as if it were an ordinary letter. See also the Chapter Headings section of the Proofreading Guidelines.
Words in Small Capitals
Please proofread only the characters in Small Caps (capital letters which are smaller than the standard capitals). Do not worry about case changes. If the OCR'd text is already ALL-CAPPED, Mixed-Cased, or lower-cased, leave it ALL-CAPPED, Mixed-Cased, or lower-cased. Small caps may occasionally appear with <sc> and </sc> around them; see Formatting in that case.
Proofreading at the Paragraph Level:
Line Breaks
Leave all line breaks in so that later in the process other volunteers can easily compare the lines in the text to the lines in the image. Be especially careful about this when rejoining hyphenated words or moving words around em-dashes. If the previous proofreader removed the line breaks, please replace them so that they once again match the image.
Chapter Headings
Proofread chapter headings as they appear in the image.
A chapter heading may start a bit farther down the page than the page header and won't have a page number on the same line. Chapter Headings are often printed all caps; if so, keep them as all caps.
Watch out for a missing double quote at the start of the first paragraph, which some publishers did not include or which the OCR missed due to a large capital in the image. If the author started the paragraph with dialog, insert the double quote.
Paragraph Spacing/Indenting
Put a blank line before the start of a paragraph, even if it starts at the top of a page. You should not indent the start of the paragraph, but if it is already indented don't bother removing those spaces—that can be done automatically during post-processing.
See the Sidenotes image/text for an example.
Page Headers/Page Footers
Remove page headers and page footers, but not footnotes, from the text.
The page headers are normally at the top of the image and have a page number opposite them. Page headers may be the same all through the book (often the title of the book and the author's name), they may be the same for each chapter (often the chapter number), or they may be different on each page (describing the action on that page). Remove them all, regardless, including the page number. Extra blank lines should be removed except where we intentionally add them for proofreading. But blank lines at the bottom of the page are fine—these are removed when you save the page.
Page footers are at the bottom of the image and may contain a page number or other extraneous marks that are not part of what the author wrote.
A chapter heading will usually start further down the page and won't have a page number on the same line. See the example below.
Illustrations
Ignore illustrations, but proofread any caption text as it is printed, preserving the line breaks. If the caption falls in the middle of a paragraph, use blank lines to set it apart from the rest of the text. Text that could be (part of) a caption should be included, such as "See page 66" or a title within the bounds of the illustration.
Most pages with an illustration but no text will already be marked with [Blank Page]. Leave this marking as is.
Original Image: |
---|
![]() |
Correctly Proofread Text: |
Martha told him that he had always been her ideal and |
Footnotes/Endnotes
Proofread footnotes by leaving the text of the footnote at the bottom of the page and placing a tag where it is referenced in the text.
In the main text, the character that marks a footnote location should be surrounded with square brackets ([ and ]) and placed right next to the word being footnoted[1] or its punctuation mark,[2] as shown in the image and the two examples in this sentence. Footnote markers may be numbers, letters, or symbols. When footnotes are marked with a symbol or a series of symbols (*, †, ‡, §, etc.) we replace them all with [*] in the text, and * next to the footnote itself.
At the bottom of the page, proofread the footnote text as it is printed, preserving the line breaks. Be sure to use the same tag before the footnote as you used in the text where the footnote was referenced. Use just the character itself for the tag, without any brackets or other punctuation.
Place each footnote on a separate line in order of appearance, with a blank line before each one.
Do not include any horizontal lines separating the footnotes from the main text.
Endnotes are just footnotes that have been located together at the end of a chapter or at the end of the book, instead of on the bottom of each page. These are proofread in the same manner as footnotes. Where you find an endnote reference in the text, just surround it with [ and ]. If you are proofreading one of the pages with endnotes, put a blank line before each endnote so that it is clear where each begins and ends.
Footnotes in Tables should remain where they are in the original image.
Original Image: |
---|
The principal persons involved in this argument were Caesar*, former military
* Gaius Julius Caesar. |
Correctly Proofread Text: |
The principal persons involved in this argument were Caesar[*], former military

* Gaius Julius Caesar. |
Original Footnoted Poetry: |
---|
Mary had a little lamb1
1 This lamb was obviously of the Hampshire breed,
well known for the pure whiteness of their wool. |
Correctly Proofread Text: |
Mary had a little lamb[1]

1 This lamb was obviously of the Hampshire breed,
well known for the pure whiteness of their wool. |
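The tagging rule for footnote markers can be put into a few lines of Python. This is a sketch with a made-up function name, assuming the marker has already been identified by the proofreader: numbers and letters are kept, and any symbol marker is replaced by an asterisk.

```python
def footnote_tags(marker: str) -> tuple[str, str]:
    """Return (tag placed in the main text, tag placed before the footnote text)."""
    # Numbers and letters are kept as they are; symbols such as a dagger become *.
    tag = marker if marker.isalnum() else "*"
    return f"[{tag}]", tag
```

So `footnote_tags("1")` gives `("[1]", "1")`, while a dagger or section sign gives `("[*]", "*")`, with the bracketed form used in the text and the bare form used next to the footnote itself.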
Paragraph Side-Descriptions (Sidenotes)
Some books will have short descriptions of the paragraph along the side of the text. These are called sidenotes. Proofread the sidenote text as it is printed, preserving the line breaks (while handling end-of-line hyphenation and dashes normally). Leave a blank line before and after the sidenote so that it can be distinguished from the text around it. The OCR may place the sidenotes anywhere on the page, and may even intermingle the sidenote text with the rest of the text. Separate them so that the sidenote text is all together, but don't worry about the position of the sidenotes on the page.
Multiple Columns
Proofread ordinary text that has been printed in multiple columns as a single column. Place the text from the left-most column first, the text from the next column below that, and so on. Do not mark where the columns were split, just join them together. See the very bottom of the Sidenotes example for an example of multiple columns.
See also the Index and Table sections of the Proofreading Guidelines.
Tables
A proofreader's job is to be sure that all the information in a table is correctly proofread. Separate items with spaces as needed, but do not worry about precise alignment. Retain line breaks (while handling end-of-line hyphenation and dashes normally). Ignore any periods or other punctuation (leaders) used to align the items.
Footnotes in tables should remain where they are in the image. See footnotes for details.
Poetry/Epigrams
Insert a blank line at the start of the poetry or epigram and another blank line at the end, so that the formatters can clearly see the beginning and end. Leave each line left justified and maintain the line breaks. Insert a blank line between stanzas, when there is one in the image.
Line Numbers in poetry should be kept.
Check the Project Comments for the specific project you are proofreading.
Line Numbers
Line numbers are common in books of poetry, and usually appear near the margin every fifth or tenth line. Keep line numbers, using a few spaces to separate them from the other text on the line so that the formatters can easily find them. Since poetry will not be rewrapped in the e-book version, the line numbers will be useful to readers.
Single Word at Bottom of Page
Proofread this by deleting the word, even if it's the second half of a hyphenated word.
In some older books, the single word at the bottom of the page (called a "catchword", usually printed near the right margin) indicates the first word on the next page of the book (called an "incipit"). It was used to alert the printer to print the correct reverse (called "verso"), to make it easier for printers' helpers to make up the pages prior to binding, and to help the reader avoid turning over more than one page.
Proofreading at the Page Level:
Blank Page
Most blank pages, or pages with an illustration but no text, will already be marked with [Blank Page]. Leave this marking as is. If the page is blank, and [Blank Page] does not appear, there is no need to add it.
If there is text in the proofreading text area and a blank image, or if there is text in the image but none in the text box, follow the directions for a Bad Image or Bad Text.
Front/Back Title Page
Proofread all the text just as it was printed on the page, whether all capitals, upper and lower case, etc., including the years of publication or copyright.
Older books often show the first letter as a large ornate graphic—proofread this as just the letter.
Table of Contents
Proofread the Table of Contents just as it is printed in the book, whether all capitals, upper and lower case, etc. If there are Small Capitals, see the guidelines for Small Capitals.
Ignore any periods or other punctuation (leaders) used to align the page numbers. These will be removed later in the process.
Indexes
You don't need to align the page numbers in index pages as they appear in the image; just make sure that the numbers and punctuation match the image and retain the line breaks.
Specific formatting of indexes will occur later in the process. The proofreader's job is to make sure that all the text and numbers are correct.
See also Multiple Columns.
Plays: Actor Names/Stage Directions
In dialog, treat a change in speaker as a new paragraph, with one blank line before it. If the speaker's name is on its own line, treat that as a separate paragraph as well.
Stage directions are kept as they are in the original image, so if the stage direction is on a line by itself, proofread it that way; if it is at the end of a line of dialog, leave it there. Stage directions often begin with an opening bracket and omit the closing bracket. This convention is retained; do not close the brackets.
Sometimes, especially in metrical plays, a word is split due to page-size constraints and placed above or below following a (, rather than having a line of its own. Please rejoin the word as per normal end-of-line hyphenation. See the example.
Please check the Project Comments, as the Project Manager may specify different handling.
Anything else that needs special handling or that you're unsure of
While proofreading, if you encounter something that isn't covered in these guidelines that you think needs special handling or that you are not sure how to handle, post your question, noting the png (page) number, in the Project Discussion.
You should also put a note in the proofread text to explain to the next proofreader, formatter, or post-processor what the problem or question is. Start your note with a square bracket and two asterisks [** and end it with another square bracket ]. This clearly separates it from the author's text and signals the post-processor to stop and carefully examine this part of the text and the matching image to address any issues. You may also want to identify which round you are working in just before the ] so that later volunteers know who left the note. Any comments put in by a previous volunteer must be left in place. See the next section for details.
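To illustrate why the [** ... ] convention is easy for later volunteers to locate, here is a minimal Python sketch that finds such notes with a regular expression. The function name `find_proofreader_notes` and the sample text are hypothetical and not part of any Distributed Proofreaders tool. The pattern assumes a note contains no nested square brackets.

```python
import re

# Matches "[**" followed by any characters up to the first closing "]".
NOTE_PATTERN = re.compile(r"\[\*\*[^\]]*\]")

def find_proofreader_notes(text: str) -> list[str]:
    """Return every [**...] note found in the proofread text, in order."""
    return NOTE_PATTERN.findall(text)

# Hypothetical sample text with two notes left by earlier volunteers.
sample = "a printer's erorr[**typo for error?] on this page[**unclear in image P2]"
print(find_proofreader_notes(sample))
```

Because the two asterisks rarely occur in ordinary book text, a search like this is unlikely to match the author's own bracketed material.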
Previous Proofreaders' Notes/Comments
Any notes or comments put in by a previous volunteer must be left in place. You may add agreement or disagreement to the existing note but even if you know the answer, you absolutely must not remove the comment. If you have found a source which clarifies the problem, please cite it so the post-processor can also refer to it.
If you come across a note from a previous volunteer that you know the answer to, please take a moment and provide feedback to them by clicking on their name in the proofreading interface and posting a private message to them explaining how to handle the situation in the future. Please, as already stated, do not remove the note.
Common Problems:
Formatting
You may sometimes find formatting already present in the text.
Do not add or correct this formatting information; the formatters will do that later in the process.
However, you can remove it if it interferes with your proofreading. The <x> button in the proofreading interface will remove markup such as <i> and <b> from highlighted text.
Some examples of formatting tasks include:
- <i>italics</i>, <b>bold</b>, <sc>Small Caps</sc>
- Spaced-out text
- Font size changes
- Spacing of chapter and section headings
- Extra spaces, stars, or lines between paragraphs
- Footnotes that continue for more than one page
- Footnotes marked with symbols
- Illustrations
- Sidenote locations
- Arrangement of data in tables
- Indentation (in poetry or elsewhere)
- Rejoining long lines in poetry and indexes
If the previous proofreader inserted formatting, please take a moment and provide feedback to them by clicking on their name in the proofreading interface and posting a private message to them explaining how to handle the situation in the future. Remember to leave the formatting to the Formatting rounds.
Common OCR Problems
OCR commonly has trouble distinguishing between similar characters. Some examples are:
- The digit '1' (one), the lowercase letter 'l' (ell), and the uppercase letter 'I'. Note that in some fonts the number one may look like I (like a small capital letter 'i').
- The digit '0' (zero), and the uppercase letter 'O'.
- Dashes & hyphens: Proofread these carefully—OCR'd text often has only one hyphen for an em-dash that should have two. See the guidelines for hyphenated words and em-dashes for more detailed information.
- Parentheses ( ) and curly braces { }.
Watch out for these. Normally the context of the sentence is sufficient to determine which is the correct character, but be careful—often your mind will automatically 'correct' these as you are reading.
Noticing these is much easier if you use a mono-spaced font such as DP Sans Mono or Courier.
OCR Problems: Scannos
Another common OCR issue is misrecognition of characters. We call these errors "scannos" (like "typos"). This misrecognition can create a word that:
- appears to be correct at first glance, but is actually misspelled.
This can usually be caught by running WordCheck from the proofreading interface.
- is changed to a different but otherwise valid word that does not match what is in the page image.
This is subtle because it can only be caught by someone actually reading the text.
Possibly the most common example of the second type is "and" being OCR'd as "arid." Other examples: "eve" for "eye", "Torn" for "Tom", "train" for "tram". This type is harder to spot and we have a special term for them: "Stealth Scannos." We collect examples of Stealth Scannos in this thread.
Spotting scannos is much easier if you use a mono-spaced font such as DP Sans Mono or Courier. To aid proofreading, the use of WordCheck (or its equivalent) is recommended in P1, and required in the other proofreading rounds.
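The idea of a Stealth Scanno can be sketched in Python as a simple word list check. This is only an illustration, not how WordCheck works: the dictionary holds just the four examples quoted above, and the function name `flag_stealth_scannos` is hypothetical. It flags candidates for a human to compare against the page image and never changes the text itself.

```python
import re

# Examples from the text above: OCR'd word -> word it may have replaced.
STEALTH_SCANNOS = {"arid": "and", "eve": "eye", "Torn": "Tom", "train": "tram"}

def flag_stealth_scannos(text: str) -> list[tuple[str, str]]:
    """Return (word, possible intended word) pairs to check against the image."""
    flagged = []
    for word in re.findall(r"[A-Za-z]+", text):
        # The lookup is case-sensitive, so "Torn" is flagged but "torn" is not.
        if word in STEALTH_SCANNOS:
            flagged.append((word, STEALTH_SCANNOS[word]))
    return flagged

print(flag_stealth_scannos("Torn rubbed his eve arid sat down."))
```

Every flagged word is also a valid English word, so only the page image can settle whether the OCR text is actually wrong.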
OCR Problems: Is that ° º really a degree sign?
There are three different symbols that can look very similar in the image and that the OCR software interprets the same (and usually incorrectly):
- The degree sign °: This should be used only to indicate degrees (of temperature, of angle, etc.).
- The superscript o: Virtually all other occurrences of a raised o should be proofread as ^o, following the guidelines for Superscripts.
- The masculine ordinal º: Proofread this like a superscript too unless the special character is requested in the Project Comments. It may be used in languages such as Spanish and Portuguese, and is the equivalent of the -th in English 4th, 5th, etc. It follows numbers and has the feminine equivalent in the superscript a (ª).
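Although these symbols look alike on screen, they are distinct Unicode characters. The short Python sketch below prints the code point and official name of each, using the standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata

# Degree sign, masculine ordinal indicator, feminine ordinal indicator.
LOOKALIKES = ["\u00b0", "\u00ba", "\u00aa"]

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in LOOKALIKES:
    print(ch, describe(ch))
```

A check like this can confirm which character the OCR actually produced. Whether that character is the right one still depends on the page image and the Project Comments.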
Handwritten Notes in Book
Do not include handwritten notes in a book (unless it is overwriting faded, printed text to make it more visible). Do not include handwritten marginal notes made by readers, etc.
Bad Image
If an image is bad (not loading, mostly illegible, etc.), please post about this bad image in the project discussion and click on the "Report Bad Page" button so this page is 'quarantined', rather than returning the page to the round. If only a small portion of the image is bad, leave a note as described above, and please post in the project discussion without marking the whole page bad. The "Bad Page" button is only available during the first round of proofreading, so it is important that these issues be resolved early.
Note that some page images are quite large, and it is common for your browser to have difficulty displaying them, especially if you have several windows open or are using an older computer. Before reporting this as a bad page, try zooming in on the image, closing some of your windows and programs, or posting in the project discussion to see if anyone else has the same problem.
Wrong Image for Text
If there is a wrong image for the text given, please post about this bad page in the project discussion and click on the "Report Bad Page" button so this page is 'quarantined', rather than returning the page to the round. The "Bad Page" button is only available during the first round of proofreading, so it is important that these issues be resolved early.
It's fairly common for the OCR'd text to be mostly correct, but missing the first line or two of the text. Please just type in the missing line(s). If nearly all of the lines are missing in the text box, then either type in the whole page (if you are willing to do that), or just click on the "Return Page to Round" button and the page will be reissued to someone else. If there are several pages like this, you might post a note in the project discussion to notify the Project Manager.
Previous Proofreader Mistakes
If a previous proofreader made a lot of mistakes or missed a lot of things, please take a moment to provide feedback to them by clicking on their name in the proofreading interface and posting a private message to them explaining how to handle the situation so that they will know how in the future.
Please be nice! Everyone here is a volunteer and presumably trying their best. The point of your feedback message should be to inform them of the correct way to proofread, rather than to criticize them. Give a specific example from their work showing what they did, and what they should have done.
If the previous proofreader did an outstanding job, you can also send them a message about that—especially if they were working on a particularly difficult page.
Printer Errors/Misspellings
Correct all of the words that the OCR has misread (scannos), but do not correct what may appear to you to be misspellings or printer errors that occur on the page image. Many of the older texts have words spelled differently from modern usage and we retain these older spellings, including any accented characters.
Place a note in the text next to a printer's erorr[**typo for error?]. If you are unsure whether it is actually an error, please also ask in the project discussion. If you do make a change, include a note describing what you changed: [**typo "erorr" fixed]. Include the two asterisks ** so the post-processor will notice it.
Factual Errors in Texts
Do not correct factual errors in the author's book. Many of the books we are proofreading have statements of fact in them that we no longer accept as accurate. Leave them as the author wrote them. See Printer Errors/Misspellings for how to leave a note if you think the printed text is not what the author intended.
Inserting Special Characters
If they are not on your keyboard, there are several ways to input special characters:
- The character picker in the proofreading interface.
- Bracket notation for characters with diacritical marks described in the Characters with Diacritical Marks section of this document.
- Applets included with your operating system. If you use one of these, be sure to insert only characters that are in the character suite(s) enabled for the project and selectable from the character picker in the proofreading interface; otherwise the proofreading interface will reject them when you try to save as done. The charts below give some common shortcuts.
- Windows: "Character Map"
Access it through:
Start: Run: charmap, or
Start: Accessories: System Tools: Character Map.
- Macintosh: Key Caps or "Keyboard Viewer"
For OS 9 and lower, this is on the Apple Menu.
For OS X through 10.2, this is located in the Applications, Utilities folder.
For OS X 10.3 and higher, this is in the Input Menu as "Keyboard Viewer."
- Linux: The name and location of the character picker will vary depending on your desktop environment.
- Keyboard shortcuts.
(See the tables for Windows and Macintosh below.)
- An online program.
- Switching to a keyboard layout or locale which supports "deadkey" accents.
- Windows: Control Panel (Keyboard, Input Locales)
- Macintosh: Input Menu (on Menu Bar)
- Linux: Change the keyboard in your X configuration.
- For information on how to enter Greek characters using your keyboard, please read the Typing Greek wiki page.
For Windows:
- You can use the Character Map program (Start: Run: charmap) to select an individual letter, and then cut & paste.
- The character picker in the proofreading interface.
- Or you can type the Alt+NumberPad shortcut codes listed below for these characters.
This is faster than using cut & paste, once you get used to the codes.
Hold the Alt key and type the four digits on the Number Pad—the number row over the letters won't work.
You must type all 4 digits, including the leading 0 (zero). Note that the capital version of a letter is 32 less than the lower case.
These instructions are for the US-English keyboard layout. They may not work for other keyboard layouts.
Windows Shortcuts for Basic Latin symbols | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
` grave | ´ acute (aigu) | ^ circumflex | ~ tilde | ¨ umlaut | ° ring | Ligatures | |||||||
à | Alt-0224 | á | Alt-0225 | â | Alt-0226 | ã | Alt-0227 | ä | Alt-0228 | å | Alt-0229 | æ | Alt-0230 |
À | Alt-0192 | Á | Alt-0193 | Â | Alt-0194 | Ã | Alt-0195 | Ä | Alt-0196 | Å | Alt-0197 | Æ | Alt-0198 |
è | Alt-0232 | é | Alt-0233 | ê | Alt-0234 | ë | Alt-0235 | œ | Alt-0156 | ||||
È | Alt-0200 | É | Alt-0201 | Ê | Alt-0202 | Ë | Alt-0203 | Œ | Alt-0140 | ||||
ì | Alt-0236 | í | Alt-0237 | î | Alt-0238 | ï | Alt-0239 | ||||||
Ì | Alt-0204 | Í | Alt-0205 | Î | Alt-0206 | Ï | Alt-0207 | / slash | |||||
ò | Alt-0242 | ó | Alt-0243 | ô | Alt-0244 | õ | Alt-0245 | ö | Alt-0246 | ø | Alt-0248 | ||
Ò | Alt-0210 | Ó | Alt-0211 | Ô | Alt-0212 | Õ | Alt-0213 | Ö | Alt-0214 | Ø | Alt-0216 | ||
ù | Alt-0249 | ú | Alt-0250 | û | Alt-0251 | ü | Alt-0252 | ||||||
Ù | Alt-0217 | Ú | Alt-0218 | Û | Alt-0219 | Ü | Alt-0220 | currency | mathematics | ||||
ý | Alt-0253 | ñ | Alt-0241 | ÿ | Alt-0255 | ¢ | Alt-0162 | ± | Alt-0177 | ||||
Ý | Alt-0221 | Ñ | Alt-0209 | £ | Alt-0163 | × | Alt-0215 | ||||||
çedilla | Icelandic | marks | accents | punctuation | ¥ | Alt-0165 | ÷ | Alt-0247 | |||||
ç | Alt-0231 | Þ | Alt-0222 | © | Alt-0169 | ´ | Alt-0180 | ¿ | Alt-0191 | ¤ | Alt-0164 | ¬ | Alt-0172 |
Ç | Alt-0199 | þ | Alt-0254 | ® | Alt-0174 | ¨ | Alt-0168 | ¡ | Alt-0161 | ° | Alt-0176 | ||
superscripts | Ð | Alt-0208 | ¶ | Alt-0182 | ¯ | Alt-0175 | « | Alt-0171 | µ | Alt-0181 | |||
¹ | Alt-0185 * | ð | Alt-0240 | § | Alt-0167 | ¸ | Alt-0184 | » | Alt-0187 | ordinals | ¼ | Alt-0188 † | |
² | Alt-0178 * | sz ligature | ¦ | Alt-0166 | · | Alt-0183 | º | Alt-0186 * | ½ | Alt-0189 † | |||
³ | Alt-0179 * | ß | Alt-0223 | ª | Alt-0170 * | ¾ | Alt-0190 † |
* Unless specifically requested by the Project Comments, please do not use the ordinal or superscript symbols, but instead use the guidelines for Superscripts. (x^2, f^o, etc.)
† Unless specifically requested by the Project Comments, please do not use the fraction symbols, but instead use the guidelines for Fractions. (1/2, 1/4, 3/4, etc.)
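The Alt codes in the table above appear to correspond to byte values in the Windows-1252 character set. That mapping is an assumption based on the US-English layout mentioned earlier, not something this document states. Under that assumption, the following Python sketch converts between a code and its character; the function names `alt_code_to_char` and `char_to_alt_code` are hypothetical.

```python
def alt_code_to_char(code: int) -> str:
    """Decode the number typed after Alt as a single Windows-1252 byte (assumed mapping)."""
    return bytes([code]).decode("cp1252")

def char_to_alt_code(ch: str) -> str:
    """Format a character's Windows-1252 byte value as a four-digit Alt code."""
    return f"Alt-{ch.encode('cp1252')[0]:04d}"

# Lowercase and capital a-grave differ by 32, as noted in the instructions.
print(alt_code_to_char(224), alt_code_to_char(192))
print(char_to_alt_code("é"), char_to_alt_code("É"))
# The ligature pair follows a different offset (156 and 140 in the table).
print(char_to_alt_code("œ"), char_to_alt_code("Œ"))
```

The offset of 32 between lowercase and capital holds for pairs such as à and À, but not for the œ and Œ ligatures, whose codes in the table are 0156 and 0140.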
For Apple Macintosh:
The following instructions and chart are for the US-English keyboard layout. The ABC - Extended keyboard on more recent versions of Mac OS has some additional combinations, but some of the basic combinations listed in the chart are different.
- You can use the "Key Caps" program as a reference.
In OS 9 & earlier, this is located in the Apple Menu; in OS X through 10.2, it is located in the Applications, Utilities folder.
This brings up a picture of the keyboard, and pressing shift, opt, command, or combinations of those keys shows how to produce each character. Use this reference to see how to type that character, or you can cut & paste it from here into the text in the proofreading interface.
- In OS X 10.3 and higher, the same function is now a palette available from the Input menu (the drop-down menu attached to your locale's flag icon in the menu bar). It's labeled "Show Keyboard Viewer." If this isn't in your Input menu, or if you don't have that menu, you can activate it by opening System Preferences, the "International" panel, and selecting the "Input Menu" pane. Ensure that "Show input menu in menu bar" is checked. In the spreadsheet view, check the box for "Keyboard Viewer" in addition to any input locales you use.
- The character picker in the proofreading interface.
- Or you can type the Apple Opt- shortcut codes listed below for these characters.
This is a lot faster than using cut & paste, once you get used to the codes.
Hold the Opt key and type the accent symbol, then type the letter to be accented (or, for some codes, only hold the Opt key and type the symbol).
Apple Mac Shortcuts for Basic Latin symbols | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
` grave | ´ acute (aigu) | ^ circumflex | ~ tilde | ¨ umlaut | ° ring | Ligatures | |||||||
à | Opt-`, a | á | Opt-e, a | â | Opt-i, a | ã | Opt-n, a | ä | Opt-u, a | å | Opt-a | æ | Opt-' |
À | Opt-`, A | Á | Opt-e, A | Â | Opt-i, A | Ã | Opt-n, A | Ä | Opt-u, A | Å | Opt-A | Æ | Opt-" |
è | Opt-`, e | é | Opt-e, e | ê | Opt-i, e | ë | Opt-u, e | œ | Opt-Q | ||||
È | Opt-`, E | É | Opt-e, E | Ê | Opt-i, E | Ë | Opt-u, E | Œ | Shift-Opt-Q | ||||
ì | Opt-`, i | í | Opt-e, i | î | Opt-i, i | ï | Opt-u, i | ||||||
Ì | Opt-`, I | Í | Opt-e, I | Î | Opt-i, I | Ï | Opt-u, I | / slash | |||||
ò | Opt-`, o | ó | Opt-e, o | ô | Opt-i, o | õ | Opt-n, o | ö | Opt-u, o | ø | Opt-o | ||
Ò | Opt-`, O | Ó | Opt-e, O | Ô | Opt-i, O | Õ | Opt-n, O | Ö | Opt-u, O | Ø | Opt-O | ||
ù | Opt-`, u | ú | Opt-e, u | û | Opt-i, u | ü | Opt-u, u | ||||||
Ù | Opt-`, U | Ú | Opt-e, U | Û | Opt-i, U | Ü | Opt-u, U | currency | mathematics | ||||
ý | Opt-e, y | ñ | Opt-n, n | ÿ | Opt-u, y | ¢ | Opt-4 | ± | Shift-Opt-= | ||||
Ý | Opt-e, Y | Ñ | Opt-n, N | £ | Opt-3 | × | (none) ‡ | ||||||
çedilla | Icelandic | marks | accents | punctuation | ¥ | Opt-y | ÷ | Opt-/ | |||||
ç | Opt-c | Þ | (none) ‡ | © | Opt-g | ´ | Opt-E | ¿ | Opt-? | ¤ | (none) ‡ | ¬ | Opt-l |
Ç | Opt-C | þ | (none) ‡ | ® | Opt-r | ¨ | Opt-U | ¡ | Opt-1 | ° | Shift-Opt-8 | ||
superscripts | Ð | (none) ‡ | ¶ | Opt-7 | ¯ | Shift-Opt-, | « | Opt-\ | µ | Opt-m | |||
¹ | (none) *‡ | ð | (none) ‡ | § | Opt-6 | ¸ | Opt-Z | » | Shift-Opt-\ | ordinals | ¼ | (none) †‡ | |
² | (none) *‡ | sz ligature | ¦ | (none) ‡ | · | Shift-Opt-9 | º | Opt-0 * | ½ | (none) †‡ | |||
³ | (none) *‡ | ß | Opt-s | ª | Opt-9 * | ¾ | (none) †‡ |
* Unless specifically requested by the Project Comments, please do not use the ordinal or superscript symbols, but instead use the guidelines for Superscripts. (x^2, f^o, etc.)
† Unless specifically requested by the Project Comments, please do not use the fraction symbols, but instead use the guidelines for Fractions. (1/2, 1/4, 3/4, etc.)
‡ Note: No equivalent shortcut; use character picker if needed.
Alphabetical Index to the Guidelines