User:Solol/Fr Sandbox/Common Proofreading Problems

Warning

This is a draft of the revised proofreading guidelines. When proofreading at PGDP, use the current proofreading guidelines instead.





Preexisting Formatting

Comments, suggestions:

You may sometimes find formatting already present in the text. Do not add or change this formatting information; the formatters will do that later in the process. However, you can remove it if it interferes with your proofreading. The <x> button in the proofreading interface will remove markup such as <i> and <b> from highlighted text. Some examples of formatting tasks include:

  • <i>italics</i>, <b>bold</b>, <sc>Small Caps</sc>
  • Spaced-out text
  • Font size changes
  • Spacing of chapter and section headings
  • Extra spaces, stars, or lines between paragraphs
  • Footnotes that continue for more than one page
  • Footnotes marked with symbols
  • Illustrations
  • Sidenote locations
  • Arrangement of data in tables
  • Indentation (in poetry or elsewhere)
  • Rejoining long lines in poetry and indexes
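
As a rough illustration of the markup removal that the <x> button performs, the sketch below strips the three inline tags named in the list above from a string. The function name and the tag set are assumptions made for this example; the actual proofreading interface may behave differently.

```python
import re

# Only the inline tags named in the list above; any other markup is left alone.
FORMATTING_TAGS = re.compile(r"</?(?:i|b|sc)>")


def strip_formatting(text: str) -> str:
    """Remove <i>, <b>, and <sc> open/close tags, keeping the text inside them."""
    return FORMATTING_TAGS.sub("", text)


print(strip_formatting("a <i>very</i> <b>bold</b> <sc>Claim</sc>"))
```

Only the tags are removed; the words they enclosed stay in place, so the formatters can still mark them up later.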

If the previous proofreader inserted formatting, please take a moment to give them feedback: click on their name in the proofreading interface and send them a private message explaining how to handle the situation in the future. Remember to leave the formatting to the Formatting rounds.


Common OCR Problems

OCR commonly has trouble distinguishing between similar characters. Some examples are:

  • The digit '1' (one), the lowercase letter 'l' (ell), and the uppercase letter 'I'. Note that in some fonts the digit one may look like a small capital letter 'I'.
  • The digit '0' (zero), and the uppercase letter 'O'.
  • Dashes & hyphens: Proofread these carefully—OCR'd text often has only one hyphen for an em-dash that should have two. See the guidelines for hyphenated words and em-dashes for more detailed information.
  • Parentheses ( ) and curly braces { }.

Watch out for these. Normally the context of the sentence is sufficient to determine which is the correct character, but be careful—often your mind will automatically 'correct' these as you are reading.

Noticing these is much easier if you use a mono-spaced font such as DPCustomMono or Courier.
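
One way to draw the eye to these confusions is to list tokens that mix letters and digits and contain one of the look-alike characters. The sketch below is only a review aid under that assumption: the character set and the function name are invented for this example, it will flag legitimate tokens such as "1st", and the page image always decides what is correct.

```python
import re

# Look-alike characters named in the list above.
CONFUSABLE = set("1lI0O")


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that mix letters and digits and contain a look-alike character."""
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_digit = any(c.isdigit() for c in token)
        has_alpha = any(c.isalpha() for c in token)
        if has_digit and has_alpha and CONFUSABLE & set(token):
            found.append(token)
    return found


print(suspicious_tokens("In l905 the ship sai1ed at 10 knots."))
```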


OCR Problems: Scannos

Another common OCR issue is misrecognition of characters. We call these errors "scannos" (like "typos"). This misrecognition can create a word that:

  • appears to be correct at first glance, but is actually misspelled.
    This can usually be caught by running WordCheck from the proofreading interface.
  • is changed to a different but otherwise valid word that does not match what is in the page image.
    This is subtle because it can only be caught by someone actually reading the text.

Possibly the most common example of the second type is "and" being OCR'd as "arid". Other examples: "eve" for "eye", "Torn" for "Tom", "train" for "tram". This type is harder to spot, and we have a special term for it: "Stealth Scannos." Examples of Stealth Scannos are collected in a dedicated thread.

Spotting scannos is much easier if you use a mono-spaced font such as DPCustomMono or Courier.
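
Because a Stealth Scanno is a valid word, a spell check will not catch it, but a list of known examples can at least mark places worth a second look. The sketch below uses only the four pairs quoted above; the dictionary and function names are assumptions for this example, and every hit still has to be compared with the page image, since "arid" or "train" may be exactly what was printed.

```python
import re

# OCR output -> word it may have replaced (the examples quoted above).
STEALTH_SCANNOS = {"arid": "and", "eve": "eye", "Torn": "Tom", "train": "tram"}


def flag_stealth_scannos(text: str) -> list[tuple[str, str]]:
    """Return (word found, word it may stand for) pairs, in order of appearance."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word in STEALTH_SCANNOS:
            hits.append((word, STEALTH_SCANNOS[word]))
    return hits


for found, maybe in flag_stealth_scannos("Torn closed one eve arid slept."):
    print(f"check '{found}' against the image (possibly '{maybe}')")
```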


OCR Problems: Is that ° º really a degree sign?

There are three different symbols that can look very similar in the image and that the OCR software interprets identically (and usually incorrectly):

  • The degree sign °: This should be used only to indicate degrees (of temperature, of angle, etc.).
  • The superscript o: Virtually all other occurrences of a raised o should be proofread as ^o, following the guidelines for Superscripts.
  • The masculine ordinal º: Proofread this like a superscript too unless the special character is requested in the Project Comments. It may be used in languages such as Spanish and Portuguese, and is the equivalent of the -th in English 4th, 5th, etc. It follows numbers and has the feminine equivalent in the superscript a (ª).
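
The three rules above can be sketched as a small text transformation. This is a heuristic illustration only: the function name is hypothetical, the "degree sign directly after a letter" test is an assumption that will not fit every text, and it presumes the Project Comments have not requested the ordinal characters.

```python
import re


def normalize_raised_o(text: str) -> str:
    """Rewrite ordinal indicators as ^o / ^a, and a degree sign directly after a letter as ^o."""
    # Masculine and feminine ordinal indicators become superscript markup.
    text = text.replace("º", "^o").replace("ª", "^a")
    # A degree sign right after a letter (as in "N°") is unlikely to mean degrees.
    return re.sub(r"(?<=[A-Za-z])°", "^o", text)


print(normalize_raised_o("N° 5 at 20° C, 2º"))
```

A degree sign that follows a digit is left alone, since that is the usual position for a temperature or an angle.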

Handwritten Notes in Book

Do not include handwritten notes in a book (unless they overwrite faded printed text to make it more visible). Do not include handwritten marginal notes made by readers, etc.


Bad Image

If an image is bad (not loading, mostly illegible, etc.), please post about it in the project discussion and click on the "Report Bad Page" button so that the page is 'quarantined'; do not return the page to the round. If only a small portion of the image is bad, leave a note as described above and post in the project discussion, without marking the whole page bad. The "Bad Page" button is only available during the first round of proofreading, so it is important that these issues be resolved early.

Note that some page images are quite large, and it is common for your browser to have difficulty displaying them, especially if you have several windows open or are using an older computer. Before reporting this as a bad page, try closing some of your windows and programs to see if that helps, or post in the project discussion to see if anyone else has the same problem.


Wrong Image for Text

If the image does not match the text given, please post about this bad page in the project discussion and click on the "Report Bad Page" button so that the page is 'quarantined'; do not return the page to the round. The "Bad Page" button is only available during the first round of proofreading, so it is important that these issues be resolved early.

It's fairly common for the OCR'd text to be mostly correct, but missing the first line or two of the text. Please just type in the missing line(s). If nearly all of the lines are missing in the text box, then either type in the whole page (if you are willing to do that), or just click on the "Return Page to Round" button and the page will be reissued to someone else. If there are several pages like this, you might post a note in the project discussion to notify the Project Manager.


Previous Proofreader Mistakes

If the previous proofreader made a lot of mistakes or missed a lot of things, please take a moment to give them feedback: click on their name in the proofreading interface and send them a private message explaining how to handle the situation in the future.

Please be nice! Everyone here is a volunteer and presumably trying their best. The point of your feedback message should be to inform them of the correct way to proofread, rather than to criticize them. Give a specific example from their work showing what they did, and what they should have done.

If the previous proofreader did an outstanding job, you can also send them a message about that—especially if they were working on a particularly difficult page.


Printer Errors/Misspellings

Correct all of the words that the OCR has misread (scannos), but do not correct what may appear to you to be misspellings or printer errors that occur on the page image. Many of the older texts have words spelled differently from modern usage and we retain these older spellings, including any accented characters.

Place a note in the txet [**typo for text?] next to a printer's error. If you are unsure whether it is actually an error, please also ask in the project discussion. If you do make a change, include a note describing what you changed: [**typo "txet" fixed]. Include the two asterisks ** so the post-processor will notice it.
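
The two asterisks make such notes easy to find mechanically. As a minimal sketch of how a post-processor might collect them (the pattern and the function name are assumptions for this example, not a description of any actual post-processing tool):

```python
import re

# A note starts with "[**" and runs to the next closing bracket.
NOTE = re.compile(r"\[\*\*([^\]]*)\]")


def find_notes(text: str) -> list[str]:
    """Return the contents of every [**...] note in the text."""
    return [match.group(1).strip() for match in NOTE.finditer(text)]


page = 'Place a note in the txet [**typo for text?] next to it. [Footnote 1: unrelated]'
print(find_notes(page))
```

Bracketed text without the two asterisks, such as the footnote in the sample, is not picked up, which is why the asterisks matter.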


Factual Errors in Texts

Do not correct factual errors in the author's book. Many of the books we are proofreading have statements of fact in them that we no longer accept as accurate. Leave them as the author wrote them. See Printer Errors/Misspellings for how to leave a note if you think the printed text is not what the author intended.


Inserting Special Characters

If the characters you need are not on your keyboard, there are several ways to input Latin-1 characters:

  • The pull-down menus in the proofreading interface.
  • Applets included with your operating system. If you use one of these, be sure to insert only Latin-1 characters (those listed in the charts below).
    • Windows: "Character Map"
      Access it through:
      Start: Run: charmap, or
      Start: Accessories: System Tools: Character Map.
    • Macintosh: Key Caps or "Keyboard Viewer"
      For OS 9 and lower, this is on the Apple Menu.
      For OS X through 10.2, this is located in the Applications, Utilities folder.
      For OS X 10.3 and higher, this is in the Input Menu as "Keyboard Viewer."
    • Linux: The name and location of the character picker will vary depending on your desktop environment.
  • An on-line program, such as Edicode.
  • Keyboard shortcuts.
    (See the tables for Windows and Macintosh below.)
  • Switching to a keyboard layout or locale which supports "deadkey" accents.
    • Windows: Control Panel (Keyboard, Input Locales)
    • Macintosh: Input Menu (on Menu Bar)
    • Linux: Change the keyboard in your X configuration.
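
When characters come from an operating system applet or a pasted source, it can help to check that everything is still within Latin-1. Latin-1 (ISO 8859-1) corresponds to the first 256 Unicode code points, so a minimal check, with a function name chosen for this example, might look like this:

```python
def non_latin1(text: str) -> list[str]:
    """Return the characters in text that fall outside Latin-1 (code points above 255)."""
    return [ch for ch in text if ord(ch) > 0xFF]


print(non_latin1("café ½"))          # every character here is Latin-1
print(non_latin1("œuvre costs 5€"))  # the oe ligature and the euro sign are not
```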

For Windows:

  • You can use the Character Map program (Start: Run: charmap) to select an individual letter, and then cut & paste.
  • You can use the dropdown menus in the proofreading interface.
  • Or you can type the Alt NumberPad shortcut codes listed below for these characters. This is faster than using cut & paste, once you get used to the codes.
    Hold the Alt key and type the four digits on the Number Pad—the number row over the letters won't work.
    You must type all 4 digits, including the leading 0 (zero). Note that the code for the capital version of a letter is 32 less than the code for the lowercase version.
    These instructions are for the US-English keyboard layout. They may not work for other keyboard layouts.


Windows Shortcuts for Latin-1 symbols

` grave         à Alt-0224    è Alt-0232    ì Alt-0236    ò Alt-0242    ù Alt-0249
                À Alt-0192    È Alt-0200    Ì Alt-0204    Ò Alt-0210    Ù Alt-0217
´ acute (aigu)  á Alt-0225    é Alt-0233    í Alt-0237    ó Alt-0243    ú Alt-0250    ý Alt-0253
                Á Alt-0193    É Alt-0201    Í Alt-0205    Ó Alt-0211    Ú Alt-0218    Ý Alt-0221
^ circumflex    â Alt-0226    ê Alt-0234    î Alt-0238    ô Alt-0244    û Alt-0251
                Â Alt-0194    Ê Alt-0202    Î Alt-0206    Ô Alt-0212    Û Alt-0219
~ tilde         ã Alt-0227    õ Alt-0245    ñ Alt-0241
                Ã Alt-0195    Õ Alt-0213    Ñ Alt-0209
¨ umlaut        ä Alt-0228    ë Alt-0235    ï Alt-0239    ö Alt-0246    ü Alt-0252    ÿ Alt-0255
                Ä Alt-0196    Ë Alt-0203    Ï Alt-0207    Ö Alt-0214    Ü Alt-0220
° ring          å Alt-0229    Å Alt-0197
Æ ligature      æ Alt-0230    Æ Alt-0198
/ slash         ø Alt-0248    Ø Alt-0216
ç cedilla       ç Alt-0231    Ç Alt-0199
sz ligature     ß Alt-0223
Icelandic       Þ Alt-0222    þ Alt-0254    Ð Alt-0208    ð Alt-0240
marks           © Alt-0169    ® Alt-0174    ¶ Alt-0182    § Alt-0167    ¦ Alt-0166
accents         ´ Alt-0180    ¨ Alt-0168    ¯ Alt-0175    ¸ Alt-0184
punctuation     ¿ Alt-0191    ¡ Alt-0161    « Alt-0171    » Alt-0187    · Alt-0183
currency        ¢ Alt-0162    £ Alt-0163    ¥ Alt-0165    ¤ Alt-0164
mathematics     ± Alt-0177    × Alt-0215    ÷ Alt-0247    ¬ Alt-0172    ° Alt-0176    µ Alt-0181
                ¼ Alt-0188 †  ½ Alt-0189 †  ¾ Alt-0190 †
superscripts    ¹ Alt-0185 *  ² Alt-0178 *  ³ Alt-0179 *
ordinals        º Alt-0186 *  ª Alt-0170 *

* Unless specifically requested by the Project Comments, please do not use the ordinal or superscript symbols, but instead use the guidelines for Superscripts.

† Unless specifically requested by the Project Comments, please do not use the fraction symbols, but instead use the guidelines for Fractions. (1/2, 1/4, 3/4, etc.)
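
The Alt codes in the Windows table match each character's Latin-1 code point written as four decimal digits, which is also why a capital letter's code is 32 less than its lowercase partner's. The sketch below derives a code from a character on that assumption; the function name is made up for this example, and it covers only the range 0160 to 0255 used in the table.

```python
def alt_code(ch: str) -> str:
    """Return the four-digit Alt code for a Latin-1 character in the range 160-255."""
    code = ord(ch)
    if not 0xA0 <= code <= 0xFF:
        raise ValueError(f"{ch!r} is outside the range covered by the table")
    return f"Alt-{code:04d}"


for lower, upper in [("à", "À"), ("é", "É"), ("ñ", "Ñ")]:
    # The difference between each pair is always 32.
    print(lower, alt_code(lower), upper, alt_code(upper), ord(lower) - ord(upper))
```

The 32 rule applies only to letters that have both cases within Latin-1; ß and ÿ, for example, have no capital form there.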

For Apple Macintosh:

  • You can use the "Key Caps" program as a reference.
    In OS 9 & earlier, this is located in the Apple Menu; in OS X through 10.2, it is located in the Applications, Utilities folder.
    This brings up a picture of the keyboard, and pressing shift, opt, command, or combinations of those keys shows how to produce each character. Use this reference to see how to type that character, or you can cut & paste it from here into the text in the proofreading interface.
  • In OS X 10.3 and higher, the same function is now a palette available from the Input menu (the drop-down menu attached to your locale's flag icon in the menu bar). It's labeled "Show Keyboard Viewer." If this isn't in your Input menu, or if you don't have that menu, you can activate it by opening System Preferences, the "International" panel, and selecting the "Input Menu" pane. Ensure that "Show input menu in menu bar" is checked. In the spreadsheet view, check the box for "Keyboard Viewer" in addition to any input locales you use.
  • You can use the dropdown menus in the proofreading interface.
  • Or you can type the Apple Opt- shortcut codes listed below for these characters.
    This is a lot faster than using cut & paste, once you get used to the codes.
    Hold the Opt key and type the accent symbol, then type the letter to be accented (or, for some codes, only hold the Opt key and type the symbol).
    These instructions are for the US-English keyboard layout. They may not work for other keyboard layouts.


Apple Mac Shortcuts for Latin-1 symbols

` grave         à Opt-`, a      è Opt-`, e      ì Opt-`, i      ò Opt-`, o      ù Opt-`, u
                À Opt-`, A      È Opt-`, E      Ì Opt-`, I      Ò Opt-`, O      Ù Opt-`, U
´ acute (aigu)  á Opt-e, a      é Opt-e, e      í Opt-e, i      ó Opt-e, o      ú Opt-e, u      ý Opt-e, y
                Á Opt-e, A      É Opt-e, E      Í Opt-e, I      Ó Opt-e, O      Ú Opt-e, U      Ý Opt-e, Y
^ circumflex    â Opt-i, a      ê Opt-i, e      î Opt-i, i      ô Opt-i, o      û Opt-i, u
                Â Opt-i, A      Ê Opt-i, E      Î Opt-i, I      Ô Opt-i, O      Û Opt-i, U
~ tilde         ã Opt-n, a      õ Opt-n, o      ñ Opt-n, n
                Ã Opt-n, A      Õ Opt-n, O      Ñ Opt-n, N
¨ umlaut        ä Opt-u, a      ë Opt-u, e      ï Opt-u, i      ö Opt-u, o      ü Opt-u, u      ÿ Opt-u, y
                Ä Opt-u, A      Ë Opt-u, E      Ï Opt-u, I      Ö Opt-u, O      Ü Opt-u, U
° ring          å Opt-a         Å Opt-A
Æ ligature      æ Opt-'         Æ Opt-"
/ slash         ø Opt-o         Ø Opt-O
ç cedilla       ç Opt-c         Ç Opt-C
sz ligature     ß Opt-s
Icelandic       Þ (none) ‡      þ (none) ‡      Ð (none) ‡      ð (none) ‡
marks           © Opt-g         ® Opt-r         ¶ Opt-7         § Opt-6         ¦ (none) ‡
accents         ´ Opt-E         ¨ Opt-U         ¯ Shift-Opt-,   ¸ Opt-Z
punctuation     ¿ Opt-?         ¡ Opt-1         « Opt-\         » Shift-Opt-\   · Shift-Opt-9
currency        ¢ Opt-4         £ Opt-3         ¥ Opt-y         ¤ (none) ‡
mathematics     ± Shift-Opt-=   × (none) ‡      ÷ Opt-/         ¬ Opt-l         ° Shift-Opt-8   µ Opt-m
                ¼ (none) †‡     ½ (none) †‡     ¾ (none) †‡
superscripts    ¹ (none) *‡     ² (none) *‡     ³ (none) *‡
ordinals        º Opt-0 *       ª Opt-9 *

* Unless specifically requested by the Project Comments, please do not use the ordinal or superscript symbols, but instead use the guidelines for Superscripts.

† Unless specifically requested by the Project Comments, please do not use the fraction symbols, but instead use the guidelines for Fractions. (1/2, 1/4, 3/4, etc.)

‡ Note: No equivalent shortcut; use drop-down menus if needed.
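
The two-step Mac shortcuts (an accent key, then a letter) behave much like combining an accent with a base letter. The sketch below imitates that with Unicode combining marks and NFC normalization from Python's standard `unicodedata` module. It is an analogy only, not a description of how the Mac keyboard driver works, and the `DEAD_KEYS` mapping and function name are assumptions for this example.

```python
import unicodedata

# Accent keys from the table above, mapped to Unicode combining marks.
DEAD_KEYS = {
    "Opt-`": "\u0300",  # combining grave accent
    "Opt-e": "\u0301",  # combining acute accent
    "Opt-i": "\u0302",  # combining circumflex accent
    "Opt-n": "\u0303",  # combining tilde
    "Opt-u": "\u0308",  # combining diaeresis (umlaut)
}


def compose(dead_key: str, letter: str) -> str:
    """Combine a base letter with the accent for dead_key into one precomposed character."""
    return unicodedata.normalize("NFC", letter + DEAD_KEYS[dead_key])


print(compose("Opt-e", "e"), compose("Opt-n", "N"), compose("Opt-u", "y"))
```

Each result is a single Latin-1 character, matching the corresponding entry in the table.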