Moderate Proofreading Tutorial, Page 4

Common OCR Problems

OCR software commonly has trouble distinguishing between similar-looking characters. Some examples are:

  • The digit '1' (one), the lowercase letter 'l' (ell), and the uppercase letter 'I'. Note that in some fonts the digit one looks like a small capital 'I'.
  • The digit '0' (zero) and the uppercase letter 'O'.
  • Dashes and hyphens: proofread these carefully. OCR'd text often has a single hyphen where an em-dash, which should be two hyphens (--), belongs.
  • Parentheses ( ) and curly braces { }.

Watch out for these. Usually the context of the sentence is enough to tell which character is correct, but be careful: your mind will often automatically 'correct' these as you read.
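If you are checking a plain-text file outside the proofreading interface, a short script can point out spots where these confusions are likely, so that your eye does not silently 'correct' them. What follows is a minimal Python sketch under our own assumptions. The function name `flag_suspects`, the confusable set, and the heuristics are hypothetical and are not part of any Distributed Proofreaders tool. The script only flags candidates. Choosing the correct character still depends on the context of the sentence.

```python
import re

# Hypothetical confusable set, taken from the list above: 1 / l / I and 0 / O.
CONFUSABLE = set("1lI0O")


def flag_suspects(text):
    """Yield (line_number, token, reason) for spots worth a second look.

    Heuristics (assumptions, not official proofreading rules):
    - a word that mixes letters and digits and contains a confusable
      character, e.g. '1ike' or '19O6'
    - a lone hyphen with a space on each side, which may be a one-hyphen em-dash
    - any curly brace, which in ordinary prose is often a misread parenthesis
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in re.findall(r"[A-Za-z0-9]+", line):
            has_letter = any(c.isalpha() for c in token)
            has_digit = any(c.isdigit() for c in token)
            if has_letter and has_digit and CONFUSABLE & set(token):
                yield (line_no, token, "letters and digits mixed")
        # " - " matches a single spaced hyphen; " -- " does not match.
        for _ in re.finditer(r"\s-\s", line):
            yield (line_no, "-", "lone hyphen; possibly a one-hyphen em-dash")
        for match in re.finditer(r"[{}]", line):
            yield (line_no, match.group(), "curly brace; possibly a misread parenthesis")


if __name__ == "__main__":
    sample = "He said 1ike this.\nThe year 19O6 - a good one.\nSee {note}."
    for line_no, token, reason in flag_suspects(sample):
        print(f"line {line_no}: {token!r} ({reason})")
```

The heuristics accept some false positives. For example, '1st' is flagged because it mixes a '1' with letters. They also skip hyphens inside words such as 'well-known', since a script cannot tell those apart from an em-dash that lost its spaces. Anything the script reports still needs a human decision.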

Distributed Proofreaders was founded in 2000 by Charles Franks to support the digitization of Public Domain books. Originally conceived to assist Project Gutenberg (PG), Distributed Proofreaders (DP) is now the main source of PG e-books. In 2002, Distributed Proofreaders became an official PG site. In May 2006, Distributed Proofreaders became a separate legal entity and continues to maintain a strong relationship with PG.