Common errors proofers find

From DPWiki
General

Some of the most common OCR text errors are listed below. You will get better proofing results if you keep these in mind while you work. Double-checking for these common errors can make a big difference in how quickly your project advances!

Numbers

l / 1 / !
2 / Z
5 / S
6 / G
0 / O

The lower-case letter "l", the digit "1" and the exclamation point ("!") are often confused by the OCR software. Some fonts also make them appear almost identical on screen, especially if you've chosen a small font size for your text window. Setting a larger font size, or using a specialized font such as DP Sans Mono, will help make them appear different from each other. The same goes for zero (0) versus the capital letter O. In most fonts, the number will be thinner (closer to an ellipse than a circle), while the letter will appear to be almost a perfect circle.

Inspect dates closely. "l8S4" should be corrected to "1854".
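
If you like to pre-screen text before proofing, the look-alike pairs above can be turned into a rough checker. The following is only a sketch in Python, not a DP tool: the names LOOKALIKES, suggest_digits and find_suspect_numbers are made up for this illustration, and the rule it uses (a token that mixes real digits with look-alike characters and contains nothing else) is an assumption about what a garbled number usually looks like.

```python
# Look-alike characters from the list above, mapped to the digit they
# often replace in OCR output. The mapping is illustrative, not exhaustive.
LOOKALIKES = {"l": "1", "!": "1", "Z": "2", "S": "5", "G": "6", "O": "0"}


def suggest_digits(token):
    """Return a digits-only suggestion for a suspect token, or None.

    A token is treated as suspect when it contains at least one real digit,
    at least one look-alike character, and nothing else.
    """
    has_digit = any(ch.isdigit() for ch in token)
    has_lookalike = any(ch in LOOKALIKES for ch in token)
    only_numeric_like = all(ch.isdigit() or ch in LOOKALIKES for ch in token)
    if has_digit and has_lookalike and only_numeric_like:
        return "".join(LOOKALIKES.get(ch, ch) for ch in token)
    return None


def find_suspect_numbers(text):
    """Return (token, suggestion) pairs for tokens that may be garbled numbers."""
    suspects = []
    for raw in text.split():
        # Strip common surrounding punctuation, but keep "!" because it is
        # itself one of the look-alike characters.
        token = raw.strip('.,;:"\'()[]')
        suggestion = suggest_digits(token)
        if suggestion is not None:
            suspects.append((token, suggestion))
    return suspects


print(find_suspect_numbers("He sailed in l8S4 with some old salts."))
```

A checker like this only produces candidates. It will also flag a genuine number followed by an exclamation point, so every suggestion still has to be compared against the scan.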

Proper Names

People and place-names are often spelled in unexpected ways.

Double-check all proper names to be certain that what got OCR'd as "Banks" isn't really "Bariks" ("ri" / "n" is a really common OCR error).
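
To see which spellings are worth a second look at the scan, it can help to list the "ri" / "n" alternatives for a name. This is a small Python sketch with a hypothetical helper name, ri_n_variants. It only swaps one occurrence at a time and knows nothing about which spelling is actually in the book.

```python
def ri_n_variants(word):
    """Return the spellings obtained by swapping one "n" for "ri" or one "ri" for "n"."""
    variants = set()
    for i in range(len(word)):
        if word[i] == "n":
            variants.add(word[:i] + "ri" + word[i + 1:])
        if word[i:i + 2] == "ri":
            variants.add(word[:i] + "n" + word[i + 2:])
    variants.discard(word)
    return sorted(variants)


print(ri_n_variants("Banks"))
print(ri_n_variants("Bariks"))
```

If the name on your page has any such variant, compare it letter by letter with the scan rather than trusting either spelling.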

Spacing Around Quotation Marks

Since print books often have variable spacing between words (to support right- and left-justification of the text), the OCR will often get confused and put spaces in the wrong spot for quotation marks.

as they walked he continued" and then what happened?"

should be corrected to:

as they walked he continued "and then what happened?"
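
Misplaced spaces like the one above can be hunted for mechanically. The Python sketch below is an illustration only, with a hypothetical function name, check_quote_spacing. It assumes straight double quotes, assumes it is given one whole paragraph at a time, and assumes quotes simply alternate between opening and closing; a quotation that continues across paragraphs will break that assumption.

```python
def check_quote_spacing(paragraph):
    """Return (index, role) pairs for double quotes whose spacing looks wrong.

    Quotes are assumed to alternate: opening, closing, opening, and so on.
    An opening quote should follow a space (or "(", "[", "-") and touch the
    next word; a closing quote should touch the preceding word or punctuation.
    """
    problems = []
    opening = True  # the first quote in the paragraph is assumed to open
    for i, ch in enumerate(paragraph):
        if ch != '"':
            continue
        prev_ch = paragraph[i - 1] if i > 0 else " "
        next_ch = paragraph[i + 1] if i + 1 < len(paragraph) else " "
        if opening:
            ok = (prev_ch.isspace() or prev_ch in "([-") and not next_ch.isspace()
        else:
            ok = not prev_ch.isspace()
        if not ok:
            problems.append((i, "opening" if opening else "closing"))
        opening = not opening
    return problems


bad = 'as they walked he continued" and then what happened?"'
good = 'as they walked he continued "and then what happened?"'
print(check_quote_spacing(bad))
print(check_quote_spacing(good))
```

The first example reports the quote attached to "continued" as a badly spaced opening quote, and the corrected line reports nothing. Treat any report as a prompt to look at the scan, not as a correction.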

Unexpected Capital Letters

You are reading the text and notice,

after all of that What will I do next?"

Hmm... why that big capital "W" in the middle of a sentence? There are 3 possibilities:

  • It's supposed to be the start of a new sentence.
  • It's supposed to be a small letter.
  • It really is supposed to be a Capital letter -- the Author decided to capitalize this word.

Look closely at the scan, zooming in if you need to. Is there a capital "W" in the original? If so, is there also a period before it? There may be a period that was too faint for the OCR to recognize, or it may have been smudged into the preceding letter. If you can see the period in the scan, type it into the text. If you think that there should be a period, but cannot see one in the scan, leave a note such as [** Missing period?] or even just [** .].

If it clearly isn't a capital "W" but a lower-case "w", then change it to lower-case in the text. (The OCR software is sometimes confused by letters that are the same shape in capitals and lower-case, like W, S, C, etc.)

If it is a capital "W", but there's no period in front of it, then apparently the Author just decided to capitalize this word. That sometimes happens in books; Authors can be individualistic. Leave it the way the Author wrote it.
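
Finding the candidates in the first place can be partly automated. The Python sketch below is an assumption-laden illustration, not a DP tool: the pattern and the name find_unexpected_capitals are invented here. It flags a capitalized word that directly follows a lower-case word and a single space, and skips the pronoun "I". Proper names will be flagged too, so the three-way check against the scan described above still applies to every hit.

```python
import re

# A lower-case word, one space, then a capitalized word. Because the
# lower-case word must be followed directly by the space, anything after
# a period, comma or other punctuation is not matched.
MID_SENTENCE_CAPITAL = re.compile(r"\b[a-z]+ ([A-Z][a-z]*)\b")


def find_unexpected_capitals(line):
    """Return capitalized words that follow a lower-case word with no punctuation between."""
    return [
        match.group(1)
        for match in MID_SENTENCE_CAPITAL.finditer(line)
        if match.group(1) != "I"
    ]


print(find_unexpected_capitals('after all of that What will I do next?"'))
```

For the example line this reports only "What"; whether it gets a period, a lower-case "w", or is left alone is still decided by looking at the scan.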

Fraktur

Fraktur is often very difficult for OCR software to read, especially if it's "looking for English". There is a special wiki article that covers these issues.

See Common Fraktur OCR errors