Common Fraktur OCR errors
Some of the most common OCR text errors are listed below. You will find yourself getting better proofing results if you keep these in mind while you work. Double-checking for these common errors can make a BIG difference!
The Fraktur alphabet
Some characters have a very similar appearance:
- f | ſ (long s) (yes, there are two different characters for 's'!)
- a | u | n
- c | e | o
- t | k
- i | l
- d | ck (Ligature)
- w | sch (too many vertical lines?)
- M | W
- A | U
- R | K
- E | G
- h | y
See also this site for a discussion of similar letters in Fraktur.
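The list of look-alike characters above can also be used mechanically. The following is only a minimal sketch, not part of any existing tool: the names `CONFUSABLE_GROUPS`, `variants` and `suspicious` are made up for illustration, and the word list passed in is assumed to come from whatever dictionary you have at hand. It shows how one might find known words that differ from an OCR'd word by a single confusable swap, which are the words most worth comparing against the image.

```python
# Confusable groups copied from the list above. Multi-character entries
# ("ck", "sch") are treated as plain substrings.
CONFUSABLE_GROUPS = [
    ["f", "ſ"],
    ["a", "u", "n"],
    ["c", "e", "o"],
    ["t", "k"],
    ["i", "l"],
    ["d", "ck"],
    ["w", "sch"],
    ["M", "W"],
    ["A", "U"],
    ["R", "K"],
    ["E", "G"],
    ["h", "y"],
]


def variants(word):
    """Return all words made by replacing ONE occurrence of a confusable
    sequence with another member of its group."""
    found = set()
    for group in CONFUSABLE_GROUPS:
        for src in group:
            start = word.find(src)
            while start != -1:
                for dst in group:
                    if dst != src:
                        found.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    found.discard(word)
    return found


def suspicious(word, known_words):
    """Known words that differ from `word` by a single confusable swap."""
    return sorted(variants(word) & set(known_words))


# Hypothetical word list; "Hand" and "Hund" differ only by a | u.
print(suspicious("Hand", ["Hund", "Hand", "Band"]))
```

A word that comes back with one or more neighbours is not necessarily wrong; it is simply a good candidate for a second look at the page image.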
'long s' vs. 'f'
The 'normal' (round) s is used at the end of a syllable; the long s (ſ) is used elsewhere. If in doubt, this rule may help you decide which word is correct: aus vs. auf, ausſteigen vs. aufſteigen, etc.
For long s in general, see the "Long s" section of the "Proofing Old Texts" page.
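One consequence of the rule above is easy to check automatically: the end of a word is always the end of a syllable, so a long s should not appear there. The sketch below is only an illustration under that assumption (the function name `long_s_at_word_end` is hypothetical); it lists words in OCR text that end in ſ, which are likely misreads of f or should have a round s.

```python
import re

LONG_S = "\u017f"  # the character ſ


def long_s_at_word_end(text):
    """Return the words in `text` that end in long s.

    This checks only the word-final case of the rule; it does not try to
    find syllable boundaries inside a word.
    """
    return [word for word in re.findall(r"\w+", text) if word.endswith(LONG_S)]


print(long_s_at_word_end("Das Hauſ iſt aus Stein"))
```

Every hit still needs to be compared with the image, since the check cannot tell whether the printed letter is an f or a round s.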
Characters that are very seldom recognized correctly
Numbers printed in a Fraktur font are sometimes unreadable for the OCR software. They are then treated as dirt on the page and do not appear in the OCR text at all.
Strange (= older) spellings
Especially in German texts: the OCR software often uses a spellchecker with a modern dictionary, so words that have changed in spelling may be OCRed with the modern spellings instead of what is on the page:
- Words formerly written with 'th' are now written with only 't' (e.g. eigenthümlich -> eigentümlich, roth -> rot), so the OCR results often drop the 'h'.
- The same happens with umlaut dots: kömmt -> kommt.
Be sure to proof according to the image, not the modern spelling.
- Names of people and places are often spelled in unexpected ways.
- Double-check all proper names to be certain that what got OCR'd as "Banks" isn't really "Bariks" ("ri" / "n" is a really common OCR error).
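Because a spellchecker-driven OCR tends to produce the modern form, it can help to search the OCR text for modern spellings that have a known older counterpart. The following is a minimal sketch under stated assumptions: the table `POSSIBLY_MODERNIZED` contains only the three examples mentioned above, and both it and the function `flag_modernized` are hypothetical names for illustration. A hit does not mean the text is wrong, only that the word should be checked against the image.

```python
import re

# Modern spelling -> older spelling that may actually be on the page.
# Only the examples from the text above; a real list would be much longer.
POSSIBLY_MODERNIZED = {
    "eigentümlich": "eigenthümlich",
    "rot": "roth",
    "kommt": "kömmt",
}


def flag_modernized(text):
    """Return (line_number, word, older_spelling) for each word to re-check."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"\w+", line):
            older = POSSIBLY_MODERNIZED.get(word.lower())
            if older is not None:
                hits.append((line_number, word, older))
    return hits


for hit in flag_modernized("Das Tuch war rot.\nEr kommt bald."):
    print(hit)
```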
Fixing common OCR errors in Preprocessing when Providing Content
Even with hand-trained patterns, OCR programs have problems with Fraktur fonts. To fix the most common errors, frakprep should be used.