Frakprep

frakprep

frakprep is a tool originally written by richyfourtytwo for preprocessing (mainly) German Fraktur texts and fixing the most common OCR errors.

Usage

frakprep, like thundergnat's guiprep, is designed to improve the quality of texts produced by OCR. More specifically, frakprep tries to handle the problems typical of German Fraktur texts. frakprep is no replacement for guiprep; rather, the order should be: run the OCR first, then guiprep, and finally frakprep.

Running frakprep

The important files for running frakprep are:

  • frakprep.exe (the program itself)
  • dic.txt (a German dictionary, derived from the aspell dictionary)

frakprep.exe and dic.txt should be in the same directory. In that directory, create two subdirectories named 'in' and 'out', and put all *.txt files to be processed into 'in'. Open a command-line interpreter (a 'DOS box') and change into the directory containing frakprep. Type 'frakprep <parameters>', replacing <parameters> with the appropriate parameters (see below). All files in 'in' will be processed and saved into 'out' under the same names. Beware: frakprep overwrites files in 'out' without warning.
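The setup described above can also be scripted. The following is a minimal Python sketch, assuming frakprep.exe and dic.txt already sit in the working directory; the helper names prepare_workdir, build_command and run_frakprep are hypothetical and not part of frakprep.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_workdir(workdir, text_files):
    """Create 'in' and 'out' below workdir and copy the *.txt files into 'in'."""
    workdir = Path(workdir)
    in_dir = workdir / "in"
    out_dir = workdir / "out"
    in_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in text_files:
        src = Path(src)
        # Only *.txt files are meant to be processed; skip everything else.
        if src.suffix.lower() == ".txt":
            shutil.copy(src, in_dir / src.name)
    return in_dir, out_dir


def build_command(workdir, parameters):
    """Return the argument list for calling frakprep with the given parameters."""
    return [str(Path(workdir) / "frakprep.exe"), *parameters]


def run_frakprep(workdir, parameters):
    """Run frakprep from inside workdir."""
    # Assumption: frakprep looks for dic.txt, 'in' and 'out' relative to the
    # current directory, as the instructions on this page suggest.
    return subprocess.run(build_command(workdir, parameters), cwd=workdir, check=True)
```

A call such as run_frakprep(".", ["german", "finals", "I", "=", "Ufs", "Bnu"]) would then process everything in 'in'. Because frakprep overwrites files in 'out' without warning, it may be worth moving earlier results elsewhere before running it again.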

Parameters

german - special heuristics for German; does not make sense for other languages.
finals - special treatment of final s; activate this for German Fraktur texts
I      - special treatment of I<->J
=      - join words that are split across line breaks with '='
Bxy    - Bidirectional letter ambiguity. 
         Replace x and y with the appropriate letters (case sensitive). 
         Any number of these parameters may be used. 
         Each letter should appear only once across all such parameters
         (Bxy or Uxy), or results will be somewhat unpredictable.
Uxy    - Unidirectional letter ambiguity. 
         (y in original often read as x in OCR.) 
         Replace x and y with the appropriate letters (case sensitive). 
         Any number of these parameters may be used. 
         Each letter should appear only once across all such parameters
         (Bxy or Uxy), or results will be somewhat unpredictable.
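To make the Bxy and Uxy notation more concrete, here is a small Python sketch that reads a parameter list and collects the letter ambiguities it describes. It is only an illustration and not frakprep's own code; the function name parse_ambiguities is hypothetical, and it assumes that every three-character parameter starting with B or U is an ambiguity parameter.

```python
def parse_ambiguities(parameters):
    """Split frakprep-style parameters into plain flags and letter ambiguities.

    Returns (flags, candidates), where candidates maps a letter seen in the
    OCR text to the letters it might stand for in the original.
    """
    flags = []
    candidates = {}
    seen = set()
    for param in parameters:
        # Assumption: Bxy/Uxy parameters are exactly three characters long.
        if len(param) == 3 and param[0] in "BU":
            kind, x, y = param
            # Each letter should be used in at most one Bxy/Uxy parameter.
            for letter in (x, y):
                if letter in seen:
                    raise ValueError(f"letter {letter!r} used more than once")
                seen.add(letter)
            # Uxy: y in the original is often read as x by the OCR.
            candidates.setdefault(x, []).append(y)
            if kind == "B":
                # Bxy: the confusion goes in both directions.
                candidates.setdefault(y, []).append(x)
        else:
            flags.append(param)
    return flags, candidates
```

For the parameters german finals I = Ufs Bnu, this yields the flags german, finals, I and =, and records that an OCR 'f' may be an original 's', while 'n' and 'u' may each stand for the other.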

The standard call for a German Fraktur text is:

frakprep german finals I = Ufs Bnu
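To give a feel for what Ufs and Bnu in this call express, the following Python sketch tries ambiguous letter substitutions on a word and keeps the variants found in a word list. This is not frakprep's actual algorithm, which this page does not describe; the function name, the tiny word list and the ambiguity table are all hypothetical stand-ins.

```python
from itertools import product


def correction_candidates(word, candidates, dictionary):
    """Return dictionary words reachable from word by swapping ambiguous letters."""
    # A word that is already known needs no correction.
    if word in dictionary:
        return [word]
    # For every position, allow the letter itself plus its possible originals.
    options = [[ch] + candidates.get(ch, []) for ch in word]
    found = []
    for combo in product(*options):
        variant = "".join(combo)
        if variant in dictionary and variant not in found:
            found.append(variant)
    return found


# Hypothetical stand-ins for Ufs, Bnu and dic.txt.
ambiguities = {"f": ["s"], "n": ["u"], "u": ["n"]}
words = {"Haus", "und", "fest"}

print(correction_candidates("Hauf", ambiguities, words))  # ['Haus']
print(correction_candidates("nnd", ambiguities, words))   # ['und']
```

Trying every combination grows exponentially with the number of ambiguous letters in a word, so a real tool would need something smarter for long words; the sketch only shows the idea.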


Download

Download frakprep