Guiprep Installation and Quick Start Guide
Guiprep Installation and Upgrade
Initial installation
Guiprep requires Perl. It was edited and tested under Strawberry Perl on Windows. It may also work with a different Perl, and there are reports of it running on Linux. If you do not have Perl, install Strawberry Perl. If you are not sure whether you have Perl, open a command prompt on your computer and type:
perl --version
If the reply shows the version of Perl installed on your computer, you do not need to install Perl.
Download the most recent version of guiprep from the Guiprep Releases page on GitHub and extract it into a directory.
Once Perl is installed, you also need some additional modules. Windows users can double-click the file install_cpan_modules.pl in the distribution package. For other operating systems, see the INSTALL.md file in the distribution package.
Updating an existing installation
Updating an existing installation is not recommended. Using a copy of settings.rc from an earlier version may disable some of the newer features. Instead, rename your old guiprep folder, and then proceed as if it were a new installation.
Guiprep Initial Use
The most common use of Guiprep is to process the output from OCR: it dehyphenates end-of-line words and prepares the text for upload to the Distributed Proofreaders website for proofreading. The OCR output may consist of two sets of text files. The first set is required: one file per page, with line breaks as in the images (in directory textw). The second set is optional: the same pages without the line breaks, where the OCR's dictionary has been used to resolve end-of-line hyphenation (in directory textwo).
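To make the role of the two file sets more concrete, here is a minimal Python sketch of end-of-line dehyphenation. It is an illustration only and not guiprep's actual algorithm: the function name dehyphenate and the rule of trusting the words found in the no-line-break text are assumptions made for this example.

```python
import re

def dehyphenate(with_breaks: str, without_breaks: str) -> str:
    """Rejoin words split by an end-of-line hyphen (illustrative only)."""
    # Words as they appear in the no-line-break text, where the OCR
    # has already decided which hyphens are real.
    reference = set(re.findall(r"\w+(?:-\w+)*", without_breaks))
    lines = with_breaks.splitlines()
    for i in range(len(lines) - 1):
        # A word fragment followed by a single hyphen at the end of the line.
        head = re.search(r"\w+(?:-\w+)*-$", lines[i])
        # The first word of the next line, plus any punctuation attached to it.
        tail = re.match(r"(\w+)(\S*)", lines[i + 1])
        if not head or not tail:
            continue
        joined = head.group()[:-1] + tail.group(1)
        if joined not in reference:
            # Assumption: if the joined form is unknown, keep the hyphen.
            joined = head.group() + tail.group(1)
        lines[i] = lines[i][:head.start()] + joined + tail.group(2)
        lines[i + 1] = lines[i + 1][tail.end():].lstrip()
    return "\n".join(lines)

page = "The pre-\nparation of well-\nknown books"      # as in textw
resolved = "The preparation of well-known books"       # as in textwo
print(dehyphenate(page, resolved))
```

In this example "pre-" and "paration" are joined into "preparation", while "well-known" keeps its hyphen because that is how it appears in the no-line-break text.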
Directory Setup
Guiprep expects to find textw and, optionally, textwo in a project directory. The output of dehyphenation is placed in the text directory, also inside the project directory; guiprep creates the text directory if it is not present. If you are going to use guiprep to rename or optimize your png files, then there should also be a pngs directory as a sub-directory of the project directory, containing all the png files.
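The layout described above can be checked before starting guiprep. The following Python sketch is one possible way to do that; the function name check_project and the wording of its messages are invented for this example and are not part of guiprep.

```python
from pathlib import Path

def check_project(project: Path, need_pngs: bool = False) -> list[str]:
    """Return a list of layout problems; an empty list means the layout looks usable."""
    problems = []
    if not (project / "textw").is_dir():
        problems.append("textw is missing (required)")
    # textwo is optional and the text directory is created by guiprep,
    # so neither of them is checked here.
    if need_pngs and not (project / "pngs").is_dir():
        problems.append("pngs is missing (needed to rename or optimize png files)")
    return problems
```

For example, check_project(Path("my_project"), need_pngs=True) returns an empty list when both textw and pngs exist in the hypothetical directory my_project.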
Starting Guiprep
If your computer runs Windows, there is a file in the distribution called run_guiprep.bat. Double-clicking on this file will start guiprep. (Older distributions of guiprep contained winprep.exe or run_guiprep???.bat [where ??? is the version number]. If you have any of these files on your computer, you should consider following the instructions for upgrading shown above.)
In all cases, you can start guiprep from a command prompt. Guiprep will only work properly if started in the guiprep directory, the one that was unzipped during installation.
cd <guiprep directory>
perl guiprep.pl
For instance, on my computer I start the most recent version of guiprep with
cd \pgdp\guiprep
perl guiprep.pl
(The change directory command may have a different syntax on your computer.)
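If you prefer to start guiprep from a script, the same two steps can be sketched in Python. This is only an illustration: launch_guiprep is a hypothetical helper, and it assumes that perl is on your PATH.

```python
import subprocess
from pathlib import Path

def launch_guiprep(guiprep_dir: Path) -> int:
    """Run guiprep.pl with the guiprep directory as the working directory."""
    script = guiprep_dir / "guiprep.pl"
    if not script.is_file():
        raise FileNotFoundError(f"guiprep.pl not found in {guiprep_dir}")
    # cwd= has the same effect as running "cd" first: guiprep starts
    # inside its own directory.
    completed = subprocess.run(["perl", "guiprep.pl"], cwd=guiprep_dir)
    return completed.returncode
```

With the example location used above, the call would be launch_guiprep(Path(r"\pgdp\guiprep")).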
Select Options
When guiprep starts, it will open to the Select Options tab. Once you get the settings you want, you will need to look at this tab very infrequently.
Changing the "Default Markup" is not recommended.
In the first set of options:
- Make sure that Dehyphenate using German style hyphens... is not checked, unless your project uses them.
- Save hyphens.txt & dehyphen.txt... is primarily for debugging and should be unchecked unless requested by a support person.
- Make sure that you do not attempt to remove headers and footers in both the OCR program and guiprep. If they were removed in OCR, then uncheck those options here. If headers and footers are still in the files after the OCR, then check the boxes. If headers and footers are not present and you tell guiprep to remove headers and footers, it will remove a line or two of text from the top and bottom of each page.
- Build a standard upload batch... If you are not going to make any further changes to the text before uploading, this might be helpful. Most CPs at least take a look at the guiprep output and may want to make changes.
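The warning about headers and footers in the list above can be illustrated with a short Python sketch. It assumes a blind removal of one line at the top and one at the bottom of each page; guiprep's real logic is not described here, and strip_edges is a name invented for this example.

```python
def strip_edges(page: str, top: int = 1, bottom: int = 1) -> str:
    """Drop a fixed number of lines from the top and bottom of a page."""
    lines = page.splitlines()
    return "\n".join(lines[top:len(lines) - bottom])

with_header = "CHAPTER I.    12\nIt was a dark night.\nThe rain fell.\n13"
already_clean = "It was a dark night.\nThe rain fell.\nNobody stirred."

print(strip_edges(with_header))    # header and footer removed, body intact
print(strip_edges(already_clean))  # only "The rain fell." survives
```

On the page that had its header and footer removed by the OCR program, two lines of real text are lost, which is why the options should be unchecked in that case.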
In the scrollable list of options below that, the following are primarily of historical interest, and generally should be unchecked:
- Convert £ to "Pounds".
- Convert ¢ to "Cents".
- Convert § to "Section".
- Convert ° to "Degrees".
The following option will put curious marks in your text if the OCR ran words together, and you should consider whether to use it or not:
- Mark possible missing spaces between word/sentences.
If you are working on a book that contains mathematics, then you may want to uncheck:
- Convert solitary 1 to l.
- Convert solitary 0 to O.
It is good to familiarize yourself with all of the options because some may be relevant for a specific project.
Change Directory
If you have multiple disk volumes on your computer, select the drive containing the project directory you want to process.
Use the windows to navigate to your prep files. Interactive mode is the easiest to use and the only mode that works for Search and Headers & Footers. To use interactive mode, navigate the left-hand window (Change To Directory) so that your text directories (textw and, optionally, textwo) and, if you have one, your pngs directory appear in that window. Ignore the other directory listing (Select Directories To Batch Process).
Process Text
The options in the Process Text tab:
- Extract Markup -- A good idea if coming from RTF files or if the text was previously processed through DP formatting; otherwise you can leave this checked and it won't do anything.
- Dehyphenate -- That's why we are using this program.
- Rename Txt Files -- OCR programs frequently put funky names on text files. This changes them to 001.txt, 002.txt, ...
- Filter Files -- Fix some common character substitutions. A good idea.
- Fix Common Scannos -- Another good idea.
- Fix Olde Engliſh -- This looks for characters that might be the long s (ſ), which was used in older English printing, and converts them to s. Don't use this option unless you know your project contains long s, because it will try to change f to s. If your book does contain long s, then this option is desirable.
- Convert to ISO 8859-1. -- Don't use this for books which will be represented in UTF-8. Since that is most of our books today, uncheck this option.
- Rename Png Files -- If png files are present, they are renamed to match the Txt File renaming mentioned above, i.e. 001.png, 002.png, ...
- Run Pngcrush -- Pngcrush optimizes the png files for size without losing any information. There are other programs which will also optimize png files. It is important that png files get optimized before uploading to the Distributed Proofreaders website. If you don't do it here, then make sure you do it elsewhere before uploading.
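The renaming performed by Rename Txt Files and Rename Png Files can be pictured with the following Python sketch. It assumes that files are numbered in sorted order of their original names, which may differ from what guiprep actually does; rename_plan is a hypothetical helper that only computes the new names and does not touch any files.

```python
from pathlib import PurePath

def rename_plan(names: list[str]) -> dict[str, str]:
    """Map each file name to a zero-padded sequential name, keeping its extension."""
    plan = {}
    for number, name in enumerate(sorted(names), start=1):
        plan[name] = f"{number:03d}{PurePath(name).suffix}"
    return plan

print(rename_plan(["scan_0002.txt", "scan_0001.txt", "scan_0010.txt"]))
```

Applying the same plan to the png files gives 001.png, 002.png, ... so that each image keeps the same number as its text file, provided both sets sort in the same order.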
At this point the status window in the lower left-hand corner should say
Working in interactive mode.
Hit the Start Processing button and watch it run. If you are working on a large book, or you are running pngcrush, this can take some time. (If nothing happens, then you probably did not select the proper directory in Change Directory.) When it finishes, the last text in the large text window on the right will be
Finished all selected routines.
Your text files are now ready, and if you requested any work be done to your png files, that has been done as well.
Explore other options and tabs
This is a quick start guide, not a complete manual, and there are other options and ways of using guiprep. This document only attempts to show the most straightforward way of using guiprep for a beginner. It is safe to explore the other features and tabs, and you are encouraged to do so. The full user guide is in the distribution package and is linked to below.
See also
- the Guiprep wiki page.
- GuiPrep scanno file for French texts.
- the Guiprep user manual