ppspell: A Spell Checking tool for PPers
Overview
Note: An online version of ppspell is now available as part of the Distributed Proofreaders Post-Processing Workbench PPtext tool, and should be run from there.
The ppspell tool was written for post-processors (PPers). It can use both the standard English dictionary that ppspell provides (which differs from the DP dictionary) and, if you trust it, the Good Words List (GWL) for a specific project. At DP, each project has a GWL, which should contain words suggested by the proofreaders that appear in the book but are not in the standard DP dictionary; proper names are one example. Provided the Project Manager has not included any misspelled words in the GWL, it can be a valuable aid to spell checking during post-processing, and you can supply those words to ppspell. Ppspell can also produce a list of words that it believes are probably “good words,” based on its analysis of the text. These can be used for this book or for other books in the same series.
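The way a good words list supplements a dictionary can be sketched in a few lines of Python. This is only an illustration of the idea, not ppspell's actual code: the function names `load_words` and `suspect_words`, the tokenizing pattern, and the case-insensitive comparison are all assumptions made for this example.

```python
import re

def load_words(lines):
    """Collect one word per line, skipping blank lines (hypothetical helper)."""
    return {line.strip() for line in lines if line.strip()}

def suspect_words(text, dictionary, good_words):
    """Return, sorted, the words in text found in neither word set.

    Comparison is case-insensitive here; that is an assumption of this
    sketch, not documented ppspell behavior.
    """
    known = {w.lower() for w in dictionary} | {w.lower() for w in good_words}
    # Runs of letters, optionally joined by an apostrophe (e.g. "don't").
    words = re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text)
    return sorted({w for w in words if w.lower() not in known})

if __name__ == "__main__":
    dictionary = {"mr", "walked", "the", "dog"}
    good = load_words(["Pumblechook\n", "\n"])
    print(suspect_words("Mr. Pumblechook walked teh dog.", dictionary, good))
```

With the project-specific name in the good words list, only the genuine typo is left in the report.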
Ppspell also provides an optional capability: it can run Levenshtein distance checks on the words in the text. Words that occur infrequently and are a short “edit distance” from another infrequently occurring word in the text are often suspect. This check can take a long time, but it can find truly elusive spelling errors. The spell check function in GuiGuts (GG) has a similar capability, but the GG check (part of the Word Frequency tools) works only on a specific word suspected by the PPer. The check in ppspell works across all words in the book, so it can be valuable to run at least once even if you use GG for your basic spell checking.
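As a rough illustration of this kind of check, the sketch below computes the classic Levenshtein distance and reports pairs of rare words that lie within a small distance of each other. It is not ppspell's implementation: the function names and the thresholds (`max_count`, `max_distance`, `min_length`) are assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations

def levenshtein(a, b):
    """Edit distance counting single-character inserts, deletes, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def close_rare_pairs(words, max_count=2, max_distance=1, min_length=5):
    """Return pairs of infrequent words within max_distance of each other.

    All three thresholds are illustrative assumptions.
    """
    counts = Counter(words)
    rare = sorted(w for w, n in counts.items()
                  if n <= max_count and len(w) >= min_length)
    return [(a, b) for a, b in combinations(rare, 2)
            if abs(len(a) - len(b)) <= max_distance  # cheap length pre-filter
            and levenshtein(a, b) <= max_distance]

if __name__ == "__main__":
    words = ["Pumblechook", "Pumblechock", "garden", "garden", "garden", "house"]
    print(close_rare_pairs(words))
```

Comparing every rare word with every other rare word grows quadratically with the number of rare words, which is one plausible reason a whole-book check of this kind can be slow.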
Running ppspell
ppspell runs on the command line in Windows or on a Mac, and the current version requires Python 3. The ppspell program takes an input file (“-i filename”) and generates an output report file (“-o logfile”) containing the results of its analysis.
Here are the options to ppspell:
python3 ppspell.py
  -i book.txt         source file, Latin-1 or UTF-8
  -o slog.txt         output file for log
  -s sugfile.txt      output file for words suggested for supplemental list
  -g good_words.txt   good words, one per line
  -l                  include Levenshtein distance checks
  -d                  debug output
  -q                  quiet, minimal messages
Example commands
Run one of these commands:
Using defaults:
  python3 ppspell.py -i filename.txt
Using available optional arguments:
  python3 ppspell.py -g goodwords.txt -s suggest.txt -i filename.txt -o slog.txt
Using available optional arguments, and requesting the recommended Levenshtein checks:
  python3 ppspell.py -g goodwords.txt -s suggest.txt -i filename.txt -o slog.txt -l
Then examine the output file: slog.txt
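If you run ppspell on many books, the command line can be assembled by a small script. The sketch below is a hypothetical helper (`build_ppspell_command` is not part of ppspell) that uses only the options listed on this page; it assumes `python3` is on your path and that `ppspell.py` is in the current directory.

```python
def build_ppspell_command(source, log=None, suggestions=None,
                          good_words=None, levenshtein=False):
    """Build an argument list for ppspell (hypothetical helper).

    Only the -i, -o, -s, -g and -l options described on this page are used.
    """
    cmd = ["python3", "ppspell.py"]
    if good_words:
        cmd += ["-g", good_words]   # good words, one per line
    if suggestions:
        cmd += ["-s", suggestions]  # output file for suggested words
    cmd += ["-i", source]           # source file (required)
    if log:
        cmd += ["-o", log]          # output file for the log
    if levenshtein:
        cmd.append("-l")            # include Levenshtein distance checks
    return cmd

if __name__ == "__main__":
    print(" ".join(build_ppspell_command(
        "filename.txt", log="slog.txt", suggestions="suggest.txt",
        good_words="goodwords.txt", levenshtein=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from Python's standard library to launch the program.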
Note: The current release of ppspell is 2.00. However, two versions of 2.00 are in circulation. They are identical, except that one has a default output file name of plog.txt while the other (see link below) has a default output file name of slog.txt.
Obtaining ppspell
Note: An online version of ppspell is now available as part of the Distributed Proofreaders Post-Processing Workbench, and should be run from there.
Program History
Roger Frank created ppspell and is still the primary maintainer of the program. Walt Farrell (wfarrell) maintains this page and the downloadable copy of ppspell for DP.
- 2016-02-05: Minor update to provide usage information if the user doesn't provide an input file name. No change to other functionality.
- 2016-02-06: ppspell 2.00a-wf: Place output file in the same directory as the source (input) file by default.
- 2018: ppspell made available via the Workbench.