GutAxe
GutAxe is an interactive software tool that allows the post-processor to find errors and make corrections to a text. It works like a spellchecker, highlighting each suspected error and suggesting a likely correction. It scans for over a thousand types of errors, including stealth scannos, missing accents, and punctuation errors. It also includes a quick non-interactive mode (useful for PPV) that merely lists lines of text containing suspected errors. It is part of the Windows-only GutWrench suite of tools.
Using GutAxe
The e-book file must reside in the same Windows directory (folder) as the GutAxe.exe file and sub-directory GAplugins, which contains the 5 GutAxe auxiliary files: GAaccent.txt, GAaelig.txt, GAscanno.txt, GAmarkup.txt, and GAothers.txt. The e-book file must have a text-file suffix, of the form *.txt.
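The folder layout described above can be checked before launching GutAxe. The following is a minimal Python sketch, not part of GutAxe itself; the function name `missing_items` is hypothetical, and the file names are simply those listed in the text.

```python
from pathlib import Path

# File names are taken from the text above; everything else is illustrative.
PLUGIN_FILES = [
    "GAaccent.txt", "GAaelig.txt", "GAscanno.txt",
    "GAmarkup.txt", "GAothers.txt",
]

def missing_items(folder, ebook_name):
    """Return a list of required items that are absent from `folder`."""
    folder = Path(folder)
    missing = []
    # The e-book must be a *.txt file in the same folder as GutAxe.exe.
    if not ebook_name.lower().endswith(".txt"):
        missing.append(f"{ebook_name} (needs a .txt suffix)")
    elif not (folder / ebook_name).is_file():
        missing.append(ebook_name)
    if not (folder / "GutAxe.exe").is_file():
        missing.append("GutAxe.exe")
    # The five auxiliary files live in the GAplugins sub-directory.
    for name in PLUGIN_FILES:
        if not (folder / "GAplugins" / name).is_file():
            missing.append(f"GAplugins/{name}")
    return missing

if __name__ == "__main__":
    for item in missing_items(".", "mybook.txt"):
        print("Missing:", item)
```

An empty result means the folder matches the layout described here; it says nothing about whether the files' contents are valid.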
To begin, select the e-book text from the input-file menu at upper left. Then select which operations you would like to perform from among the check-box choices.
For Post-processing, use the default settings:
- All Corrections turned on (in the menu bar).
- "Check text only" turned off.
- New users may want to turn off Expert Mode.
For Post-Processing Verification (PPV), use these settings:
- "Check text only" turned on.
- "Expert mode" turned on (default).
When using GutAxe for PPV, where errors are expected to be few, I recommend simply reading through GutAxe's output file GAerrors.txt (e.g., in Notepad). From there, any errors can be copied into the PPV Report Card as feedback for the PPer. Note, however, that to reduce the amount of output, GutAxe reports only the first suspected error on each line of text; the description of that error therefore may not reflect the nature of the actual error. It is a good idea to include a brief explanation of the actual error after the line of GutAxe output.
"Go" button
Before pressing Go, you might want to expand the GutAxe window; this expands the large text box on the right, where the original text is displayed.
The Go function searches the text for (highly) likely errors and displays each in its context with a suggested correction. The popup dialog box displays the current rule and the current line of text after the suggested correction and offers the following choices:
- Change accepts the suggested correction
- Ignore rejects it
- Ignore All rejects it and stops checking for the current suggested change for the rest of the text
- Abort stops the entire process but saves all the changes up to that point
This function produces an output text file, named GAout.txt, with the accepted changes. The input file is never modified. The accepted and rejected changes are also logged in file GAlog.txt; they are also tabulated by Page Number (provided DP-style Page Separators are present in the file).
If Check Text Only is checked, possible errors are displayed in the field at the right of the main GutAxe window; these are also recorded in an output text file, named GAerrors.txt (unless Record Mode is turned off). This is for information only, and it is not necessary to check the text before modifying it (but see When to Use GutAxe below).
If GutAxe finds an actual error in the text but does not offer the appropriate correction, make a note of the line number and correct it by hand later. Or, after completing the text-modification pass, re-run it with Check Text Only checked to find that error again.
A useful feature for long texts is the Start at line no. __ textbox. If you must Abort a GutAxe session, make a note of how many lines you have completed and use that for the starting point when you resume GutAxe-ing.
Menus
The Modes menu in the main menu bar offers the following features:
- Expert Mode (Default = On) requests fewer and briefer messages.
- Silent Mode requests fewer warning beeps.
- Record Mode causes error messages to be saved in external text files. The default is Record Mode = On; turning it Off prevents these files from being created or written over.
The Corrections menu selects which changes GutAxe will try to make to the text.
For best results, set all selections to Checked (default).
Use of "ae" ligatures is optional. A separate menu controls whether GutAxe tries to correct these and how.
- Don't check ae's: no occurrences of "ae" will be flagged for possible changing to a ligature.
- Use plug-in file: only strings defined in the file GAaelig.txt will be searched for in the file (this is the default setting).
- Check all ae's: every occurrence of the letter combination "ae" (lower- and upper-case) will be referred to the user for possible changing to the appropriate lower- or upper-case ligature.
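As a rough illustration of the "Check all ae's" behaviour, the Python sketch below lists every "ae" pair in a line together with a suggested ligature. It is not GutAxe's code: the function names are hypothetical, and mapping both "Ae" and "AE" to the upper-case ligature is an assumption.

```python
import re

# Assumption: "Ae" and "AE" both map to the upper-case ligature.
LIGATURES = {"ae": "æ", "Ae": "Æ", "AE": "Æ"}

def ae_candidates(line):
    """Yield (column, found, suggestion) for each "ae" pair in the line."""
    for m in re.finditer(r"ae|Ae|AE", line):
        yield m.start(), m.group(), LIGATURES[m.group()]

def apply_ligature(line, column):
    """Replace the two-letter pair starting at `column` with its ligature."""
    pair = line[column:column + 2]
    return line[:column] + LIGATURES[pair] + line[column + 2:]

if __name__ == "__main__":
    text = "Caesar and AEneas"
    for column, found, suggestion in ae_candidates(text):
        print(column, found, "->", suggestion)
```

Each candidate would still need a human decision, since many "ae" pairs (as in "Israel" or "maestro") should not become ligatures.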
The plug-in files
GutAxe is accompanied by five auxiliary files: GAaccent.txt, GAaelig.txt, GAscanno.txt, GAmarkup.txt, and GAothers.txt. These are contained in the folder GAplugins, which must reside in the same folder as the GutAxe.exe application. Each plug-in file contains a list of potential corrections, or "rules". A rule has four fields: the search type, the text (error) to be searched for, the likely correction, and a description, each separated by a delimiting character such as a backslash "\". The first field of each line determines how the error is searched for: "s" = string, "w" = word, "r" = regular expression. Lines that begin with any other character (such as a blank, as in an indented line) are ignored, and are thus available for internal comments.
The difference among the search types is the way that GutAxe uses them. "Word" errors are searched for as entire words, surrounded by blanks or punctuation. Thus, in applying the rule "w\hut\but\h/b confusion", GutAxe searches for the word "hut", standing alone, but not for "Hut" or the string "hut" within another word, such as "chute"; when it does find the word "hut", it suggests "but" as its likely correction and gives "h/b confusion" as the description.
"String" errors, on the other hand, are searched for as character strings. For the rule "s\hotel\hôtel\circumflex", GutAxe finds not only "hotel" but also "hotelier", and suggests "hôtel" as the replacement for the "hotel" within "hotelier" to make "hôtelier". This approach allows a single rule to handle compound, feminine, plural, capitalized, and other variants of a base word or part-word.
"Regular expressions" are also searched, indicated by a leading "r" in the rule. For more details on these, see the DP forum or the wikipedia article. Since the backslash "\" often appears in regular expressions, GutAxe uses a flexible system for establishing the delimiter between the several fields; specifically, it takes the second character of the rule as the delimiter (for the entire rule). Convenient choices for delimiters include the low-line character "_" and the tilde "~". Thus the rule "r_tbi_thi_b/h confusion" converts every occurrence of the string "tbi" to "thi", because of "b/h confusion". The advantage of regular expressions is their power to handle a large variety of possible errors with a single rule.
When to use GutAxe
GutAxe may be used to correct a text at any stage of pre- or post-processing. Since it is designed to return a low percentage of false positives and to quickly and efficiently make corrections, it is recommended that it be run early on during either of these stages. (GutAxe recognizes DP-style Page Separators and does not try to make changes to them.) A good plan to begin post-processing is as follows:
- Run GutSweeper with all options selected: first Check Text and make corrections by hand until it finds no more Errors; then Modify Text.
- Run GutAxe with "Check Text Only" checked. Make a mental or written note of the kinds of errors it reports. (It also saves these in a file, GAerrors.txt.)
- First editing pass: As you do this, decide which of GutAxe's reported errors (especially hyphenation and spelling variants, such as "today/to-day" and "Caesar/Cæsar") should be accepted.
- Run GutAxe again with "Check Text Only" unchecked.
- Continue post-processing ....
However, GutAxe's "Check Text Only" feature is purely optional. It is quite safe to run GutAxe to make corrections without first checking the text.
See also
- GutCutter, another of the GutWrench suite, includes a feature that checks for the same suspected errors, which it displays with highlighting in an HTML document that it creates. This can quickly be visually scanned to pick out any actual errors.