Guiguts Manual Sandbox page #2 (currently: Tools Menu)
The Tools Menu
The Tools menu is used to find and fix errors in the early stages of post-processing, and to help arrange the text into a standard layout that can be used later on to prepare both the Plain Text and HTML versions of the book. About half of the items on this menu lead to dialogs, some of which, such as Stealth Scannos and Footnote Fixup, may take considerable time to complete. The other options run immediately and generally finish within a few seconds.
The explanations below are in the sequence shown on the menu itself. The menu contains many topics, so this is a long page, and it's probably best to use the Table of Contents to go directly to each topic.
Use Tools>Word Frequency or press F5 to prepare a report on all words in the book and then look for various kinds of possible errors and inconsistencies. If the file has been modified when you start Word Frequency, Guiguts saves it, then builds an index of all "words" (all pieces of text set off by white space, including numbers and abbreviations) in the book. This can take several seconds. When the index is complete, it is presented in a report window:
After you have run the Word Frequency routine at least once, the number of occurrences of a word is displayed when you search for a whole word or when you run spellcheck.
Whenever you have edited the document you can click the Re Run button to take a new word-census.
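The word census described above can be pictured with a short Python sketch. It is only an illustration: it splits on white space exactly as the description reads, and the function name `word_census` and the sample sentence are hypothetical. Guiguts itself is not written this way and probably treats punctuation more carefully.

```python
from collections import Counter

def word_census(text, ignore_case=False):
    """Count every white-space-delimited token (hypothetical helper)."""
    if ignore_case:
        text = text.lower()
    return Counter(text.split())

sample = "It was the best of times, it was the worst of times."
census = word_census(sample)

# The three sort orders offered in the report window, approximated:
alphabetical = sorted(census)                                      # Alph
by_frequency = sorted(census.items(), key=lambda kv: (-kv[1], kv[0]))  # Frq
by_length = sorted(census, key=lambda w: (-len(w), w))             # Len
```

Because this sketch respects case by default, "It" and "it" are counted separately, which mirrors the initial behavior of the report.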
Using the Report Window
The body of the window contains a list of all words with their counts. The list can be sorted alphabetically (ASCII order), or by frequency of use (most-frequent words first), or by word-length (longest words first). To change the sort order, click one of the three radio-buttons Alph, Frq, or Len. Then click the All Words button, which causes the list to be resorted.
Initially the list respects letter case ("It" and "it" are different). To change from respecting case to ignoring case, change the Ignore Case switch and click Re Run to rerun the census. (Merely resorting with All Words will not make this change.)
When you double-click a word in the list, Guiguts searches for the first or next occurrence of that word in the document and scrolls to it. Keep double-clicking the word to scan all uses of it. Right-click a word in the list (Mac: control-click) to load that word into the Search Text field of the Search & Replace dialog.
In some displays Guiguts identifies "suspects," items that might be errors. These are marked with four asterisks. The Suspects Only switch causes the display to show only suspects, and using it may produce shorter lists in some cases.
Using Report Actions
The Word Frequency window offers 20 buttons, each giving a different way to process and display the data.
Starting in the upper-right, just after the Len sorting button, they are:
|1st Harm||One word must be highlighted in the list. The first-harmonic list for that word pops up in a separate window discussed below.|
|2nd Harm||One word must be highlighted in the list. The second-harmonic list for that word pops up in a separate window discussed below.|
|Re Run||Reruns the indexing and sorting process applying the current Ignore Case and Sort Alpha switch settings. Use this to update the list after you have edited the document, or to change the Ignore Case setting.|
|Emdashes||Displays all phrases that include an emdash (two hyphens). If an identical phrase having only a single hyphen exists, it is displayed as a suspect.|
|Hyphens||Displays all hyphenated phrases. A word that duplicates a hyphenated phrase ("after-thought" and "afterthought"), or a pair of words connected by an em-dash ("after--thought") is displayed as a suspect. Use to find inconsistent hyphenation of words, particularly at ends of lines.|
|Alpha/num||Displays all words and hyphenated phrases that contain a mix of alphabetic and numeric characters. Use to find one/ell and oh/zero errors.|
|All Words||Re-sorts the current word list based on the Alph/Frq/Len sort-order switch and displays the full list. Also used to return to the full list after viewing a subset such as Character Cnts.|
|Check Spelling||Let Spell Query examine the wordlist and display unknown words as discussed below.|
|ALL CAPS||Displays all words and hyphenated phrases spelled entirely in capital letters.|
|MiXeD CasE||Displays all words and hyphenated phrases that include both a lowercase and a capital letter in the non-initial position. Use to find OCR errors that mis-capitalize c/C, o/O, s/S, u/U, v/V.|
|Initial Caps||Displays all words and hyphenated phrases that start with a single capital letter.|
|Character Cnts||Counts all character values in the document and displays the list. If Sort Alpha is checked, the list is sorted by character; otherwise it is sorted by count, most-used first. Used to check for non-ASCII character use and for equal counts of matching brackets and parens.|
|Check , Upper||Displays all the times an uppercase letter follows a comma. Use to find the common error of comma replacing period. (One stealth-scanno search also visits these.)|
|Check . Lower||Displays all the times a lowercase letter follows a period. Use to find the common error of period replacing comma. (One stealth-scanno search also visits these.)|
|Check Accents||Displays all words that include an accented character or other characters in any of the Latin Unicode ranges, such as the ae ligature. A word that is the same except for the special character is displayed as a suspect. Use to check for inconsistent use of accents and ligatures.|
|Unicode > FF||Displays all words that include a character from the Unicode sets beyond the Latin-1 set (numerically greater than 255, hex FF). When such words exist, the file is saved as a Unicode file with two bytes per character. (Does not display Unicode or Latin-1 letters that are punctuation or standing alone.)|
|Stealtho Check||A different way that is discussed below to apply the same files as used by the Stealth scannos searches.|
|Ligatures||This lists all words containing ligatures (œ Œ æ Æ) and words that might contain ligatures (e.g., "Caesar"). You may want words to be spelled consistently, even though the books didn't always manage to do so.|
|RegExp-->||You can construct your own regular expressions in the box just to the right of this, then click the button to list matching words.|
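Several of the buttons in the table above amount to simple filters over the word list. The following Python sketch shows how four of them could be approximated. The function names and the sample list are hypothetical, and the real Guiguts rules may differ in detail (for example, in how accented letters are handled).

```python
import re

def alpha_num(words):
    """Words mixing letters and digits (one/ell and oh/zero suspects)."""
    return [w for w in words
            if re.search(r"[A-Za-z]", w) and re.search(r"\d", w)]

def all_caps(words):
    """Words spelled entirely in capital letters."""
    return [w for w in words if w.isupper()]

def mixed_case(words):
    """A lowercase letter plus a capital after the first character."""
    return [w for w in words
            if re.search(r"[a-z]", w) and re.search(r"[A-Z]", w[1:])]

def hyphen_suspects(words):
    """Hyphenated words whose unhyphenated twin also occurs in the list."""
    present = set(words)
    return [w for w in words
            if "-" in w and w.replace("-", "") in present]

words = ["after-thought", "afterthought", "l0ng", "NASA", "tHe", "The", "1870"]
```

With this sample, `alpha_num` picks out "l0ng", `mixed_case` picks out "tHe", and `hyphen_suspects` flags "after-thought" because "afterthought" is also present.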
When you request a First- or Second-harmonic search, the word list is searched for any words that can be made from the highlighted word by a single change (first harmonic) or by two changes (second harmonic). A "change" is an insertion, or a deletion, or a replacement. For example, the first harmonic of Footnote includes likely misspellings such as Foonote and Footnot. The second harmonic would reveal Footenot, a deletion plus an insertion. If any of the possible words exist in the word list, the original word and its relatives are displayed:
You can use the Word harmonic window the same as you use the main Word Frequency window: double-click a word to scroll the main document to display the next occurrence of that word; and right-click (ctrl-click) the word to load it in the Search window.
As long as the Word harmonics window has the keyboard focus (visible by its title bar being darker or emphasized), when you use an up- or down-arrow key, the Word harmonics window is updated to show the harmonic of the next word higher or lower in the Word Frequency window. Thus you can step through the harmonics of words in sequence.
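The harmonic search described above corresponds to what is usually called edit distance (Levenshtein distance): the number of insertions, deletions, and replacements needed to turn one word into another. The sketch below is an illustration in Python, not Guiguts' own code. It assumes that the second harmonic means exactly two changes, and the names `edit_distance` and `harmonics` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance counting insertions, deletions, replacements."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace, or keep a match
        prev = curr
    return prev[-1]

def harmonics(word, word_list, order):
    """Words in word_list exactly `order` changes away from `word`
    (assumption: the second harmonic excludes first-harmonic words)."""
    return [w for w in word_list if edit_distance(word, w) == order]

word_list = ["Footnote", "Foonote", "Footnot", "Footenot", "Chapter"]
first = harmonics("Footnote", word_list, 1)
second = harmonics("Footnote", word_list, 2)
```

Using the manual's own example, `first` contains Foonote and Footnot, and `second` contains Footenot.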
When you click Check Spelling, Guiguts invokes the Spell Query program to scan the list of words. The words that are not recognized by Spell Query are displayed alphabetically in the report window.
Double-click a word to see its next use in the document. You can also apply first- and second-harmonic searches to words in this list, showing related words from the full list.
You may find this to be a faster way to perform spellcheck than the usual process (see below), which steps through the document in document sequence. However, in the harmonic word search method you cannot change dictionaries or add words into the project dictionary or the Spell Query global dictionary.
Click All Words to return to the complete word list.
Fast Scanno Check
When you click Stealtho Check, Guiguts presents a file-open dialog. Browse to select one of the scanno files. Guiguts applies all the searches from the file you select against the index of words, and displays a list of the words that matched, with their counts.
Double-click a word to see its next use in the document. You may find this a faster way to perform scanno checks than the full Stealth scannos tool, which searches for each candidate in turn.
Saving the Report
You can save any word frequency report in either of two forms. With the report window active, key ctrl-s. A standard file-save dialog opens with the suggested name of wordfreq.txt. Click Save to make a file that is a duplicate of the displayed report, including the counts. You can also key ctrl-x (for eXport). A file-save dialog opens with the suggested name of wordlist.txt. Click Save to make a file that contains only the list of words without its counts, "suspect" flags, etc.
Why would you save a list of words? One reason is to use the list as input to the automatic word highlighting function. You could export the list of words that fail spellcheck, for example, and have them all nicely highlighted in purple as you scan the document.
Using this option is an essential part of post-processing, and you probably will use it at least twice while preparing the text: once at the end of preparing the "common" text that will be used by both the Plain Text and HTML versions, and again when (you think) the Plain Text version is done.
Bookloupe is an updated version of and replacement for the original Gutcheck program. Both will scan a text file looking for many common proofreading errors that the other tools do not find. When your finished project is submitted for publication to Project Gutenberg, it will be checked in several ways; one of them is with Bookloupe.
Unlike Gutcheck, Bookloupe supports UTF-8; it has some additional capabilities and bug fixes. Bookloupe is included in the Guiguts distribution package, and is what is described here.
When you click Bookloupe on the Tools menu, or BL on the toolbar, or press F6, Guiguts saves the current document; calls Bookloupe to read the saved file; collects its output; and displays the Bookloupe output in a report window. You inspect its findings in the report window, click on each one to jump to the referenced point in the document window, and optionally right-click to hide each resolved or ignored one from the list and go to the next one.
The first few lines of the report are a summary; the line-by-line diagnostics follow. To see the questionable text in context, click the message in the report window, and the document window will scroll to the text and highlight the relevant text.
(Note: when using X-windows under Mac OS X, you may have to click three or possibly four times. This note was written many years ago and may be obsolete.)
The insertion point moves to the diagnosed point in the line, if a specific column is mentioned, and the document window becomes the active keyboard focus.
Bookloupe 'View' Options
The diagnostics list easily can be overwhelmingly long, and, because it's in line-sequence (as shown above), the 50+ kinds of diagnostics are all jumbled together, making what should be an invaluable resource difficult to use.
The solution is to click the Bookloupe View Options button, which will show this [imposing] list:
At the bottom of the dialog, click Hide All. Every option will be selected to "Hide" and the diagnostics list will disappear. Now, click the first option in the View Options dialog to unhide its items in the diagnostics list; in many cases, there won't be any, and the diagnostics area won't change. Continue unhiding the options, one at a time, until some messages do appear. Then, click the first one and the document window will scroll to that line and highlight the possible error.
In this example, there were zero or one possible errors until Extra period was unhidden. Then, the diagnostics list looked like this:
Six things, all of the same type, are a lot easier to resolve than the 719 of all types in the See All list.
If it is an error, correct it; otherwise, right-click the diagnostic line to dismiss it and move to the next (if any) diagnostic within the same option. When there are no more of them, unhide the next option, and repeat this until you've worked your way through the entire View Options list.
You can dismiss all identical report lines ("identical" other than the line:column number) at once by using ctrl+shift-right-click, but this may result in missing an actual error. An example of a safe dismiss-all is:
233:45 - Non-ASCII character 232
366:1 - Non-ASCII character 244
382:4 - Non-ASCII character 232
Character 232 is è, so if you know the text contains French words, dismissing all of those report lines at once probably is safe to do. Note that report lines referencing other non-ASCII characters, e.g., 244, will not be dismissed when you do this.
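To make the "identical other than the line:column number" rule concrete, here is a small Python sketch of how such a dismiss-all could work. It assumes report lines have the shape `line:column - message` as in the example above. The names `message_of` and `dismiss_identical` are hypothetical and are not part of Guiguts or Bookloupe.

```python
import re

# Assumed shape of a report line: "233:45 - Non-ASCII character 232"
REPORT_LINE = re.compile(r"^\s*(\d+):(\d+)\s+-\s+(.*)$")

def message_of(line):
    """Return the message part of a report line, or None if it has none."""
    m = REPORT_LINE.match(line)
    return m.group(3).strip() if m else None

def dismiss_identical(report, clicked):
    """Drop every line whose message equals that of the clicked line."""
    target = message_of(clicked)
    if target is None:
        return list(report)
    return [line for line in report if message_of(line) != target]

report = [
    "233:45 - Non-ASCII character 232",
    "366:1 - Non-ASCII character 244",
    "382:4 - Non-ASCII character 232",
]
remaining = dismiss_identical(report, report[0])
```

Only the line about character 244 is left in `remaining`, matching the behavior described above.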
Some options won't be relevant at the stage you're currently processing, so you can just skip them. For example, until you have re-wrapped the Plain Text, there may be long or short lines that won't be long or short later on.
The Hide all and See all buttons set or clear all switches. The Toggle View button inverts all switches: ons become offs and vice-versa. You can use the Save As Defaults button to save the present set of switches as your default settings; they will be used from then on, until you save a different set. If you make some temporary changes, you can use the Load Defaults button to return to the last-saved setup.
This corrects many kinds of minor errors. Sometimes, it will "correct" things that weren't wrong in the first place, so use it judiciously. For example, "Fix up spaces around single hyphens" and "Format ellipses correctly" are turned off by default as of version 1.3.2; if you turn them on, be sure to check the results. As with everything else on the Tools menu, be sure to save a good copy of your document before using this option.
Once you're satisfied with the selected choices, click Go!. Afterwards, you may want to review the results before continuing. A program such as Notepad++ (free software for Windows) can compare the before-and-after copies of your document and highlight any differences it finds. Another excellent file-compare utility, ppcomp, written by a member of the DP Community, is part of the ppworkbench; it's also available directly, here.
Check Orphaned Brackets
This opens a small dialog with choices of several types of normally-balanced brackets, markups, and quotation marks (but not all quotation marks should be balanced, so use those choices cautiously). For each of these choices, the presence of one marker without its balancing marker is probably an error. For example, a common scan error is to read a right parenthesis as a right curly-brace. This is hard for the human eye to pick up, but the search for orphan parentheses (or orphaned curly braces) finds it easily.
Click a type of markup, for example /* */, and click Search. Guiguts scans the document for all opening and closing markups of this type (a process that can take many seconds, for a common markup in a large document). It finds the first instance of an opening mark that is missing its close, or a closing mark missing its opening. The unbalanced markup is highlighted with search-orange. Click Next to find another of the same type.
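The idea behind the orphan search can be sketched in a few lines of Python. This is an illustration only: it assumes markers of one type may nest, and the function name `find_orphans` is hypothetical. Guiguts' actual scan may work differently.

```python
def find_orphans(text, opener, closer):
    """Return (offset, marker) for each opener or closer with no partner."""
    stack, orphans = [], []
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            stack.append(i)              # remember where this opener sits
            i += len(opener)
        elif text.startswith(closer, i):
            if stack:
                stack.pop()              # closes the most recent opener
            else:
                orphans.append((i, closer))
            i += len(closer)
        else:
            i += 1
    # Any opener still on the stack never found its closer.
    orphans.extend((pos, opener) for pos in stack)
    return sorted(orphans)

# A right parenthesis misread as a right curly brace:
scan_error = find_orphans("(see page 4}", "(", ")")
```

Here `scan_error` reports the opening parenthesis at offset 0 as an orphan, which is the kind of hard-to-see error the dialog is meant to surface.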
All of the topics on this sub-menu relate to Unicode, so most of them are explained together on the Unicode Menu page. Links to the individual topics are shown below.
Convert Windows CP 1252 characters to Unicode
This is useful only when a proofreader has used Latin-1 characters in the 128-159 range. We do not allow that, as explained here, so if you discover such characters, this option will help correct such errors.
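As an illustration of what such a conversion involves, the Python sketch below replaces each code point in the 128-159 range with the character that Windows CP 1252 assigns to that byte value. It is not Guiguts' implementation, and the function name `fix_cp1252` is hypothetical.

```python
def fix_cp1252(text):
    """Map code points 128-159 to the characters CP 1252 gives them."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9F:
            try:
                ch = bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in
                # CP 1252, so those characters are left unchanged.
                pass
        out.append(ch)
    return "".join(out)

fixed = fix_cp1252("\x93Hi\x94")   # curly double quotes typed as 147/148
```

Characters outside the 128-159 range, such as é (233), pass through untouched.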
Search for Transliterations
This opens a Search & Replace dialog pre-set to look for left brackets not followed by F, I, S, or a digit. It also will find Notes [**] and text legitimately beginning with brackets, so use it after you've resolved all of the proofer's notes, and each item this finds will need to be checked separately.
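A search of this kind can be written as a regular expression with a negative lookahead. The Python pattern below is a guess at an equivalent expression, not necessarily the one Guiguts pre-loads into the dialog. The letters F, I, and S presumably stand for markup such as [Footnote, [Illustration, and [Sidenote.

```python
import re

# A left bracket NOT followed by F, I, S, or a digit (assumed equivalent).
BRACKET = re.compile(r"\[(?![FIS\d])")

text = ("[Footnote 1: x] [Illustration] [Sidenote: y] [12] "
        "[oe]ligature [**check] [Greek: logos]")

# Show the first four characters at each hit for context.
found = [text[m.start():m.start() + 4] for m in BRACKET.finditer(text)]
```

In this sample the hits are the bracketed ligature, the proofer's note, and the [Greek: markup, which shows why each result still needs to be checked by hand.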
Commonly-Used Characters Chart
Unicode Character Entry
Unicode Character Search
Compose Sequence
See Compose Sequences on the Help Menu for a list of such sequences, and the Preferences->Processing->Set Compose Key Dialog for information on changing the key if it conflicts with other settings on your computer.
If the character you want is not in the list and/or you know its hex or decimal value, you can compose and insert Unicode characters on-the-fly by pressing and releasing the Compose key to display the Compose dialog, then typing into it the hex value (or #decimal value) and clicking OK or pressing Enter. If you type a 4-character hex value, 'Compose' will insert it without waiting for you to click OK.
For example, if you type: /a into the Compose key dialog, an á will appear at the cursor in the main Guiguts window and the Compose Dialog will disappear; 2720 will insert a Maltese Cross: ✠, as will #10016 and clicking OK.
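The relationship between the hex and #decimal forms can be checked with a short Python sketch. The function `compose_to_char` is hypothetical and covers only the numeric entries. It does not attempt the named sequences such as /a.

```python
def compose_to_char(entry):
    """Turn a hex value, or a #decimal value, into its Unicode character."""
    entry = entry.strip()
    if entry.startswith("#"):
        return chr(int(entry[1:], 10))   # "#10016" is read as decimal
    return chr(int(entry, 16))           # "2720" is read as hexadecimal

maltese_from_hex = compose_to_char("2720")
maltese_from_decimal = compose_to_char("#10016")
```

Both calls give the same character because hex 2720 and decimal 10016 are the same number.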
You can enter Greek characters in two similar ways with the Compose key, once the Dialog is on the screen:
- type an equals sign, then in sequence as needed, the breathing, accent, and subscript marks, and finally the Roman letter (e.g., 'a') corresponding to the Greek letter. Typing a letter key tells Guiguts to convert the sequence to Polytonic Greek, so the Enter key is not used. For example, =)^|a inserts ᾆ;
- type a minus sign, then the Roman letter (e.g., 'a') corresponding to the Greek letter, then in sequence as needed, the breathing, accent, and subscript marks, and then press the Enter key (or click OK) to tell Guiguts to convert the sequence to Polytonic Greek. This corresponds to Beta code. So, -a)^| and pressing Enter (or clicking OK) also inserts ᾆ.
When processing our books, you may encounter symbols that are not available through the Compose Sequence feature, and some that are not in the extensive Unicode character set. Another way that may help you create such symbols is by combining existing normal characters with "combining" ones that are in the "Combining" blocks on the Unicode Menu. See Combining Characters for a clear explanation of how this works. Combining characters must follow the base character.
Normalize Selected Characters
Many Unicode characters used in our books consist of a base character with one or more accents or other diacritical marks, for example, å ('a' with ring above). There are sometimes two ways of creating such a character: either via a single Unicode character or via a base character followed by one or more combining characters. In the above example, there is a single "canonical composed" or "precomposed" Unicode character (Latin Small Letter A with Ring Above - decimal 229 / hex E5) that can be used. Alternatively, this can also be represented in a "decomposed" manner using a standard 'a' (Latin Small Letter A - decimal 97 / hex 61) followed by a combining character (Combining Ring Above - decimal 778 / hex 030A). Depending on the font you are using, these may look identical to one another.
However, there are good reasons to use the precomposed form if one exists - a precomposed form generally exists for more common combinations, but not necessarily for less common ones. The first reason is that if you have used the decomposed form, then if a reader wants to find the word and uses the precomposed characters in their search, they may not find the version with decomposed characters, depending on their browser or e-book reader (some tools may use normalization during searching to allow them to find both forms). The second reason is that the Nu HTML Checker used by Guiguts and Project Gutenberg will issue warnings about decomposed characters if an equivalent precomposed Unicode character exists.
Converting from the decomposed to precomposed form is referred to as converting to Unicode Normalization Form C, or less formally normalizing the text. This process also includes ensuring any remaining combining characters after normalization are placed in a standard order.
If you see such an error from the Nu HTML Checker, typically something like "Text run is not in Unicode Normalization Form C", or if you have been using combining characters to add accents to letters, you should check the text is normalized before submitting it. You can select the portion of text that includes the characters, then use the "Normalize Selected Characters" menu option to resolve it. In fact, it is generally safe to select the whole document and normalize it (see technical note below for a potential exception).
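The effect of normalizing to Form C can be seen outside Guiguts with Python's standard `unicodedata` module. This is only a demonstration of the Unicode behavior described above, not of how Guiguts performs the operation.

```python
import unicodedata

decomposed = "a\u030a"    # 'a' followed by Combining Ring Above
precomposed = unicodedata.normalize("NFC", decomposed)

# Where no precomposed character exists, NFC leaves the pair alone:
no_single_form = unicodedata.normalize("NFC", "x\u030a")

# NFD goes the other way, from precomposed to base plus combining mark:
round_trip = unicodedata.normalize("NFD", precomposed)
```

After normalization, `precomposed` is the single character å (hex E5), while `no_single_form` still consists of two code points.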
Technical note on Greek oxia/tonos
There has been some debate over the normalization of accented Greek characters. For reasons related to modern Greek spelling reform in the 1980s, sixteen characters from the basic Greek character set appear to be duplicated in the Extended Greek set. All of them have what looks like an acute accent, called "tonos", e.g. 'ά' in the basic Greek set and "oxia", e.g. 'ά' in the extended Greek set. The debate revolves around whether these two are equivalent or whether the distinction should be preserved. The Unicode Consortium has decided they are equivalent and hence that vowels with oxia should be normalized to the vowel with tonos. Since 2016, the vowel+oxia combinations have been formally deprecated and removed from the Greek extended range. In summary, if you have strong views on the preservation of oxia, you may not want to normalize certain Greek letters, but you run the risk of your book failing the Nu HTML Check and may be working contrary to the Unicode recommendations. You could discuss this further with a Greek expert in the forums.
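If you want to see which characters in a passage would be changed before you normalize it, a per-character check is one simple approach. The Python sketch below uses the standard `unicodedata` module. The function name `nfc_changes` is hypothetical, and because it looks at one character at a time it does not report changes that involve combining sequences.

```python
import unicodedata

def nfc_changes(text):
    """List (before, after) Unicode names for characters NFC would alter."""
    changes = []
    for ch in text:
        normalized = unicodedata.normalize("NFC", ch)
        if normalized != ch:
            before = unicodedata.name(ch, "UNNAMED")
            after = " + ".join(unicodedata.name(c, "UNNAMED")
                               for c in normalized)
            changes.append((before, after))
    return changes

alpha_oxia = "\u1f71"     # from the Greek Extended block
alpha_tonos = "\u03ac"    # from the basic Greek block
report = nfc_changes(alpha_oxia + alpha_tonos)
```

The report contains a single entry, showing alpha with oxia being replaced by alpha with tonos, while the tonos form is left as it is.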
Find and Convert Greek
This looks for the next occurrence of [Greek:, highlights it if found, and opens the Greek Transliteration Tool with the Latin-1 characters pasted into the conversion area. You can transliterate the Latin-1 characters used by the proofreaders into Greek characters in a variety of ways:
- Unicode>The Greek Transliteration Tool
- the two Greek dialogs on the Unicode Menu
- see the Typing Greek Wiki article
- ask for assistance in the Help: Greek Forum
- use an online Greek transliterator, such as this one (but it may not always be there)
Spellcheck is best applied towards the end of the first phase of post-processing, after removing page separators, running Jeebies, Word Frequency, and checking for Stealth scannos, as these steps remove many trivial mistakes that would turn up as spelling errors.
Overview of Guiguts' Spelling Checkers
Beginning with version 1.5.0, Guiguts supports two different spelling checkers. One uses the GNU General Public License program Aspell 0.5 and the other uses a newer method, Spell Query, developed at DP and using publicly available dictionaries. Variants of Aspell may be invoked within Guiguts in either of two ways: "Spell Check" on the Tools Menu (shortcuts: F7 or the check mark on Guiguts' Tools bar), or "Spell Check in Multiple Languages", also on the Tools Menu. Spell Query is on the Tools Menu (shortcut: Shift+F7), and also as the "Check Spelling" button in Word Frequency. Each of these four ways of checking spelling has its own particular advantages and limitations, as covered in their explanations.
Some advantages of Spell Query are that:
- it supports UTF-8 (Unicode), while Aspell 0.5 does not, and Aspell 0.6 is not available for Windows
- it combines most of the capabilities of both variants of Aspell in one place
- it is simpler to use
- it is included with Guiguts, so you do not need to install it. To use Aspell 0.5, if you don't already have it, you must install it and install the language dictionaries you wish to use.
- three language dictionaries for Spell Query (English [combines British and American], French, and German) are included with Guiguts, so you don't have to install them. Their names are dict_xx_default.txt (where xx is the language code: en, fr, or de) and they are in the Guiguts data folder.
- To add additional dictionaries, see Dictionaries Used by Spell Query, below.
Some advantages of Aspell are that:
- dictionaries in many languages are available just by downloading them
- the primary variant (Spell Check) can suggest other spellings for flagged words
- Aspell 0.6, which does support UTF-8, is available for Macintosh and Unix systems
- people who have been using Guiguts for a long time are familiar with Aspell.
Starting Spell Check
The following process is the traditional spellcheck (Aspell), in which you check words in document sequence. For a different, and possibly faster spellcheck process using first- and second-harmonic word searches, see Fast Spellcheck.
When you are ready to begin spellchecking, click Tools>Spell Check, press F7, or click the check mark in the toolbar. Guiguts saves the document and invokes the Aspell program to spellcheck the document. (As of 2020, Guiguts is compatible only with Aspell version 0.5.x, which does not support Unicode.)
When Aspell completes, Guiguts opens the spellcheck dialog:
Adding the Good Words List
The number of unrecognized words will appear in the dialog's Title bar; this may be an intimidatingly large number, but the proofreaders usually will have tamed it for you. Most projects include a Good Words List, so click Add Goodwords to Proj. Dic, confirm the action when asked, and then restart the Spell checker. A somewhat shorter list of possibly misspelled words will appear. (This is why Word check is required in the proofing rounds.)
Using the List of Unknown Words
The word in the top field (here, Firemen's) is a word in the document that is not found in the Aspell dictionary or project dictionary. Just above is the count of how many times this word appears in the document, and in the dialog's Title bar is the total number of remaining words to be checked. The first or only use of the current word is highlighted in search-orange in the document window, so you can see it in context.
The text in the second field (here, Fireman's) is Aspell's best guess as to a correct spelling. This text will replace the found word if you click the Change or Change All button. Below it is a list of other close matches from the dictionaries. You can move any of these to the Replacement Text field by double-clicking it.
Examine the word in context and decide what to do. Possibilities include:
The word is an error. For example, Firemen's was an uncaught scanno for Fireman's. You may have to look at the page image to make sure of the author's intent and the proper correction. Put the correct spelling in the second field—in the example, double-click Fireman's in the list. If the word appears only once, or if it might be correct in another context, click Change. The word is replaced and the next suspect is displayed. If the word appears more than once and would always be wrong, click Change All. (It's best to change the word using the Change buttons rather than by directly editing it in the document.)
The word is a valid English word. If you think the word is valid, check a dictionary (for example, there's a link to some dictionaries on the Custom menu, or you can try Merriam-Webster Online; on the Mac, you have a comprehensive dictionary built into your "dashboard"). If the word is valid, you can click Add To Aspell Dic, but be cautious about this, because once a word has been added to the Aspell Dictionary, it never will be flagged again, in any future project you do.
The word is valid in its context. Aspell questions proper nouns, archaic spellings, technical terms, and words from languages other than the one for the dictionary in use. There are two possibilities. If you are sure the word is valid everywhere it appears in the book, click either Skip All or Add To Project Dic (which in fact have the identical effect, and are much safer to use than Add to Aspell Dic). If you are sure the word is valid in its visible context, but it conceivably might be invalid somewhere else, click Skip (just this one occurrence).
For example, take a book that has occasional references to French writers and text. When Aspell stops on the name Rochefoucauld—and after you have really looked to make sure it wasn't mis-scanned as Rochcfoucauld—you can confidently click Add to Project Dic. On the other hand, if Aspell stops on sur or sa in the context of a quotation from the French, you should only click Skip. The same letters, in the context of English text, could be scannos for sun or so, and you want to view every occurrence.
You aren't sure. Click Skip. Run spellcheck again later and the same word will come up again.
Usage hint: With practice, you can decide what to do with a word very quickly, but you must not let yourself decide too quickly. If the book contains many proper nouns, Latin tags, archaisms, etc., you may start clicking Skip All so fast you click right past a real error. You must really look at every word Aspell presents. (Scannos can appear in Latin or archaic terms, too!)
"Not in Dictionary" numbers
Up to four numbers may appear above the current word:
- 1 number is just the number of occurrences (exact matches)
- 2 numbers mean number of occurrences and number of occurrences as part of a hyphenated word, e.g. "abc" twice and "abc-d" once would say "2, 1 hyphens in text"
- 3 numbers mean number of occurrences, number of occurrences with different case, e.g. "Abc", and number of possessives, e.g. "Abc's"
- 4 numbers mean occurrences, case, possessive, hyphen
The following key-equivalents are available while the spellcheck window has the keyboard focus:
|ctrl-a||Add word to Aspell dictionary.|
|ctrl-p||Add word to project dictionary.|
|ctrl-i||Skip All ("ignore").|
Stopping and Restarting
If you need to take a break from spellcheck and start again later, you can do so. You can quit spellcheck at any time; just close the dialog window. When you later restart spellcheck it resumes with the first uncorrected word. Words you have skipped (but not Skipped All) will again be found as errors and you will have to skip them again.
You can also click Set Bookmark before closing the dialog window. When you next open spellcheck in the same document, click Resume @ Bookmark. Checking begins at the bookmark, and you do not have to skip over the same skipped words.
Checking Part of the Document
To check only part of the document, select that part before you open the spellcheck dialog. Checking is confined to the selection. After checking has started, click in the document to clear the selection so you can see the found words.
Aspell and Unicode Text
Aspell does not properly handle Unicode characters. If your book contains Greek or Cyrillic or other UTF-8 letters, Aspell will work properly on the Latin-1 text. However, when it reaches a word containing non-Latin-1 letters, it will detect it as an error, but will not display the word, nor is the word highlighted in the document window. When Aspell stops on such an "invisible" error, just keep clicking Skip until it emerges from the stretch of Unicode text and highlights a normal error again.
Changing the Main Dictionary
Aspell is usually installed with more than one main dictionary, including at least three for variants of English. (A book published in England will produce many fewer errors when checked with the EN_GB dictionary than with the EN_US one!) To use a different dictionary, click the Options button in the spellcheck window. This opens a dialog that lists all available dictionaries.
Double-click one to move it to the Current Dictionary field and click Close. The spellcheck process is restarted with the new dictionary. (Caution: the restart can take several seconds, during which Guiguts appears to be hung.)
The same dialog can be used to locate a different executable program for Aspell (normally you set the path to the executable using the Preferences menu).
A set of four buttons at the bottom of the dialog specifies how generous Aspell should be in selecting similar words to display as possible replacements. The Ultra Fast setting will suggest only a few, very similar words. "Bad Speller" will suggest many words.
Using the Project Dictionary
Because of the way that Guiguts implements spellcheck, there is no practical difference between the "Skip All" and "Add To Project Dic" buttons. Both result in adding the current word to the project dictionary, so that it will not be shown if you run spellcheck again.
The Project Dictionary is a text file that Guiguts writes in the same folder as the document file. It has the same filename as the document and the suffix .dic. You can edit it, for example, to remove a word that you added in error. Or you can remove all words from the project dictionary by simply deleting the file.
Spell Check in Multiple Languages
Guiguts uses Aspell 0.50 to check spelling, as described above, and supports only that version. Aspell can use more than one language dictionary to check spelling. The dictionaries must already be in Aspell's dict folder. Although Aspell 0.50 is no longer supported by its author, at the time this is being written (2020) it's still available here, and so are dictionaries that may not have been included with your copy of Guiguts. Unfortunately, there is no Latin dictionary for Aspell 0.50.
When you select this option, the following dialog appears:
Click the Help button for instructions on using this part of Aspell. For your convenience, those instructions are shown below. Please note that at the top, it says that it "ignores words from the project dictionary." This suggests that it won't use our "good words.txt" list, but there's a very easy workaround:
- before using this option, start the regular Tools>Spell Check (or F7) and click Add Goodwords to Proj. Dic., and confirm the action
- optionally, complete the regular Spell Check, using Skip All or Add to Project Dic. to add even more words to the Project Dictionary.
Then, if the project merits multi-language spellchecking and the necessary dictionaries are available, this option will make use of the Project Dictionary you've just created.
- Set Base Language: Select the base language that is used.
- Set Languages: Select one or more foreign languages for additional spellchecking.
- (Re)create Wordlist: Identify all distinct words and word counts.
- Check spelling: Spellchecks in all selected languages. Note that some unicode words currently appear as spelt in the base language. Aspell currently does not handle these words correctly.
- Include project words: Amend unsplit words which occur in the project dictionary with the language tag 'user'. Note that dictionary files have now been given filenames so that two or more volumes labelled abc1.txt and abc2.txt in the same directory will share the same project dictionary abc.dic.
- Show all words: Shows all the distinct words together with their frequency, and if available, language in which they are correctly spelt.
- Show unspelt words: Shows all words which have not yet been spelt in any language and are not included in the project dictionary.
- Show spelt foreign words: Shows all words that have been spelt, other than those in the base language.
- Show project dictionary: Shows all words in the project dictionary.
- Add foreign to project: Adds all words that have been correctly spelt in languages other than the base language to the project dictionary.
- Add frequent to project: Adds all words with a frequency more than or equal to the minimum frequency to the project dictionary.
- Save Debug Files: Use this to collect information that may be helpful in diagnosing errors.
Spell Query doesn't use Aspell, does support UTF-8, and is simpler to use than the original Aspell-based spell checker. It currently comes with three language dictionaries (English [combining British and American], French, and German), and you can add your own, as explained below. Spell Query operates similarly to most of the other tools on Guiguts' Tools Menu. It can be started from the Tools Menu or by pressing Shift+F7.
- Threshold (default is 3): Words that occur more often than this will not be included in the list.
- Run Checks: Spell Query does this automatically when invoked, and displays a list of unknown words, their frequencies, and where they are in the text file (line and character offset).
- If you add the good-words list, Spell Query will re-run the Checks automatically.
- If you change the threshold, click Run Checks again to refresh the list.
- The currently-highlighted word in the list will be highlighted in the main Guiguts window, so you can correct it inline.
- Unlike the original spell checker, Spell Query doesn't offer suggestions or semi-automatic replacement, so you must do that yourself.
- Add good-words.txt: Adds the unusual words identified during Proofing to the Project Dictionary, so they won't appear in the list.
- Skip (keyboard shortcut: right-click the entry): Deletes the highlighted entry from the list and moves to the next entry.
- Does not add the word to the Project Dictionary, so, if you use "Run Checks" again, the word will appear in the refreshed list.
- Skip All (keyboard shortcut: ctrl-shift-right-click the entry): Deletes all queries about the current word from the list and moves to the next entry.
- Add to Project Dictionary (keyboard shortcut: ctrl-left-click the entry): Adds the current word to the Project Dictionary, removes every occurrence of it from the list of unknown words, and moves to the next entry in the list.
- If you use Run Checks again or rerun Spell Query on the same file, the word will not appear in the refreshed list.
- Although Spell Query does not offer the Set/Resume Bookmark feature found in Spell Check, adding every acceptable word to the Project Dictionary (rather than using Skip or Skip All) will give the same effect if you stop part-way through and restart Spell Query later on.
- Add to Global Dictionary (keyboard shortcut: ctrl-shift-left-click the entry): Adds the current word to Spell Query's permanent Global Dictionary, so it will not be flagged in future projects. Use this with caution.
- When more than one language code is in effect, this option will add words to the dictionary of the first one listed.
- Pop the Search and Replace Dialog by alt-left-clicking (or command-left-clicking on a Mac) the entry. The Search and Replace dialog will be popped with the queried spelling in the Search field, making it easy to search for other occurrences of the queried spelling.
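The effect of the Threshold setting can be pictured with a short Python sketch. This is not Spell Query's own code: the function name `unknown_word_report`, the plain set standing in for the dictionaries, and the case-insensitive lookup are all assumptions made for illustration.

```python
import re
from collections import Counter

def unknown_word_report(text, known_words, threshold=3):
    """Return {word: count} for unknown words that occur at most `threshold` times.

    `known_words` is a hypothetical stand-in for the combined dictionaries;
    lookups are lower-cased here, which is an assumption of this sketch.
    """
    # A "word" here is a run of letters, optionally joined by apostrophes.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    counts = Counter(w for w in words if w.lower() not in known_words)
    # Words occurring more often than the threshold are left out of the list.
    return {word: n for word, n in counts.items() if n <= threshold}
```

With the default threshold of 3, an unknown word that appears four times is assumed to be intentional and is dropped from the report; raising the threshold brings it back.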
Dictionaries Used by Spell Query
Lang on Guiguts' Status bar lists the language code(s) whose dictionary or dictionaries Spell Query should use when processing your document. If you specify more than one language, separate the language codes with spaces, plus signs, or commas (e.g., en,fr,de). The primary language code is the first one, and is the only one used by other Guiguts functions. The dictionaries may be in Guiguts' data folder, using names in the format: dict_xx_default.txt (where xx is a language code) or in the global GGPrefs folder, using names in the format: dict_xx_user.txt.
- Besides the dictionaries included with Guiguts, DP offers a few others, as explained HERE.
- If those are insufficient, see HERE for some ways to add language-specific dictionaries for use by Spell Query.
- You can add/modify this list in the Languages box on the Guiguts Status bar. Changes to the language code list will be saved in the .bin file when you next save your file. It is project-specific for all later iterations of the file, but will not be used with other projects.
- After making changes to the list, use Run Checks to tell Spell Query to use the specified dictionaries.
- Spell Query's support for multiple languages is similar to, and can replace, the Spell Check in Multiple Languages explained above.
For further information about possible additional Spell Query dictionaries, please see "SCOWL-README.txt" in the Guiguts data folder, or ask about it in DP's "Help with: Guiguts" Forum.
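As a rough illustration of the language-code list and the dictionary file-naming patterns described above, here is a minimal Python sketch. The function names are hypothetical, and the sketch only builds the expected filenames; it does not look in the data folder or the GGPrefs folder.

```python
import re

def parse_language_codes(lang_field):
    """Split a Lang field such as 'en,fr de+it' into a list of codes.

    Codes may be separated by spaces, plus signs, or commas.
    """
    return [code for code in re.split(r"[ ,+]+", lang_field.strip()) if code]

def candidate_dictionary_names(lang_field):
    """Map each code to its (default, user) dictionary filenames."""
    return {
        code: (f"dict_{code}_default.txt", f"dict_{code}_user.txt")
        for code in parse_language_codes(lang_field)
    }

def primary_language(lang_field):
    """The first code listed is the primary language, or None if the field is empty."""
    codes = parse_language_codes(lang_field)
    return codes[0] if codes else None
```

For example, `candidate_dictionary_names("en,fr")` yields the four filenames for English and French, and `primary_language("en,fr")` is `"en"`.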
Scanno searching is automated searching for common OCR errors. Use Tools>Stealth Scannos, F8, or click the Arid button in the toolbar to start the process. Guiguts presents a standard file-open dialog:
(The very first time you ever use Stealth Scannos, you may have to find the "scannos" folder. It's in the main Guiguts folder.) Use this dialog to navigate to one of the files distributed with Guiguts, which include:
|en-commn.rc||Several dozen scannos often found in English text, such as "arid" for "and."|
|misspelled.rc||A file of about 3,400 literal scan errors that have been seen in DP projects.|
|regex.rc||A file with a few dozen sophisticated regular expressions designed to find common errors.|
Select the file to use and click Open. Using "regex.rc" and then "en-commn.rc" is a common pair of choices. "misspelled.rc" is a basic way of checking spelling, but the Spell Checker Guiguts normally uses (Aspell) is faster and more versatile. The other lists are for special purposes and are not discussed here.
Guiguts opens the Search dialog with additional controls visible and the checkbox-style options set for this Tool. This is the first item that happened to be found by "regex.rc" in the project file used to make this example:
Note that it's the fifth test, not the first, because Auto Advance is automatically set, and the first four tests didn't find anything. Examine the highlighted word or phrase to see if it is an OCR error. Correct it if necessary. Some of the scanno files set replacement text that will correct the error if you click Replace or R & S.
Click Nxt Occurrence or Search to find the next instance of this scanno. Continue clicking Next Occurrence until Guiguts can find no more of that scanno and scrolls to the top of the document. If you click too quickly past a likely error, set Reverse to back up, or Shift-click Search.
Click Next Stealtho to load the next item from the file and search for it. If you click Next Stealtho in error, use Prev Stealtho (possibly more than once) to return to a previous, bypassed item.
When Auto Advance is set, Guiguts will test each scanno in sequence and not stop until it finds one that actually appears in your document.
Note 1: If you click Reverse, Guiguts turns off Auto Advance, so to continue using it, you will need to set it again.
Note 2: The Word Frequency dialog offers Stealtho Check as a different way to search for these same scannos. This might be more useful for files such as misspelled.rc with many entries.
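The Auto Advance behaviour described above amounts to trying each scanno in turn and stopping at the first one that occurs in the document. The Python sketch below illustrates that idea only; the function name and the sample patterns are hypothetical, and it does not read the .rc files or reproduce Guiguts' own search code.

```python
import re

def next_matching_scanno(patterns, text, start_index=0):
    """Return (index, match) for the first pattern, at or after start_index,
    that occurs in text; return None if none of the remaining patterns match."""
    for index in range(start_index, len(patterns)):
        match = re.search(patterns[index], text)
        if match:
            return index, match
    return None

# Hypothetical scanno patterns, in the order they would be tested.
scannos = [r"\barid\b", r"\btbe\b", r"\bhe\b"]
```

Calling `next_matching_scanno(scannos, text)` and then calling it again with `start_index` set to one past the returned index mirrors clicking Next Stealtho with Auto Advance set.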
The scannos in some files have explanatory hints. Click the Hint button to possibly see an explanation of the current scanno. You may if you wish edit existing hints or add hints to scannos that do not have them. Click the Edit button to open a hint-editing dialog:
Use the arrow buttons to scroll through the scannos of the current file. If you modify the hint text, click Add to add the changes to the scanno file in memory. If you modify the search or replacement text, clicking Add creates a new entry; to replace an entry, back up to it and use Del to delete it.
These changes affect the loaded scanno file in memory. Only when you click Save is the scanno file on disk permanently updated.
Jeebies examines an English text trying to find scanning errors that have replaced be with he or vice versa. Such "scannos" are both common and hard to find.
When you start Jeebies (only from the Tools menu; it doesn't have a keyboard shortcut), Guiguts saves the document, then invokes Jeebies to read the saved file. Jeebies is CPU-intensive and may take several seconds to complete its scan of the document. When Jeebies completes, Guiguts displays its report in a separate window:
The report identifies lines where the use of he and be suggests possible errors. As with the Bookloupe report, you can left-click any line of the report to make the document scroll to that line, and right-click any line to remove it from the report.
- Ctrl+left-click will make the suggested change for you (change he to be or vice versa).
- Ctrl+right-click will make the suggested change and remove the suggestion from the report.
Most or all of the items Jeebies identifies will be correct, as modern image-to-text programs are very good and our proofreaders are even better.
The three radio buttons, "Paranoid", "Normal", and "Tolerant", control how sensitive Jeebies will be to possible errors. It's usually best to use "Paranoid", but if it reports an overwhelming number of possible errors, try one of the other options and click Re-run Jeebies to see if it finds any actual errors. If so, it's likely that more errors will be lurking in the "Paranoid" list.
Several tools listing possible errors show a running count of the remaining number of issues in their lists. You can see that value in the upper-left area of the above example.
For some history of Jeebies, see this page.
Tools such as Bookloupe produce a list whose entries you can click to see the referenced text in Guiguts' Main (document) window. Some online post-processing tools, such as pptext and ppcomp in the postprocessing workbench, can generate similar lists that you can save on your own computer and then use with Guiguts. Once you've used such a tool and saved its output, click Tools>Load Checkfile to see this dialog:
then click the Load Checkfile button and select the file you downloaded. (Guiguts may show you a standard file selection window without your having to click that button.) After the file loads into the list area, it'll look something like this example from pptext:
(The example is partway down the list, as the very top is just information about the list itself.)
You can make the dialog box wider and taller to make it easier to read the messages. When you want to examine the referenced text in context, double-click a message line; to hide a line, right-click it.
In pptext's online output, red text is used to highlight some words. Since Guiguts cannot display coloured words in that way in the error check box, these are shown with *asterisks* surrounding them, e.g. the suspect word *aa* in the example above. Larger header text in the output of tools is indicated using ***three asterisks***.
When used with ppcomp output, which compares two files, you will of course only have one of those files loaded in the main window. The order in which you select files for ppcomp to compare is therefore important. For example, if you use ppcomp to compare myfile.html with myfile.txt (i.e. html file first), then the first line numbers in the output from ppcomp will relate to the html file, so this checkfile is useful for making changes to the html file to match the text file. If you want to make edits to the text file to match the html file, then load the text file into Guiguts, and re-run ppcomp remembering to select the text as the first file and the html as the second.
In ppcomp's online output, red and green text is used to indicate whether words have been deleted from or added to the first file chosen. Deleted words are surrounded with ###hash characters###, and inserted words are surrounded with >>>angle brackets<<<.
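If you want to pick the marked words out of a saved checkfile yourself, the marker conventions described above are easy to match with regular expressions. The following Python sketch is an illustration under that assumption; the names `MARKERS` and `find_marked_spans` are hypothetical and are not part of Guiguts, pptext, or ppcomp.

```python
import re

# Marker conventions as described in the text above.
MARKERS = [
    ("header", re.compile(r"\*\*\*(.+?)\*\*\*")),      # ***larger header text***
    ("deleted", re.compile(r"###(.+?)###")),            # ###deleted words###
    ("inserted", re.compile(r">>>(.+?)<<<")),           # >>>inserted words<<<
    # Single asterisks only, so that ***headers*** are not matched twice.
    ("highlight", re.compile(r"(?<!\*)\*([^*\s][^*]*?)\*(?!\*)")),
]

def find_marked_spans(line):
    """Return a list of (kind, text) pairs for every marked span in one line."""
    found = []
    for kind, pattern in MARKERS:
        for match in pattern.finditer(line):
            found.append((kind, match.group(1)))
    return found
```

Results are grouped by marker kind in the order listed in `MARKERS`, not by position in the line.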
- clicking a report line moves the main text window to the line it references.
- right-click removes the suggestion from the list, moves to the next suggestion, and moves the main text window to the line referenced by that suggestion.
- Ctrl+left-click makes the change suggested by the query/error (for Jeebies, this swaps he/be; for OCRfixr it makes the suggested correction).
- Ctrl+right-click does the same, but also removes the query from the list.
- Ctrl+Shift+right-click discards all queries that are identical to the clicked one but on a different line number. This is to quickly get rid of multiple wrong suggestions. Note it does not remove all errors of that type, just the ones that match exactly, e.g. it can remove all occurrences of "Query digit in 4to", but retain other digit queries.
This option will help you find and resolve footnote-related errors such as mismatched anchors/footnotes, missing or duplicated footnotes, and missing closing square brackets that result in normal text seeming to be part of footnotes. It'll also help you rejoin the segments of continuation footnotes into one complete footnote.
Where Notes Are Placed
The project manager might have instructed proofers to code inline notes, embedding [Footnote: etc] in the text. Out-of-line footnotes are more common: each note is proofed in two parts: an anchor (a symbol in square brackets) in the text, and the note proper, which proofers batch with the others at the end of each page. Since then, you have moved those notes to the ends of paragraphs, if necessary. Now you need to make an editorial decision: where will the notes be placed in the final etext? You have four choices.
Inline. The note text is embedded in the running text, like this [Footnote: _ibid_, p.222]. Inline notes are somewhat intrusive, but are appropriate when that is how the proofers left them, and when notes are few.
End of Paragraph. Each note is placed just below the paragraph that contains its anchor, or just below the block quote for which the note is a citation. This is appropriate when there are only a handful of notes per chapter, especially when notes are mostly brief citations.
End of Chapter. All the notes from one chapter are batched at the end of the chapter. Do this if the original book did so, or if footnotes are extremely numerous or verbose.
End of Book. All the notes are batched in a block at the end of the book. Do this if the original book did so, or if footnotes are moderately numerous.
The Footnote Fixup Dialog
This decision made, use Tools>Footnote Fixup or click FN on the Tool bar, to open the Footnotes dialog.
Click the button First Pass and wait while Guiguts scans the document to find every identifiable note and anchor. (During this first pass, besides locating footnotes, Guiguts also looks for and automatically corrects a number of common footnote typographical errors such as "[ Footnote" with a space and "[footnote".) When the scan ends the document is scrolled to display the first note, which is highlighted in aqua, and a list of all footnotes appears:
Errors are color-coded, highlighted, and counted at the top of the list.
How Notes and Anchors are Matched
Some books have distinct types of notes, some numeric and others alphabetic. Indeed, it is possible for a book to have three intertwined series of notes, numeric, Roman[IV] and alpha[C]. The numbers may restart on each page (or chapter). Some books use symbols (asterisks, daggers, etc.) and the proofers may have been inconsistent in how they coded them.
During the First Pass, Guiguts matches an anchor [ str ] to its note by searching for the next [Footnote str : in the text. For purposes of linking a note and its anchor, it doesn't matter what type of symbol is used; only that the two strings be identical and the footnote follow the anchor. That's why duplicate and inconsistent symbols don't matter at this time. After finding any anchor [Q] Guiguts looks for the next following [Footnote Q:.
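The matching rule just described (pair each anchor with the next identically labelled footnote that follows it) can be sketched in a few lines of Python. This is an illustration of the rule only, not Guiguts' code: the function name, the simplified anchor pattern, and the unlimited search distance are assumptions of the sketch.

```python
import re

# Simplified anchor pattern: a bracketed label with no spaces or colons,
# e.g. [1], [A], [IV.] or [*]. Real documents may need something stricter.
ANCHOR = re.compile(r"\[([^\]\[\s:]+)\]")

def match_anchors(text):
    """Pair each anchor [label] with the next '[Footnote label:' after it.

    Returns a list of (label, anchor_offset, note_offset) tuples, where
    note_offset is None if no matching footnote follows the anchor.
    """
    pairs = []
    for anchor in ANCHOR.finditer(text):
        label = anchor.group(1)
        note = re.compile(r"\[Footnote " + re.escape(label) + r":")
        found = note.search(text, anchor.end())
        pairs.append((label, anchor.start(), found.start() if found else None))
    return pairs
```

Because only the label and the order matter, two anchors that are both coded [1] each find their own note, as long as each note follows its anchor. A note with a missing colon is simply not found, which is how a mis-coded note ends up unmatched.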
Look through the report list for color-coded errors; double-click such lines to see the referenced text in Guiguts' Main (document) window. If you close this list, you can reopen it by clicking the Check Footnotes button. After making some changes, you can refresh the list by once again clicking First Pass.
If some of the footnotes are further away from their anchors than normal, Guiguts may not look far enough to find them. To prevent this from happening, select Unlimited Anchor Search and click First Pass again.
The report list identifies four kinds of problems: duplicate anchor, missing anchor, out of sequence anchors, and possible missing close-bracket.
The usual reason for Guiguts not connecting a note to its anchor is that the note is improperly coded. Some very typical errors include:
- Missing colon, [Footnote A Text...
- Period instead of colon, [Footnote A. Text...
- Comma instead of colon, [Footnote A, Text...
- Two colons, [Footnote: A: Text...
- Missing symbol, [Footnote: Text...
- Various subtle and hard-to-see misspellings of Footnote.
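The typical coding errors listed above can be spotted mechanically before you run First Pass. The Python sketch below is a rough classifier written for illustration; the function name and the exact patterns are assumptions, it will misjudge some legitimate text, and a note of the form [Footnote: Text...] is of course correct when the book uses inline notes.

```python
import re

# Ordered checks: the first pattern that matches decides the diagnosis.
CHECKS = [
    ("two colons", re.compile(r"\[Footnote: \S+: ")),
    ("missing symbol", re.compile(r"\[Footnote: ")),
    ("ok", re.compile(r"\[Footnote [^\s:,]+: ")),
    ("period instead of colon", re.compile(r"\[Footnote [^\s:,.]+\. ")),
    ("comma instead of colon", re.compile(r"\[Footnote [^\s:,.]+, ")),
    ("missing colon", re.compile(r"\[Footnote \S+ ")),
]

def diagnose_footnote_start(note_start):
    """Classify the opening of a footnote, e.g. '[Footnote A. Text...'."""
    for diagnosis, pattern in CHECKS:
        if pattern.match(note_start):
            return diagnosis
    return "unrecognized (possibly a misspelling of Footnote)"
```

The order of the checks matters: the well-formed pattern is tried before the looser "missing colon" pattern, so a correct note is never reported as an error.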
Use the Next FN and Last FN buttons to step through the notes. You can also pop-up a list of all notes using the button to the right of the Go to window. Scroll to the note you want and click it; the document jumps to show that note.
Verify from the aqua highlighting that each note is correctly bounded by square brackets. If a closing bracket is missing or misplaced, correct it. If the note is still embedded in a paragraph, move it between paragraphs. After either change, click Rescan this FN to rescan the current note, or First Pass to rescan all notes.
When the anchor and the note are not on the same screen, use the See Anchor and See Footnote buttons to bounce between them. If the distance between note and anchor is wider than you expect, look carefully. It is possible for a note to be mated with the wrong anchor. For example, if [Footnote 1 is mis-coded, say with a missing colon, it will be ignored during the first-pass scan. The anchor [1] will be mated with the next [Footnote 1 somewhere further along in the document.
If no anchor at all is highlighted, the syntax of either the note or the anchor is wrong (and the note should be marked in the Check Footnotes error report). Correct the note and click First Pass to rescan the notes.
If the note is correct but the anchor still is not found, the anchor may be missing or mis-coded. Look at the page image and find where the anchor should be. If the anchor is malformed, delete it. Place the cursor at the insertion point where the anchor should be and click the Set Anchor button. Guiguts inserts an anchor using the symbol from the current note.
If the anchor looks correct but still isn't found, set the Unlimited Anchor Search switch on. This allows Guiguts to scan farther ahead from an anchor looking for the note.
When all notes are correct as to syntax and type of symbol, compare the count of notes shown at the top of the dialog to the count of the word Footnote from the Word Frequency report. If a discrepancy leads to the discovery of a "lost" footnote, correct it and click First Pass again. Use the Go to # field to go quickly to the new note and check it.
"Out of Sequence, Footnotes Not In Same Sequence As Anchors" is a warning. It doesn't necessarily mean something is wrong. Sometimes footnotes have footnotes and will appear out of sequence to the program. The same with one footnote that ties to two different anchors. Ask in the forum if you need help making the best decision for your book.
Do not go on to further steps until there are no errors shown in the Check Footnotes report and you have verified the length and anchor of all notes. (As a veteran of multiple books having several hundred footnotes each, this writer can testify that there is simply no substitute for inspecting each footnote in sequence. Footnote syntax is complex and prone to subtle errors. Not all errors are displayed in the Check Footnotes report. You simply must verify proper scoping of every note. Any uncaught errors will cause chaos later on.)
If you want to use inline placement (or convert to it), begin by inspecting and correcting the notes as described above. If a proofer didn't get the message and placed a footnote out-of-line with an anchor, no matter; just make sure the anchor and note are correctly formatted.
When all notes are correct, make sure the Inline switch is active, then click Reindex. Guiguts checks all notes. Where it finds an out-of-line note, it moves the note to replace its anchor. Inline notes are now complete.
While inspecting out-of-line notes, make sure their symbol types are consistent. To change the symbol type of the current note only, click the Number, Letter, or Roman button. Guiguts changes the note and its anchor to use the next number, letter, or Roman numeral. Do not be concerned about duplicate symbols at this time.
Roman vs. Alphabetic Symbols
At this stage you must be aware of how Guiguts tells the difference between an alphabetic symbol [A] and a Roman symbol [I]. They both consist of alphabetic characters, so are ambiguous to program logic. The arbitrary rule is that a Roman anchor ends in a dot, while an alpha anchor does not. Thus [I.] is a Roman number, and [I] is alphabetic. The dot is also required in the note number, as in [Footnote I.: (Roman) versus [Footnote I: (alphabetic). Guiguts recognizes lowercase Roman with a dot, as in [iv.] but it only generates uppercase Roman.
If your book has roman anchors but the proofers did not include the dot (and why would they?) you must hope that the editors of the original work were careful enough to never use an ambiguous [i] or [v] footnote. Use regular expressions to find all the ought-to-be roman anchors and notes and add dots to them. For anchors, search for literal [, one or more roman lowercase letters, literal ] using the regular expression \[([ivxl]+)\] and replace with [$1.]. Similarly for finding the notes, use the regular expression Footnote ([ivxl]+): replacing with Footnote $1.:
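To try the two replacements on a sample before running them in Guiguts, the same regular expressions can be exercised in Python. This is only a test harness under that assumption: the function name is hypothetical, and Python writes the back-reference as `\1` where the Guiguts Search dialog uses `$1`.

```python
import re

def add_roman_dots(text):
    """Add the trailing dot that marks lowercase Roman anchors and notes."""
    # Anchors: [iv] becomes [iv.]
    text = re.sub(r"\[([ivxl]+)\]", r"[\1.]", text)
    # Notes: "Footnote iv:" becomes "Footnote iv.:"
    text = re.sub(r"Footnote ([ivxl]+):", r"Footnote \1.:", text)
    return text
```

Note that an anchor such as [i] or [v] is converted too, which is exactly the ambiguity warned about above if the book also uses alphabetic notes.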
Indexing Out-of-Line Notes
When all notes are correct, save the document. Again click First Pass. Then make sure the Out-of-line switch is active and click Reindex. Guiguts goes through all notes and gives them consecutive values of the type of symbol they now have (letter, number, or Roman). If you want to force all notes to use a common type of symbol, set the All to Number, All to Letter, or All to Roman switch and click Reindex again.
If your book uses Roman-style numbers in lowercase [iv.], the Reindex pass replaces them with uppercase Roman [IV.]. You can force these back to lowercase with two more regular expression search and replace operations, after completing the following step.
Placing Out-of-line Notes
To move notes to chapter-end or book-end you need to establish "landing zones" where Guiguts will collect the notes. A landing zone is simply a line containing only the word "FOOTNOTES:" and followed by a blank line. Footnotes preceding that line are moved to follow it.
You can use the Set LZ @ cursor button to insert a "FOOTNOTES:" line anywhere you choose in the file. Or you can use the Autoset Chap. LZ button to insert a "FOOTNOTES:" line preceding each chapter break (four blank lines). (Caution: in a large book this operation may take many seconds.) You can click Prev. LZ and Next LZ to move the document from one "FOOTNOTES:" line to the next.
When you have inserted landing zones where you want the notes to gather, click either Move FNs to Landing Zone(s) to move them to the ends of their Chapters or the end of the book; or Move FNs to Para to move each footnote to just below the paragraph containing its anchor. (The heading line "FOOTNOTES:" will not be added in this case.) Guiguts moves each note to the landing zone next below it in the document, leaving a blank line above it. Note that, except when using Move FNs to Para, Guiguts always moves footnotes downward toward a landing zone on a higher-numbered line. Even if a footnote is sitting directly below a landing zone, it will be moved to the next one down in the document. If you want a footnote to stay where it is, place a landing zone just below it, not just above it.
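The "always downward" rule is easy to get wrong, so here is a small Python sketch of it. It works on line numbers only and is an illustration of the rule as described above; the function name is hypothetical and the sketch does not move any text.

```python
def assign_landing_zones(footnote_lines, zone_lines):
    """For each footnote line number, return the line number of the first
    landing zone below it, or None if no landing zone follows."""
    zones = sorted(zone_lines)
    return {
        fn: next((zone for zone in zones if zone > fn), None)
        for fn in footnote_lines
    }
```

For example, with landing zones on lines 50 and 100, a footnote on line 52 (just below the first zone) is assigned to line 100, not line 50, and a footnote on line 120 has nowhere to go.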
As a quality check, click First Pass again and examine the moved notes. You can use the Prev. LZ and Next LZ buttons to move from one block of notes to the next. Use the Go to # list or Prev. FN and Next FN buttons to step through the notes. Since the notes are now far removed from their anchors, use the See Anchor and See Footnote buttons to flip between footnote text and matching anchor.
If you used Autoset Chap. LZ, there may be unused Landing Zones at the ends of chapters with no footnotes. (Recall that Landing Zones are the word FOOTNOTES:). You probably will want to delete the unused Landing Zones. Similarly, if you moved all of the footnotes to the end of the document and later moved them to follow the paragraphs that referenced them, remember to delete FOOTNOTES: at the end of the document.
When everything looks correct, save the document.
Tidy Notes for the txt version
Later on, after you've done as much as possible with the "common" version of the text, and have saved a separate copy of it to use in preparing the HTML version, you can simplify the appearance of the footnotes for use in Plain Text. When you are ready to do so, start Tools>Footnote Fixup once more, click First Pass to scan all the footnotes and make sure no errors are shown. Then click Tidy Up Footnotes. Guiguts changes all the notes from the form [Footnote 1: Text...] to the form [1] Text.... This operation can take a very long time.
Caution: once you save the file in this form, any further editing or arranging of footnotes must be manual; the Footnote Fixup dialog will no longer work. Also, automatic HTML generation cannot recognize tidied footnotes, so this is for the .txt version only.
This looks for simple errors in [Sidenote:] markups (such as [sidenote:] or a missing blank line following the close of the markup), and corrects them. You will want to check them yourself, whether or not you use this option.
Replace [::] with Incremental Counter
There are situations in which you may find it advantageous to use consecutive numbers, for example, in the id's of illustrations in the HTML version of a project where it's impractical to use page numbers.
One way to insert consecutive numbers is with this option. Place the string [::] wherever you will want the next number in sequence (usually with other text around it), and then click Tools>Replace [::] with Incremental Counter. The first occurrence of [::] will be replaced by "1", the next by "2", and so forth.
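The replacement itself is simple to picture. The following Python sketch mimics the behaviour described above for experimenting outside Guiguts; the function name and the `start` parameter are assumptions of the sketch, not options of the menu item.

```python
import itertools
import re

def replace_with_counter(text, marker="[::]", start=1):
    """Replace each occurrence of marker with 1, 2, 3, ... in document order."""
    counter = itertools.count(start)
    return re.sub(re.escape(marker), lambda match: str(next(counter)), text)
```

For example, a document containing `id="ip_[::]"` twice ends up with `id="ip_1"` and `id="ip_2"`, in the order the markers appear.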
As suggested above, a practical use of this option is where the id's of the illustrations are not the same as the page numbers in the book's List of Illustrations. The following set of steps is one way to make them match after you've used HTML>HTML Generator>Auto Illus Search (which helps you convert the <p>[Illustration: caption]</p> tags created by Autogenerate HTML to actual HTML that will display the appropriate illustrations and any captions):
- Save a good copy of the HTML document, because it is just barely possible that not everything will work the first time.
- Regex Search & Replace using Rpl All:
Search: <div id="(.+?)" class="fig
Replace: <div id="ip_[::]" class="fig
- When it finishes, repeat the Search (but not the Replace), looking for illustrations that are not referenced in the List of Illustrations. If there are any, remove the id="ip_[::]" from them. You will want to have exactly as many of these id's as there are entries in the List of Illustrations, and for them to be in the same physical sequence. Since this example process only looks for Arabic-numeral page numbers in the List of Illustrations (see step below), the Frontispiece, if one is present in that list, will need to be done manually. This requires its placeholder tag to be changed to something like id="i_frontis" so that it will be excluded from the incremental counter step that comes next.
- Click Tools>Replace [::] with Incremental Counter.
- The first two occurrences (which will not necessarily be near each other) now will look like this:
<div id="ip_1" class="fig
<div id="ip_2" class="fig
- Move to the List of Illustrations and do something similar:
Search: >(\d+?)<
Replace: ><a href="#ip_[::]">$1</a><
- Select the entire List of Illustrations and do a Rpl All
- Click Tools>Replace [::] with Incremental Counter.
- On the HTML menu, check the results with the HTML Validator and then the Link Checker. Even if everything seems to be OK, it's possible that there will be some mismatched links, so check them by clicking them in the List of Illustrations to see where they lead. Find the ones that don't match, go back to the saved copy of the document, repeat this entire procedure, but with corrections for what didn't match, and try again.
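The two Search & Replace passes in this procedure can be tried out on a small sample before touching the real file. The following Python sketch assumes the same two patterns as the steps above; the function names, the sample ids, and the "figcenter" class are hypothetical, and Python writes the back-reference $1 as \1.

```python
import re
from itertools import count


def number_placeholders(text: str) -> str:
    """Stand-in for Tools>Replace [::] with Incremental Counter."""
    counter = count(1)
    return re.sub(r"\[::\]", lambda _m: str(next(counter)), text)


def renumber_figure_ids(html: str) -> str:
    """Give every figure <div> a sequential id of the form ip_N."""
    marked = re.sub(r'<div id="(.+?)" class="fig', '<div id="ip_[::]" class="fig', html)
    return number_placeholders(marked)


def link_list_entries(list_html: str) -> str:
    """Wrap each page number in the List of Illustrations in a link to ip_N."""
    marked = re.sub(r">(\d+?)<", r'><a href="#ip_[::]">\1</a><', list_html)
    return number_placeholders(marked)


body = '<div id="i_012" class="figcenter">\n<div id="i_045" class="figcenter">'
loi = "<td>12</td>\n<td>45</td>"
print(renumber_figure_ids(body))
print(link_list_entries(loi))
```

Because both passes simply count from 1, the links only line up when the figures and the list entries are equal in number and in the same order, which is why the manual checks in the steps above still matter.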
This dialog converts fractions within your current selection to actual Unicode characters (if available) or to a mix of superscripts and subscripts, with a "⁄" (Fraction Slash) between them, simulating the appearance of fractions for which there are no Unicode equivalents. You have three conversion choices, as shown in this example:
Fixup Page Separators
This dialog makes it much easier to remove the -----Page Separators----- and rejoin words that were split across pages.
The page separators are useful to you in the early stages of post-proofing; they show you clearly the page units seen by proofers, the points at which they had to deal with hyphenated words, incomplete poems or italics or block quotes, and so forth. After the first pass through the book, however, they are not useful. When you have moved illustrations and footnotes outside of paragraphs, you can remove the page separator lines.
Use Tools>Fixup Page Separators to open a dialog dedicated to removing the page separator lines from the document:
After you have removed the page separator lines from the document, Guiguts still knows where the page boundaries are because it saves the information in the .bin file (see this page). You can still jump to a page, display a page image, or message the proofers for a given page.
Page separators appear within paragraphs, between paragraphs, above or below illustrations and footnotes, and between chapters. Each case needs different handling. When the separator is within a paragraph, there may be blank lines above and below it, and a hyphenated word or phrase might have crossed the page boundary. If the separator is within a poem, block quote, or table, there might be */* continuation-marks before and/or after the separator. The Fixup Page Separators dialog handles most of these issues automatically (but see Some Limitations below).
Suggestion: Before removing the separators, resolve and remove all proofer's notes, position all [Illustrations] between paragraphs, and run Footnote Fixup to rejoin continuation footnotes and perhaps move them to the end of the document. Fixup Page Separators is excellent, but not always perfect.
99% Auto almost always gives the best results, as well as being the quickest to use, and that's the default. (Earlier versions of this Manual recommended that first-time users set the "Auto" state (third line) to No Auto, which keeps Guiguts from scrolling after it makes a change, so you can view the results and reassure yourself they are correct. The earlier recommendation suggested that, as you learn to trust the tool and see its limitations, you can allow it ever more autonomy.)
Begin or continue the removal process by clicking Refresh. This scrolls to the first remaining page separator line and highlights it in search-orange. Examine the line and choose your action as follows:
- If the line is in the title page or a table, just click Delete. You will revisit such areas and adjust the spacing later.
- If the line precedes a chapter-level section (Preface, Contents, etc.), click New Chapter. Guiguts deletes the line and then ensures that there are exactly four blank lines between the preceding and following nonblank lines.
- If the line precedes a section-level section, click New Section. Guiguts deletes the line and then ensures that there are exactly two blank lines between the preceding and following nonblank lines.
- If the line falls between paragraphs (you may have to refer to the page images by clicking View Img in this Dialog or See Img in the main window's Status bar to be sure), click Blank Line. Guiguts deletes the line and then ensures that there is just one blank line between paragraphs.
- If the line falls within a paragraph, look for a hyphenated word above it. If there is one, decide whether the word should be joined or left as a hyphenated phrase.
- To join the word, click Join Lines.
- To retain a hyphenated phrase, click Join, Keep Hyphen.
- In either case, Guiguts deletes the line and closes up the paragraph.
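The spacing rules for Blank Line, New Section, and New Chapter can be summarised in a short Python sketch. It is an illustration only, not the tool's implementation. It assumes that a page separator is a single line beginning with "-----File:", and the function and action names are hypothetical.

```python
import re

# Assumption: a page separator is a single line beginning with "-----File:".
SEPARATOR = re.compile(r"^-----File:.*$")

# Blank lines left behind by each button, as described in the list above.
BLANK_LINES = {"blank_line": 1, "new_section": 2, "new_chapter": 4}


def remove_separator(lines: list[str], index: int, action: str) -> list[str]:
    """Delete the separator at lines[index] and normalise the gap around it."""
    if not SEPARATOR.match(lines[index]):
        raise ValueError("line is not a page separator")
    before = lines[:index]
    after = lines[index + 1:]
    # Drop blank lines on both sides so the gap can be rebuilt exactly.
    while before and not before[-1].strip():
        before.pop()
    while after and not after[0].strip():
        after.pop(0)
    return before + [""] * BLANK_LINES[action] + after


page = ["End of one.", "", "-----File: 002.png-----", "", "", "Start of next."]
print(remove_separator(page, 2, "new_section"))
```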
When you disagree with a change, use Undo to revert to the state just before the option you just used. The tool supports multiple Undo's.
If you select 80% Auto or 99% Auto and then click Refresh, Guiguts goes through the file processing all page separators where the type of join needed is (it thinks) unambiguous. It stops when it finds one it cannot handle with certainty. You click the appropriate button and it continues. The degree to which it makes these decisions depends on which percentage you choose:
- 80% Auto handles many of the decisions itself, while
- 99% Auto handles most of them, including proper removal of rewrap markers immediately preceding and following page separators, and separators that are immediately followed by another separator.
- Auto Advance performs the action you select and advances to the next separator.
- No Auto performs the action you select but does not advance the cursor, so to find the next separator, you must click Refresh each time.
- 99% Auto usually gives the most accurate results. 80% Auto may not necessarily remove rewrap markers immediately preceding and following separators, so you will have to find and delete them yourself.
- The tool allows you to decide how to rejoin words that proofers have marked with -* when they are split across page boundaries. However, the tool cannot help with words split between lines in the middle of a page. During proofreading, such words may be rejoined and the hyphen retained or silently removed. More usually, the hyphen is marked with an asterisk (as above) to let the post-processor make the decision on how to rejoin the word.
- The tool does not attempt to rejoin Index entries whose page lists begin on one page and continue onto the next page. The Formatters are advised to handle that situation by beginning the continuation page with an opening no-wrap /* and left-justifying the rest of the page list on the next line, without leaving a blank line (which would indicate a new main entry) and without indenting it (which would indicate a new sub-entry). You will need to look for these and rejoin the two lines manually, either before or just after removing the page separators. For related information about this, refer to Multi-Page Blocks below.
- The following regular expression can be used with Search & Replace to help you find and remove any unwanted hyphens in words at the end of lines:
Search: (\s)([A-Za-z]+?)-([a-z,;:\.!\?]+?)\n
Replace: $1$2$3\n
- When proofers have marked split words in the middle of a page with -*, a simple (non-regex) Search & Replace can find and help resolve these:
Search: -*
Replace: (leave empty to just remove the -*)
or Replace: - (a hyphen, to make the word hyphenated)
- Join, Keep Hyphen tends to leave a trailing space at the end of the line, so use Remove end-of-line Spaces after finishing with the page separators.
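The regular expression given above for hyphenated words at the end of lines can be tested outside Guiguts. This Python sketch uses the same pattern; the function names are hypothetical. Listing the candidates first is a deliberate choice, since some hyphens (as in genuine compound words) should be kept.

```python
import re

# The Search pattern from the list above; Python writes $1 as \1.
EOL_HYPHEN = re.compile(r"(\s)([A-Za-z]+?)-([a-z,;:\.!\?]+?)\n")


def find_candidates(text: str) -> list[str]:
    """List hyphenated words that end a line, for review before any change."""
    return [m.group(2) + "-" + m.group(3) for m in EOL_HYPHEN.finditer(text)]


def join_all(text: str) -> str:
    """Remove the hyphen from every candidate (the Replace string above)."""
    return EOL_HYPHEN.sub(r"\1\2\3\n", text)


sample = "He left to-day\nand came back.\n"
print(find_candidates(sample))
print(join_all(sample))
```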
Keyboard Shortcuts and Buttons
The buttons do what their captions indicate; most of them remove the separator lines as well; the list of keyboard shortcuts, below, corresponds to those buttons. You mostly will want to use the keyboard shortcuts to speed the process; they are available while the Page Separators dialog has the keyboard focus. The Help button displays a popup summary of these functions, while F1 invokes Guiguts' context-sensitive Help, and will display what you're reading right now.
Note: These keyboard shortcuts only work in lower-case, even though the dialog shows capital letters underlined.
|j||Join Lines - join lines, remove all blank lines, spaces, asterisks and hyphens; delete separator|
|k||Join, Keep Hyphen - join lines, remove all blank lines, spaces and asterisks, keep hyphen; delete separator|
|l||Blank Line - leave one blank line. Close up any other whitespace (paragraph break); delete separator|
|t||New Section - leave two blank lines. Close up any other whitespace (section break); delete separator|
|h||New Chapter - leave four blank lines. Close up any other whitespace (chapter break); delete separator|
|r||Refresh - search for, highlight and re-center the next page separator; remove separator(s) if in any "Auto" mode|
|d||Delete - delete the page separator. Make no other edits|
|u||Undo - undo the last edit|
|e||Redo - redo the last undo|
|v||View the current page in the image viewer|
|a||Cycle Automatic modes|
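The difference between Join Lines (j) and Join, Keep Hyphen (k) can be sketched as follows. This is a simplified illustration, not Guiguts' code: the function name is hypothetical, the separator and any blank lines are assumed to have been removed already, and joining unsplit lines with a single space is an assumption.

```python
def join_across_page(before: str, after: str, keep_hyphen: bool = False) -> str:
    """Join the last line of one page to the first line of the next."""
    head = before.rstrip()
    tail = after.lstrip()
    # Proofers mark a split word with "-*"; the continuation may start with "*".
    head = head.rstrip("*")
    tail = tail.lstrip("*")
    if head.endswith("-"):
        if not keep_hyphen:
            head = head[:-1]      # Join Lines: drop the hyphen as well
        return head + tail        # word fragments close up with no space
    # Assumption: lines that do not end in a split word are joined with a space.
    return head + " " + tail


print(join_across_page("the post-*", "*processor went", keep_hyphen=False))
print(join_across_page("the post-*", "*processor went", keep_hyphen=True))
```

The sketch does not try to distinguish a word-splitting hyphen from a dash at the end of a line, which is one reason such decisions are left to you in the dialog.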
Remove End-of-page Blank Lines
This removes blank lines immediately above page separators. Once the separators have been removed, this does nothing.
Remove End-of-line Spaces
This removes trailing spaces from all lines; you will want to use it frequently, including when working with tables, checking for long lines, and at the very end of preparing the Plain Text version of the book.
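If you need the same clean-up outside Guiguts, for example in a script that checks the finished text file, a one-line regular expression does it. This is a minimal sketch with a hypothetical function name.

```python
import re


def remove_trailing_spaces(text: str) -> str:
    """Strip spaces at the end of every line, keeping the line breaks."""
    return re.sub(r" +$", "", text, flags=re.MULTILINE)


print(repr(remove_trailing_spaces("a  \nb\n c ")))
```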
Most of the rewrap options on this menu, including "Clean Up Rewrap Markers", are used only when preparing the Plain Text version of the book. (Using the "Rewrap Selection" option when preparing the HTML version may make it easier for you, the post-processor, to read what's in the Guiguts window, but it won't affect the appearance of the published ebook.) Unlike the rewrap options on this menu, Guiguts uses most of the rewrap markers (below) for both Plain Text and HTML. So, by refining these markers while working on the common file (before splitting off separate copies that will become the final Plain Text and HTML versions), you can save time and increase the likelihood that both versions will be presented in the same way.
The DP guidelines specify only two types of rewrap markers: /# ... #/ for rewrappable text (block quotes), and /* ... */ for anything else requiring special handling during post-processing (e.g., poetry, tables, and lists). Guiguts, however, supports several additional rewrap markers which have varying effects on the rewrap rules for Plain Text and for the generated HTML (see Prepare the text for conversion to HTML). The markers using letters, e.g., /p, may be upper- or lower-case. Before rewrapping Plain Text, all inline tags should have been converted to their final Plain Text form, e.g., <i> should have been changed to an underscore:
|Marker||Rewrap?||Plain Text||HTML|
|/#...#/||Yes||Rewraps within default or specified margins. (See below).||As block quote.|
|/*...*/||No rewrap.||Defaults to no-wrap indentation specified on the Preferences Menu.||Preserves alignment and line breaks.|
|/$...$/||No rewrap.||No indent.||Left-justifies all lines, preserves line breaks.|
|/P...P/||No rewrap.||Uses Poetry indentation specified on the Preferences Menu.||As poetry.|
|/C...C/||No rewrap. (See below).||Centers each line within the block, but does not rejoin/rewrap them.||Assigns a <div class="center"> to the block, and adds a <br /> at the end of each line to prevent rewrapping.|
|/R...R/||No rewrap. (See below).||Slides the block to the right, until the longest line in the block is at the right margin. Maintains relative indentations of the other lines. If the /R block is within a Block Quote /#...#/, the right margin of the containing block will be used.||Assigns a <div class="right"> to the block, adds a <br /> at the end of each line to prevent rewrapping, and attempts to maintain relative indentation of the lines within the /R block by using <span style="margin-right"> (with appropriate numeric values) for all but the longest line.|
|/F...F/||Limited rewrap in HTML.||Ignored in Plain Text.||Centered paragraphs ('f' stands for 'Front Matter'). Each set of non-blank lines within the block becomes a centered, wrappable paragraph: <p class="center"> ... </p>.|
|/L...L/||No rewrap.||Fixed.||Unsigned list.|
|/X...X/||No rewrap.||No indent.||Generates <pre>...</pre>.|
|/i...i/||Yes, within each entry and each sub-entry.||Indents and rewraps Text Index with hanging indents. (See below).||Generates a formatted, linked Index (see HTML>HTML Auto Index).|
Note: The opening and closing markers (/*, etc.) should each stand alone on a line. In addition, an opening marker must be preceded by a blank line, another opening rewrap marker, or a page separator (if you haven't already removed them), and a closing marker must be followed by a blank line, another closing rewrap marker, or a page separator. If the required blank line, rewrap marker, or page separator is missing, Guiguts will fail to recognize the beginning or the end of the block, and mis-wrapping will result.
Note: All rewrap markers may be used within Block Quotes /# ... #/, but some, such as /C and /R, may not be used within other rewrap markers. Attempting to do so usually will yield mis-wrapped results.
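The placement rule in the first Note above lends itself to an automated check. The Python sketch below reports marker lines that are not preceded (for opening markers) or followed (for closing markers) by a blank line, a marker of the same kind, or a page separator. It is an illustration under stated assumptions: markers may carry a bracketed parameter list, page separators start with "-----File:", and all names are hypothetical.

```python
import re

# Assumptions: the marker letters come from the table above, and a page
# separator is a line starting with "-----File:".
OPEN = re.compile(r"^/([*#$PCRFLXI])(\[[^\]]*\])?$", re.IGNORECASE)
CLOSE = re.compile(r"^([*#$PCRFLXI])/$", re.IGNORECASE)
SEPARATOR = re.compile(r"^-----File:")


def misplaced_markers(lines: list[str]) -> list[int]:
    """Return 1-based numbers of marker lines that break the placement rule."""
    problems = []
    for i, line in enumerate(lines):
        prev_line = lines[i - 1] if i > 0 else ""
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if OPEN.match(line):
            ok = (not prev_line.strip()) or OPEN.match(prev_line) or SEPARATOR.match(prev_line)
        elif CLOSE.match(line):
            ok = (not next_line.strip()) or CLOSE.match(next_line) or SEPARATOR.match(next_line)
        else:
            continue
        if not ok:
            problems.append(i + 1)
    return problems


doc = ["Some text.", "/#", "A quote.", "#/", "", "/*", "a table", "*/", "More text."]
print(misplaced_markers(doc))
```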
A Regex to find all blocks and selectively change some of them to other types
Note: Although this searches for all of the above markers, it's primarily intended to validate the Block Quote and No Wrap markers added in the Formatting Rounds, and to change some of them to the extra ones recognized by Guiguts. Use it while checking and correcting the common file, before making separate copies for Plain Text and HTML:
Search: \n/([\*#$xXfFlLpPiIcCrR])((.|\n)*?)\n\1/
Replace for poetry: \n/P$2\nP/
Replace for centering: \n/C$2\nC/
Replace for right-alignment maintaining relative indentations: \n/R$2\nR/
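To see what this regex captures, it can be run against a small sample in Python. The pattern is the Search string above; the function names are hypothetical, and Python writes $2 as \2. In Guiguts you would step through the matches and replace selectively, whereas this sketch simply converts the first match.

```python
import re

# The Search pattern from above, written as a Python raw string.
BLOCK = re.compile(r"\n/([\*#$xXfFlLpPiIcCrR])((.|\n)*?)\n\1/")


def list_blocks(text: str) -> list[tuple[str, str]]:
    """Return (marker, body) pairs for every block found, for inspection."""
    return [(m.group(1), m.group(2).lstrip("\n")) for m in BLOCK.finditer(text)]


def to_poetry(text: str, count: int = 1) -> str:
    """Turn the first `count` blocks into /P blocks (Replace for poetry)."""
    return BLOCK.sub(r"\n/P\2\nP/", text, count=count)


sample = "Intro.\n\n/*\nRoses are red,\nViolets are blue.\n*/\n\nMore.\n"
print(list_blocks(sample))
print(to_poetry(sample))
```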
Multi-Page Rewrap Blocks
Long block quotes, tables, lists, Indexes, and poems that cross pages will normally have been formatted with opening (and closing) markers at the start (and end) of each continuation page. When those continuation-page markers are on the very first line of the page, immediately following the page separators, Guiguts understands that the block on the second page is just a continuation of what was on the preceding page, and removes the closing marker, the separator, and the opening marker. If the opening marker is followed by a blank line, Guiguts will preserve it, assuming it indicates a new paragraph, a new stanza, or a new row in a table or list.
However, if the first line of a continuation page is blank and the second line is an opening marker, Guiguts will remove only the page separator and will preserve the closing marker, the blank line, and the opening marker. That result most likely is wrong, because books hardly ever print two different quotes, poems, or tables without some regular text, such as an author's comment or a table heading, between them.
You can either prevent this kind of error by looking for incorrectly placed opening markers before rewrap, or fix these errors after rewrap by looking for a closing marker, a blank line, and an opening marker in sequence. In either case, make sure blank lines remain where, and only where, they should occur.
If you look for these marker sequences before rewrap, you can either:
- remove the end-of-page closing marker and the following top-of-page opening marker, while keeping or deleting a blank line, depending on whether or not it should be there, or
- put the opening marker on the very first line and follow it with a blank line if it denotes a new paragraph, stanza, or line.
In the unlikely event that two blocks of the same type actually do follow each other with nothing in between, leave a blank line at the top of the second page and the opening marker directly below it.
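A search for the suspicious sequence described above (closing marker, blank line, opening marker of the same type) can be sketched in Python as follows. The function name is hypothetical, and the pattern only covers the simple case where the three lines are adjacent.

```python
import re

# A closing marker, one blank line, then an opening marker of the same type.
# Letter markers may be upper- or lower-case, hence IGNORECASE.
SPLIT_BLOCK = re.compile(r"^([*#$PCRFLXI])/\n\n/\1", re.MULTILINE | re.IGNORECASE)


def suspect_joins(text: str) -> list[int]:
    """Return the 1-based line number of each suspicious closing marker."""
    return [text.count("\n", 0, m.start()) + 1 for m in SPLIT_BLOCK.finditer(text)]


sample = "/P\nline one\nP/\n\n/P\nline two\nP/\n"
print(suspect_joins(sample))
```

Each reported line still needs a human decision, since two adjacent blocks of the same type are occasionally correct.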
You can rewrap part of a document by selecting text and using the command "Tools>Rewrap Selection". Guiguts rewraps the selected text, adjusting unmarked text to the default margins and adjusting marked blocks according to the type of markup. Rewrap All doesn't need preselected text, as it applies to the entire document.
Rewrap operations are not compatible with Undo/Redo, so for safety, save the file first. If you rewrap to a wrong margin, you may be able to re-select the same text and rewrap it again to the correct margin.
By default, a Plain Text table is indented the amount specified by the value set for NoWrap Blocks (/*...*/) in the Preferences>Processing>Set Rewrap Margins dialog. However, you can set a specific indent for any table (/*..*/ markup) by placing an indent value, 0 or a positive integer, in brackets immediately after the opening /*. For example, this text:
/* Some Tabular Text */
will be changed as follows by the rewrapping operation:
/* Some Tabular Text */
Block Quote Indent and Margins
This applies only to Plain Text, and the extra parameters described here should be added only to the Plain Text version. For HTML, the /#...#/ marker generates <div class="blockquot"> or <blockquote>.
By default, a block quote is rewrapped according to the margins specified in the Preferences>Processing>Set rewrap margins dialog. But you can set a specific left margin, hanging indent, and right margin for any individual block quote (/#..#/ markup) by putting up to three numbers in brackets after the opening /#. In computerese, the syntax is: left[. first][, right], where left is the number of spaces (indentation) on the left side, first is the number of spaces (indentation) for the first line of the paragraph, and right is the line length; that is, the maximum number of characters per line you want, including any left-side indentation. Here are different combinations of those:
|/#[ left ]||Wrap to left margin left, default right margin|
|/#[ left , right ]||Wrap within margins left and right|
|/#[ left . first ]||Wrap first line of each paragraph to margin first, remaining lines to left margin left, default right margin|
|/#[ left . first , right ]||Wrap first line of each paragraph in margins first to right, other lines in margins left and right|
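The parameter syntax in this table can be made concrete with a small Python sketch that parses the marker and wraps a paragraph with the standard textwrap module. This is not Guiguts' rewrap algorithm, so the exact line breaks may differ from what Guiguts produces. The function names are hypothetical, and the default right margin of 72 is only a stand-in for whatever is set in Preferences.

```python
import re
import textwrap

DEFAULT_RIGHT = 72  # assumption: stand-in for the margin set in Preferences


def parse_margins(marker: str) -> tuple[int, int, int]:
    """Parse "/#[left.first,right]" into (left, first, right)."""
    m = re.fullmatch(r"/#\[\s*(\d+)\s*(?:\.\s*(\d+)\s*)?(?:,\s*(\d+)\s*)?\]", marker)
    if m is None:
        raise ValueError(f"not a parameterised block quote marker: {marker!r}")
    left = int(m.group(1))
    first = int(m.group(2)) if m.group(2) else left
    right = int(m.group(3)) if m.group(3) else DEFAULT_RIGHT
    return left, first, right


def wrap_quote(marker: str, paragraph: str) -> list[str]:
    """Wrap one paragraph inside the margins given by the marker."""
    left, first, right = parse_margins(marker)
    return textwrap.wrap(
        paragraph,
        width=right,                    # maximum line length, indentation included
        initial_indent=" " * first,     # first line of the paragraph
        subsequent_indent=" " * left,   # every other line
    )


text = ("I hope to find you well and expect to arrive Wednesday <i>inst</i> "
        "Eugenie asks to be remembered to all with love.")
print("\n".join(wrap_quote("/#[4.8,24]", text)))
```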
Examples of Indenting Block Quotes
For example, this quote:
/#[4.8,24]
I hope to find you well and expect to arrive Wednesday <i>inst</i> Eugenie asks to be remembered to all with love.
#/
will rewrap as follows (the top line is a ruler, not included in what actually happens):
....,....1....,....2....,....3 /#[4.8,24] I hope to find you well and expect to arrive Wednesday <i>inst</i> Eugenie asks to be remembered to all with love. #/
A hanging indent may be done the same way if the first line is indented less than the others. For example this quote:
/#[8.4,24]
I hope to find you well and expect to arrive Wednesday <i>inst</i> Eugenie asks to be remembered to all with love.
#/
will rewrap as follows:
/#[8.4,24] I hope to find you well and expect to arrive Wednesday <i>inst</i> Eugenie asks to be remembered to all with love. #/
You can change the margins midway through a block quote simply by closing it and starting a new block quote with different margin numbers.
When to Use (and not use) the /C Centering Marker
/C is intended for use with normal body text that should be centered but not rewrapped. Multi-line epitaphs, one-line aphorisms, and the title of a letter are examples of this; headings generally are wrappable and should be preceded by multiple blank lines, not enclosed in a /C block. When the /C block is within a Block Quote /#, centering is done within the margins of the Block Quote.
/C markers may be used within block quotes but not within any other markers.
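Centering without rewrapping amounts to padding each line separately. The following Python sketch shows the idea; the function name and default margins are hypothetical, and Guiguts may round odd amounts of padding differently.

```python
def center_block(lines: list[str], left: int = 0, right: int = 72) -> list[str]:
    """Center each line between the left and right margins without rejoining."""
    width = right - left
    centered = []
    for line in lines:
        text = line.strip()
        if not text:
            centered.append("")          # keep blank lines blank
            continue
        pad = max((width - len(text)) // 2, 0)
        centered.append(" " * (left + pad) + text)
    return centered


print("\n".join(center_block(["HERE LIES", "A GOOD MAN"], left=0, right=20)))
```

Passing the left and right margins of an enclosing block quote gives the "centered within the Block Quote" behaviour described above.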
When to Use (and not use) the /R Right-Align Marker
/R is primarily used with correspondence, as it facilitates positioning the lines in the city/date area at the top of the letter and the lines in the signature area at the bottom of the letter near the right margin. You (or people in the Formatting rounds) can indent the lines of each area to match their appearance in the original book, and Guiguts will attempt to preserve the indentations. /R also may be useful in positioning one-line credits just below illustrations, as it will place them at the right margin of the illustration's <div>. /R does not right-justify all of the lines within the block; it tries to move all of the lines the same distance towards the right, until one of them reaches the right margin. When the /R block is within a Block Quote /#, which will happen frequently, the indented right-margin of the Block Quote becomes the right-margin of the /R block.
/R markers may be used within block quotes but not within any other markers.
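The "slide to the right" behaviour differs from ordinary right-justification, as this short Python sketch shows. It is an illustration only; the function name and the sample margin are hypothetical.

```python
def slide_right(lines: list[str], right: int = 72) -> list[str]:
    """Shift a block right until its longest line touches the right margin."""
    stripped = [line.rstrip() for line in lines]
    longest = max((len(line) for line in stripped), default=0)
    shift = max(right - longest, 0)
    # Every non-blank line moves the same distance, so relative indents survive.
    return [" " * shift + line if line else "" for line in stripped]


for line in slide_right(["London,", "  May 3, 1870."], right=20):
    print(line)
```

Only the longest line ends exactly at the margin; the two-space indent of the second line relative to the first is preserved.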
Index Indent and Margins
This applies only to Plain Text, and the extra parameters described here should be added only to the Plain Text version.
The default settings for rewrapping a Plain Text Index are:
These values are used a little differently than in a Block Quote, because an Index may have multiple levels of entries and sub-entries:
- the first value specifies the rewrap column that will be used by all levels: any entry, at any level, whose length would cause it to go past the right margin specified by the third value will be rewrapped to the character column specified by this first value;
- the second value specifies the left margin for main entries. Each level of sub-entry will be indented two spaces further than the level above it; that additional indentation is not related to the '2' shown in the default example above, but is fixed;
- the third value specifies the right margin for the Index. It can be greater or less than 72, and any entry whose length would cause it to go past that right margin will be rewrapped to the character column specified by the first value.
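The way the three values interact can be sketched with Python's textwrap module. This is not Guiguts' algorithm, so actual line breaks may differ. The function name is hypothetical, and the values 8, 2, and 40 in the usage line are chosen only for the illustration.

```python
import textwrap


def wrap_index_entry(entry: str, level: int, wrap_col: int, left: int, right: int) -> list[str]:
    """Wrap one Index entry with a hanging indent.

    level    -- 0 for a main entry, 1 for a sub-entry, and so on
    wrap_col -- first value: column used by continuation lines at every level
    left     -- second value: left margin for main entries
    right    -- third value: right margin
    """
    indent = left + 2 * level        # each sub-level is two spaces further in
    return textwrap.wrap(
        entry.strip(),
        width=right,
        initial_indent=" " * indent,
        subsequent_indent=" " * wrap_col,
        break_on_hyphens=False,      # keep page ranges such as 235-36 intact
    )


entry = "Aerial Phenomena Research Organization (APRO), 181, 219, 235-36, 275, 278"
print("\n".join(wrap_index_entry(entry, 0, wrap_col=8, left=2, right=40)))
```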
Examples of Indenting an Index
Using the defaults,
/I A Accra, Africa, 50 Adamski, G., 16, 203, 204, 278 Aerial Phenomena Group, U. S. Air Force, 2, 271, 272. _See also_ ATIC Aerial Phenomena Research Organization (APRO), 181, 219, 235–36, 275, 278 Aerospace Technical Intelligence Center (ATIC), 2, 271. _See also_ ATIC
will rewrap to:
/I A Accra, Africa, 50 Adamski, G., 16, 203, 204, 278 Aerial Phenomena Group, U. S. Air Force, 2, 271, 272. _See also_ ATIC Aerial Phenomena Research Organization (APRO), 181, 219, 235–36, 275, 278 Aerospace Technical Intelligence Center (ATIC), 2, 271. _See also_ ATIC
/I[14.4,80] Angel hair, 220–26; alleged origins of, 194, 221; arachnid, 220–24; industrial, 224 “Angels” on radar, Pl. IVc; collision course of, 153–54; defined, 151; conditions producing, 157–60, 164, 170; moisture inversion and, 151, 158–60; possible causes of, 157–58; ring, 150, 165–66; temperature inversion and, 151–52, 158–60; UFO reports based on, 5–6, 71, 72, 151–52, 155–57, 161, 164–71, 182, 190, 192, 200, 202, 204, 208, 220, 222, 230, 232, 240, 242, 250, 252, 254, 256, 258, 260, 272, 384, 292, 300 Ann Arbor, Mich., 241
(the line beginning "UFO reports" is one very long line, but your Browser will rewrap it if it's too wide for your screen) will rewrap to:
/I[14.4,80] Angel hair, 220–26; alleged origins of, 194, 221; arachnid, 220–24; industrial, 224 “Angels” on radar, Pl. IVc; collision course of, 153–54; defined, 151; conditions producing, 157–60, 164, 170; moisture inversion and, 151, 158–60; possible causes of, 157–58; ring, 150, 165–66; temperature inversion and, 151–52, 158–60; UFO reports based on, 5–6, 71, 72, 151–52, 155–57, 161, 164–71, 182, 190, 192, 200, 202, 204, 208, 220, 222, 230, 232, 240, 242, 250, 252, 254, 256, 258, 260, 272, 384, 292, 300 Ann Arbor, Mich., 241
Rewraps the entire document, using the margin values set in the Preferences menu.
Rewraps the selected text (usually one or more complete paragraphs), using the margin values set in the Preferences menu. Caution: If the selected text is not followed by a blank line (for example, when you are rewrapping only part of a paragraph), a blank line will be added to the rewrapped text.
Block Rewrap Selection
Rewraps the selected text (usually one or more complete paragraphs), using the margin values set for blocks in the Preferences menu. Caution: If the selected text is not followed by a blank line (for example, when you are rewrapping only part of a paragraph), a blank line will be added to the rewrapped text.
You can use this to try to stop rewrapping, but you will still have to re-open the last saved copy of the document, as Undo isn't likely to put things back the way they were.
Clean Up Rewrap Markers
Removes all of the rewrap markers from the document. It does not remove extra blank lines that may have been added to keep adjacent close/open markup separated from each other.
Inspection and manual removal of extra blank lines may be necessary. This is because it isn't possible for Guiguts to know whether you wanted two separate tables/poems, or whether it was intended to be all one table/poem. So, Clean Up Rewrap Markers has to leave the blank line there.
However, if the guidance in the Note at the start of the Rewrap section is followed, there should be no unwanted blank lines to remove at this stage.
Once Clean Up Rewrap Markers has been run, and the result saved, it'll be more difficult to do some kinds of searches and automatic rewrapping, so this usually is one of the very last steps in preparing the Plain Text version of the book.
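What this option does can be approximated by a regular expression that deletes every line consisting only of a rewrap marker. The Python sketch below is an illustration, not Guiguts' code: it assumes the marker set from the table earlier in this section, allows an optional bracketed parameter list on opening markers, and uses a hypothetical function name.

```python
import re

# Assumption: a marker line holds only an opening marker (optionally with a
# bracketed parameter list) or a closing marker, and nothing else.
MARKER_LINE = re.compile(
    r"^(?:/[*#$PCRFLXI](?:\[[^\]]*\])?|[*#$PCRFLXI]/)[ \t]*(?:\n|\Z)",
    re.MULTILINE | re.IGNORECASE,
)


def clean_up_markers(text: str) -> str:
    """Delete marker lines; blank lines around them are left untouched."""
    return MARKER_LINE.sub("", text)


sample = "Intro.\n\n/#[4.8,24]\nA quote.\n#/\n\n/P\nA verse.\nP/\n\nEnd.\n"
print(clean_up_markers(sample))
```

As in Guiguts, the blank lines that surrounded the markers remain, so any unwanted ones still have to be found by inspection.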