GutCutter

From DPWiki

GutCutter is a software tool that aids in Post-processing. Its main function is to help prepare HTML versions of texts, but it also has an error-highlighting function. It is part of the Windows-only GutWrench suite, but can be used in combination with other tools as well.

Using GutCutter

GutCutter takes as input a "well-marked-up version" of a text, which is a text that has been readied for PPing at DP, with the addition of some HTML tags and a few other special markups. GutCutter can prepare both the HTML version and the plain-text version from this single file.

GutCutter recognizes the standard DP markups:

as well as the standard HTML markups:

  • <h1> ... </h1>, etc., for headers
  • <i> ... </i>, for italic text
  • <b> ... </b>, for bold text
  • <u> ... </u>, for underlined text
  • <center> ... </center>, for centered text

When the plain-text version is produced, some markups are deleted and others are converted to standard HTML markups; the result is a standard DP-style marked-up text. For example, HTML header markups are temporarily converted to bold markups (<b> ... </b>), and the enclosed text may later be converted to ALL CAPS by GutHammer, the text-rewrapping tool of the GutWrench suite.

In addition, depending on the style used, some of the above may be modified by the user, such as:

  • <tb long>, <tb short>, <tb full> to denote that long, short, and full-page-width horizontal rules should be used in the HTML version (all are converted to the standard 5 asterisks for the plain-text version).
  • <h1sc>Header Text in Title Case</h1>, etc., to denote headers set in small caps.
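
The plain-text conversions described above can be sketched in a few lines of Python. This is only an illustration, not GutCutter's actual code: the helper name to_plain_markup and the spacing of the five asterisks are assumptions, and plain <tb> tags are left untouched because the text above only mentions the three modified forms.

```python
import re

# Assumption: the source says only "5 asterisks"; this spacing is illustrative.
THOUGHT_BREAK = "*       *       *       *       *"


def to_plain_markup(line: str) -> str:
    """Hypothetical helper: reduce extended markup on one line to DP-style markup."""
    # <tb long>, <tb short>, and <tb full> all become the same thought break.
    line = re.sub(r"<tb (?:long|short|full)>", THOUGHT_BREAK, line)
    # Header tags, including small-caps variants such as <h1sc>, become bold tags.
    line = re.sub(r"<h[1-6](?:sc)?>", "<b>", line)
    line = re.sub(r"</h[1-6]>", "</b>", line)
    return line


print(to_plain_markup("<h1sc>Header Text in Title Case</h1>"))
print(to_plain_markup("<tb short>"))
```

Working one line at a time mirrors the way GutCutter is described as operating later on this page.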

GutCutter includes a number of different templates for producing HTML documents in various styles. Templates have been prepared to yield the prescribed styles for the following periodicals and uberprojects:

Special how-to files are included for each of the above periodicals in GutCutter's documentation. For a demonstration of GutCutter's capabilities, run it on FMdemo.txt (included in the download package) using "The Full Monty" style.

How to run GutCutter

Running GutCutter involves three major steps: 1) prepare a well-marked-up version by adding some more markup; 2) prepare the HTML version; and 3) prepare the plain-text version.

Prepare well-marked-up version

First, prepare your "well-marked-up" version by adding some more special markups to your downloaded text.

  1. Add <h1> ... </h1> markup around the book title. (The GutJack tool of the GutWrench suite can aid in quickly adding these and other HTML header markups.)
  2. Add <h2> ... </h2> markup around the chapter titles, and <h3> ... </h3> markup around the sub-chapter titles, etc.
  3. Add <center> ... </center> markup around text that is centered horizontally.
  4. If needed, modify each <tb> to indicate the length of the horizontal rule that will be used in the HTML version to indicate each thought break: <tb long>, <tb short>, or <tb full>. You may need to insert some thought breaks that were not apparent to the formatters, e.g., after magazine articles that end at the bottom of a page.
  5. Optionally, modify DP-style [Illustration ...] tags to indicate the justification, scale, and image filename, as shown below.
  6. Patch up any words that are hyphenated across page separators (or italics or bold markup that is interrupted by a page separator). For hyphenated words, put the complete word either before or after the page separator.
  7. Set "oe" and "OE" ligatures as "[oe]" and "[Oe]". Other special markups are available for other non-Latin-1 characters; see the documentation for details.
  8. Renumber the numbers on the page separators to match the page numbers of the associated text. (GutPunch, another tool in the GutWrench suite, is helpful in doing this.)
  9. Add special markups for Tables, as described below.

Prepare HTML version

Second, prepare the HTML version:

  1. Open the GutCutter application
  2. Select the text file (text files in the same folder as the GutCutter application are shown in the Select Input Text File menu; to select a file in another folder, use the menu bar: File -> Open...)
  3. Select the desired Style template
  4. Select Output as "HTML"
  5. Click "Modify text"

(Unless you are doing something special, leave the Add Header and Add Trailer checkboxes as checked.)

GutCutter interface, after converting D.txt into an HTML document in the Punch style

In general, this modifies the text and adds special header and trailer text, depending on the selected Style. Open the output file, GCout.htm, in your favorite browser. If you spot any errors, it's usually easier to change them in the original text, and then rerun GutCutter. (Errors are more apparent in the HTML version; that's why it's best to prepare that first.) However, be aware that GutCutter is not able to perform every step in preparing an HTML version; ordinarily, some direct editing of the HTML is needed, but put this off until you have made the plain-text version, since you might spot more errors when you inspect that.

Prepare plain-text version

Once the HTML version looks okay, go on to prepare the plain-text version:

  1. Open the GutCutter application
  2. Select the text file, as described above
  3. Select the desired style template
  4. Select Output as "Plain text"
  5. Click the "Modify text" button

Open the output file, GCout.txt (note different suffix), in your favorite text editor; this should now be a standard DP-style marked-up text. If on inspection you find any errors here, it might be easier to correct them in your original well-marked-up version, then regenerate both the HTML and plain-text versions from that. If the project is a periodical, the header added by GutCutter will need to be edited to give the correct date, volume, issue, etc. To complete the plain-text version, run it through any of the several DP rewrapping tools, which will also properly treat any remaining DP-style markup.

Modes menu

Selections under the Modes menu in the menu bar modify how GutCutter operates:

  • Interactive mode asks for confirmation of most changes that GutCutter makes (not recommended for normal use; useful mostly for debugging the preparation of new style templates)
  • Silent mode gives fewer beeps

Modified Illustration tags

Optionally, you can add some simple extra markups to standard PG-style "[Illustration: ...]" tags to automatically scale and locate images in the HTML version. Any caption text following the ":" is automatically set as a caption under the figure. For example:

  • "[Illustration R30/4-2:" scales the image, named "4-2.png", by 30% and floats it right (right-justified).
  • "[Illustration 60/4-2:" scales the image, named "4-2.png", by 60% and centers it (centering is the default).
  • "[Illustration C/4-2:" or "[Illustration:" leaves the image full-sized and centers it (full size is the default for centered images).
  • "[Illustration L/4-2:" or "[Illustration R/4-2:" scales the image by 50% and left- or right-justifies it (50% is the default scale factor for justified images).

Any whole number may be used for the scaling percentage. All of these added markups are returned to "[Illustration:" in the plain-text output.
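
As a rough illustration of the rules above, the following Python sketch parses the start of a modified Illustration tag into an alignment, a scale, and a filename. It is not GutCutter's own parser: the names Illustration and parse_illustration are hypothetical, and the ".png" extension is assumed from the examples above.

```python
import re
from typing import NamedTuple, Optional


class Illustration(NamedTuple):
    align: str               # "left", "right", or "center"
    scale: int               # percentage of full size
    filename: Optional[str]  # None when the tag names no image


# Optional justification letter, optional percentage, then "/" and the image name.
TAG = re.compile(r"\[Illustration(?: ([LRC])?(\d+)?/([^:\]]+))?")


def parse_illustration(tag: str) -> Illustration:
    """Hypothetical parser for the opening of an [Illustration ...] tag."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError("not an Illustration tag: " + tag)
    letter, percent, name = match.groups()
    # Centering is the default when no justification letter is given.
    align = {"L": "left", "R": "right", "C": "center", None: "center"}[letter]
    if percent is not None:
        scale = int(percent)
    else:
        # Defaults: full size for centered images, 50% for justified ones.
        scale = 100 if align == "center" else 50
    # Assumption: image files always carry the .png extension.
    filename = name + ".png" if name else None
    return Illustration(align, scale, filename)


print(parse_illustration("[Illustration R30/4-2:"))
```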

Special table markups

GutCutter can facilitate setting up HTML tables; this has been implemented only in certain styles, including Punch, Notes & Queries, and Lippincott's. You will have to type in only a few more characters, which GutCutter will convert to generally much more lengthy HTML markup. These characters and other HTML markups are removed by GutCutter in making the plain-text version.

Use the standard DP-style /$ ... $/ tags around each table. The opening /$ may be followed by the table summary.

Optionally, put the table caption in <caption> ... </caption> tags, on its own line after the /$.

Set each table row as a single line of text. Separate the table columns with vertical-line ("pipe") characters "|". Also, put "|" before the first column and after the last.

Optionally, indicate the alignment of each table element by adding a special character immediately after the preceding "|":

  • |< for left-justified
  • |> for right-justified
  • |= for horizontally centered
  • |^ for vertically aligned at the top
  • |_ for vertically aligned at the bottom

The symbols <, >, and = can also be used immediately after the opening /$ markup to indicate the horizontal alignment of the entire table; e.g., "/$=summary" for a centered table.

An example marked-up table using this system:

/$=Demo
<caption>Batters' Statistics.</caption>
|<Player's name. |= Hits.|= At bats.|= Avg.|
|<Joe Schmoe |> 4 |> 5 |^ .800|
|<Ed Glover |> 3 |> 6 |^ .500|
|<Cornelius Kolb |> 2 |> 7 |_ .283|
$/

which GutCutter will convert in its HTML output to:

<table summary="Demo" align="center">
<caption align="top">Batters' Statistics.</caption>
<tr><td align="left">Player's name.</td><td align="center">Hits.</td><td align="center">At bats.</td><td align="center">Avg.</td></tr>
<tr><td align="left">Joe Schmoe</td><td align="right">4</td><td align="right">5</td><td valign="top">.800</td></tr>
<tr><td align="left">Ed Glover</td><td align="right">3</td><td align="right">6</td><td valign="top">.500</td></tr>
<tr><td align="left">Cornelius Kolb</td><td align="right">2</td><td align="right">7</td><td valign="bottom">.283</td></tr>
</table>

which will appear in an HTML browser as:

Batters' Statistics.
Player's name.    Hits.   At bats.   Avg.
Joe Schmoe          4        5       .800
Ed Glover           3        6       .500
Cornelius Kolb      2        7       .283

Some Style templates are set up to use CSS classes to align table elements in the HTML.
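
The row conversion shown in the example can be approximated with the short Python sketch below. It is an illustration under assumptions, not GutCutter's implementation: the function name row_to_html is hypothetical, and the sketch emits the inline align/valign attributes from the example rather than CSS classes.

```python
from html import escape

# Alignment characters that may follow a "|", mapped to <td> attributes.
ALIGN = {
    "<": 'align="left"',
    ">": 'align="right"',
    "=": 'align="center"',
    "^": 'valign="top"',
    "_": 'valign="bottom"',
}


def row_to_html(row: str) -> str:
    """Hypothetical helper: turn one pipe-delimited row into an HTML <tr>."""
    row = row.strip()
    # Drop the single pipe before the first column and after the last.
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|"):
        row = row[:-1]
    cells = []
    for cell in row.split("|"):
        attr = ""
        if cell and cell[0] in ALIGN:
            attr = " " + ALIGN[cell[0]]
            cell = cell[1:]
        # Escape &, <, and > in the cell text, but leave quotes alone.
        cells.append("<td" + attr + ">" + escape(cell.strip(), quote=False) + "</td>")
    return "<tr>" + "".join(cells) + "</tr>"


print(row_to_html("|<Joe Schmoe |> 4 |> 5 |^ .800|"))
```

Because each row is a single line of text, a per-line function like this is enough; the /$ ... $/ wrapper and the caption would need separate handling.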

Note that in preparing the plain-text version, the <table ...> and <caption ...> tags are dropped in their entirety, including any optional parameters; /$ ... $/ markups take the place of the <table ...> and </table> tags (and are removed later by GutHammer or another PP tool). So the above example will appear in GutCutter's plain-text output as:

/$
Batters' Statistics.
|Player's name. | Hits.| At bats.| Avg.|
|Joe Schmoe     |   4  |    5    | .800|
|Ed Glover      |   3  |    6    | .500|
|Cornelius Kolb |   2  |    7    | .283|
$/

The /$ ... $/ markup will be removed (and handled properly) by any of the text-rewrapping tools: GutHammer, RewrapIndent, or guiguts.

Help for Tables of Contents

GutCutter can facilitate adding a Table of Contents to a text that did not have one. In preparing the "well-marked-up version", use HTML header tags around at least every heading that you would like to appear in the Table of Contents. Then when "Modify Text" is clicked, any lines of text that are contained in the header tags are written to a file named GCtoc.txt. Insert the contents of this file into the output file you have just created. Some manual editing will generally be required (in particular, page numbers must be added, and you might have to delete some lines), but this can save a lot of hunting through the text and typing. The file GCtoc.txt will have different contents depending on whether GutCutter has just generated a plain-text or HTML version.
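
To make the idea concrete, here is a minimal Python sketch of collecting header lines for a Table of Contents. The function names and the indented output layout are assumptions for illustration; the actual format of GCtoc.txt is not described on this page.

```python
import re
from typing import Iterable, List, Tuple

# Matches <h1> ... </h1> through <h6> ... </h6>, including small-caps
# openers such as <h1sc>, which are closed by the plain </h1> tag.
HEADER = re.compile(r"<h([1-6])(?:sc)?>(.*?)</h\1>")


def collect_toc(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Hypothetical helper: gather (level, text) for every header on each line."""
    entries = []
    for line in lines:
        for level, text in HEADER.findall(line):
            entries.append((int(level), text.strip()))
    return entries


def format_toc(entries: List[Tuple[int, str]]) -> str:
    """Indent each entry by two spaces per header level below the first."""
    return "\n".join("  " * (level - 1) + text for level, text in entries)


sample = ["<h1>The Book</h1>", "Some text.", "<h2>Chapter I</h2>"]
print(format_toc(collect_toc(sample)))
```

Page numbers would still have to be added by hand, as noted above.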

The "Full Monty" style

As a demonstration, a special style has been set up to incorporate all of the above markups plus a few additional gimmicks. While its safe use cannot be guaranteed for every project, some GutCutter users apply it routinely to all of their projects, with no complaints so far. The additional markups include:

  • /d ... d/ around drama that is not set as verse. Separate each speaker's bit of dialogue by a blank line. The first line of each speech will remain unindented, while all subsequent lines will be indented two spaces. (For drama set as verse, use /* ... */ tags around it, and format each line as you would like it to appear.)
  • ## before every reference to a page number that you would like to become a link to that page. For example, "See p. ##123" and "See p. ##123-134" (which will link to the first page of the range). Also useful for Indices.
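
The ## page-reference markup could be handled along the lines of the following Python sketch. The anchor naming scheme ("Page_123") and the function name link_pages are assumptions made for illustration; the actual link targets depend on the style template.

```python
import re

# "##" followed by a page number and, optionally, the end of a page range.
PAGE_REF = re.compile(r"##(\d+)(-\d+)?")


def link_pages(line: str) -> str:
    """Hypothetical helper: turn ##123 or ##123-134 into a link to page 123."""

    def to_link(match):
        first = match.group(1)
        rest = match.group(2) or ""
        # A range links to its first page; the visible text keeps the full range.
        return '<a href="#Page_' + first + '">' + first + rest + "</a>"

    return PAGE_REF.sub(to_link, line)


print(link_pages("See p. ##123-134"))
```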

To see a demonstration of these features, run GutCutter on the file FMDemo.txt to produce an HTML version using the Full Monty style.

Tips for best results

GutCutter operates on only a single line of text at a time. Thus, Illustration captions and Footnotes that run on for several lines will be only incompletely handled. If it is convenient, try to edit these beforehand to fit on a single line, in which case GutCutter will be able to render the complete HTML markup.
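
One way to do this editing in bulk is a small pre-processing script such as the Python sketch below, which joins a multi-line Illustration or Footnote block onto one line before the file is given to GutCutter. This is a hypothetical helper, not part of the GutWrench suite; it assumes such blocks begin at the start of a line and that square brackets inside them are balanced.

```python
from typing import Iterable, List

# Assumption: these are the only block openers that need joining.
STARTS = ("[Illustration", "[Footnote")


def join_bracketed(lines: Iterable[str]) -> List[str]:
    """Join each multi-line [Illustration ...] or [Footnote ...] block into one line."""
    out = []
    pending = None  # text of the block being assembled, if any
    depth = 0       # count of "[" not yet closed by "]"
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is None:
            if line.startswith(STARTS):
                depth = line.count("[") - line.count("]")
                if depth > 0:
                    # The block continues on later lines; start collecting.
                    pending = line.rstrip()
                    continue
            out.append(line)
        else:
            pending += " " + line.strip()
            depth += line.count("[") - line.count("]")
            if depth <= 0:
                out.append(pending)
                pending = None
    if pending is not None:
        # Unclosed block at end of input: emit what was collected.
        out.append(pending)
    return out


print(join_bracketed(["[Illustration: A long caption", "that runs on.]", "Next line."]))
```

Counting brackets rather than looking for the first "]" keeps nested markups such as "[oe]" from ending a block early.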

Using GutCutter to highlight errors

GutCutter-created HTML document highlighting suspected errors

One of GutCutter's Styles is set up to prepare an HTML document that highlights many types of suspected errors in a text file, including:

To use this function:

  1. Open the GutCutter application
  2. Select your input text file
  3. Select the Style named "Show Errors (HTML only)".
  4. Click the "Modify Text" button. Be patient—this may take several minutes, as GutCutter is checking for over a thousand kinds of errors, as well as formatting the text into HTML according to whatever markup tags the text file has in place.
  5. Close GutCutter
  6. Open the output file GCout.htm in an HTML browser, such as Internet Explorer, Netscape Navigator, or Firefox. Hovering the cursor over any highlighted bit of text will usually (Firefox being an exception) indicate the reason why it is suspected of being an error (e.g., "be for he", "rn/m confusion", "Stealth scanno").

The errors highlighted are those that would be found by GutAxe, another of the GutWrench suite of tools. The advantage of GutCutter is that the highlighted HTML document can be visually scanned very quickly for actual errors. Thus GutCutter is an optional replacement for GutAxe, the use of which has become somewhat tedious with the great increase in text quality since the changeover to four rounds of proofreading and formatting. Like GutAxe, this use of GutCutter is useful in Post-Processing Verification.

Note: The HTML file generated by this error-highlighting function is to be used for error-checking only. Do not submit it to DP for Post-Processing Verification or upload it to Project Gutenberg.