Guiguts PP Process Checklist

From DPWiki

This checklist is made for use with the latest version of Guiguts and refers to Guiguts commands and menu items. It contains steps for all types of books including those with illustrations, poetry, footnotes, sidenotes and indexes. The checklist will help you create both a Plain Text and an HTML version. Not all projects need all steps. As you gain experience you may decide to do things in a different sequence.

An alternative approach to that below is to create a single source file in ppgen format.

When editing this page, please use links to other wiki topics rather than repeating things covered elsewhere.

Process the Pages

Our first set of activities will combine all proofed and formatted pages, fixing any errors and inconsistencies.

Initial Setup (1 hr.)

  • Go to Project page
    • Read details and requirements.
    • Bookmark the project URL.
    • Read the project forum page, note any issues proofers raised.
  • Make a project folder, e.g. (Win) C:\dp\pp\bookname or (Mac/Linux) /dp/pp/bookname
  • Download the text and image files and unpack them in the new folder:
    • Text to bookname.txt.
    • Page images (nnn.png) in subfolder pngs.
    • Hi-res illustration scans (imagenn.png) in subfolder originals.
    • Empty subfolder images.
  • Use File>Open to open bookname.txt.
  • Use File>Project>Configure Page Labels. This allows the page numbers in bookname.txt to match the page images.
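
If you set up many projects, the folder layout above can be created by a script. The following is only a sketch: the function name make_project_folders is hypothetical, and the subfolder names are the ones used in this checklist.

```python
from pathlib import Path

# Subfolder names taken from the checklist above.
SUBFOLDERS = ("pngs", "originals", "images")

def make_project_folders(root):
    """Create the project folder and its three subfolders; return their paths."""
    root = Path(root)
    created = []
    for name in SUBFOLDERS:
        sub = root / name
        # parents=True also creates the project folder itself if needed.
        sub.mkdir(parents=True, exist_ok=True)
        created.append(sub)
    return created
```

For example, make_project_folders("/dp/pp/bookname") would leave you with pngs, originals and images ready for the downloaded files.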

Sequential Inspection of Text (4-20 hr.)

This is the only step in which you will examine the whole text in sequence; hereafter you navigate with searches. Some post-proofers still read the book carefully, although this is not as crucial as it used to be under the old two-round system. Others skim the text comparing it to the page images and double-checking format.

Either way, be sure to turn on automatic scanno highlighting before starting this pass.

Check for:

  • Proper markup of <i>italic</i> and <b>bold</b>.
Watch for punctuation wrongly contained in markups, such as <i>(ibid.</i> or <b>Subtopic.</b>.
  • Proper markup of Greek and other transliterations (content check later)
  • Block material all marked in some fashion:
    • poetry, misc. tabular in /* */
    • block quotes in /# #/
    • Fix block markups that cross page boundaries now or in the next step
  • Figures properly in [Illustration: caption]
    • check: caption text agrees with List of Illustrations (if any)
    • consistent spelling, abbreviation, capitalization in captions
  • Fix Footnotes, Illustrations still inside a paragraph.
    • move outside paragraph to next or prior page as appropriate
    • don't worry about duplicate footnote number/symbol now
    • sidenotes handled later
  • Make notes of things that will need attention in the HTML:
    • Author cross-references like "(p. 150)" and "see page 222" that should become links.
    • How the editor laid out special sections such as tables and sidebars.

Basic Fixup (10 min.)

  • Use Tools>Basic Fixup, normally with all options checked. (The Guiguts manual says to use it judiciously. For example, you may want to turn off "Fix up spaces around hyphens" and "Format ellipses correctly." As with everything else on the Tools menu, be sure to save a good copy of your document before using this option.)
  • Use Tools>Remove End-of-line Spaces.
  • Remove any instances of the [Blank Page] tag appearing on pages with no text or images.

Fix Block Markups and Proofer Notes (15-60 min.)

  • Use the Search menu to step through all /* */ blocks.
    • check for a blank line before and after markup
    • make sure correct Rewrap Markers used
    • close-up where broken at page boundaries
    • apply specific indent value if desired
    • convert poetry from /*..*/ to /P..P/
    • make sure poetry line numbers are at least two spaces to the right of the line.
  • Use the Search menu to step through all /#..#/ blocks.
    • check for a blank line before and after markup
    • make sure correct Rewrap Markers used
    • close-up where broken at page boundaries
    • check consistent indentation of block text
    • apply specific margin values if desired
  • Use Search>Find Next Proofer Comment. Resolve all proofer's notes.
  • Use Search>Find Orphaned DP Markup.
  • Use Tools>Check Orphaned Brackets to check each type of bracket and markup. Do not omit the lowly parenthesis, often mis-scanned as curly-brace.
  • Look for malformed thought-breaks (5 stars).
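
As a cross-check outside Guiguts, the blank-line and matching-marker checks above can be approximated with a short script. This is a rough sketch, not a Guiguts feature: the function name check_blocks is hypothetical, and it assumes each marker sits at the start of its own line (an opener may carry parameters, such as an indent value in brackets).

```python
# Map each opening block marker to its closing marker.
OPENERS = {"/*": "*/", "/#": "#/"}

def check_blocks(text):
    """Return a list of (line_number, message) for block-markup problems."""
    lines = text.split("\n")
    problems = []
    open_stack = []  # (expected closer, line number of opener)
    for i, line in enumerate(lines):
        s = line.strip()
        lineno = i + 1
        opener = s[:2]
        if opener in OPENERS:
            # An opener needs a blank line before it.
            if i > 0 and lines[i - 1].strip():
                problems.append((lineno, "no blank line before " + opener))
            open_stack.append((OPENERS[opener], lineno))
        elif s in OPENERS.values():
            # A closer must match the most recent opener.
            if not open_stack or open_stack[-1][0] != s:
                problems.append((lineno, "unmatched " + s))
            else:
                open_stack.pop()
            # A closer needs a blank line after it.
            if i + 1 < len(lines) and lines[i + 1].strip():
                problems.append((lineno, "no blank line after " + s))
    for closer, lineno in open_stack:
        problems.append((lineno, "block never closed with " + closer))
    return problems
```

An empty result means no problems of these kinds were found; it does not replace stepping through the blocks against the page images.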

Format Front Matter (15 min.)

  • Format the title page, preserving as much of the original material as possible. Protect in /X...X/ (no rewrap, no indent) or /F...F/ (the same, except that it will be centered in the html version).
  • Edit the TOC. Find each matching chapter head; make sure heads are 1:1 with TOC. Protect TOC with /X...X/. Note that your TOC will probably need to be indented to prevent rewrapping, particularly if you use multiple spaces to align page numbers.
  • If book has illustrations, edit or create List of Illustrations (Note: this is not a requirement). Make sure it is 1:1 with [Illustration] captions. Protect with /X...X/.

Edit Transliterations (0-? hr.)

Remove Visible Page Breaks (10-30 min.)

Apply Word-Frequency Checks (10-60 min.)

Open Tools>Word Frequency. Double click on a word to search for it.

  • Set the Frq switch; click All Words. The list is now sorted by word frequency; scroll to the end and skim up the list of words that appear only once, looking for oddities and obvious misspellings.
  • Click Character Cnts.
    • Note characters that appear only once, check usage.
    • Check for equal counts of left & right parens and brackets.
  • Set the Alph switch; click All Words. Scroll to the word Footnote and write down count for later use. (If the count is large, click once on Footnote and click 1st Harm. The harmonic window shows you any of the common misspellings of "Footnote" that occur.)
  • Click Emdashes. This shows words with emdashes in them as well as similar words without emdashes (aka: suspects) marked with ****. Check suspects against the text and page images. Preserve author's intent even when inconsistent. Hint: Enable the Suspects flag and click Emdashes again to see only suspect words.
  • Click Hyphens. Same as Emdashes above but for Hyphens.
  • Click Alpha/num. Scan list for one/ell and oh/zero errors.
  • Click ALL CAPS. Scan list looking for oddities.
  • Click MiXeD CasE. Scan list looking for letters such as o that sometimes OCR wrongly as uppercase. Oh/zero errors can show up here, too.
  • Click Check Accents. Scan list looking for mistakes, inconsistent usages.
  • Click Check , Upper. Scan list for comma-for-period errors.
  • Click Check . Lower. Scan list for period-for-comma errors.
  • Click Ital/Bold/SC. Scan list for incorrect or inconsistent use of italics, bold face, and small caps.
  • Click Ligatures. Scan list for incorrect or inconsistent use of ae and oe ligatures.
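
To illustrate what the first two checks are looking for, here is a small sketch of the same idea outside Guiguts. The function names are hypothetical, and the definition of a "word" (a run of letters, optionally joined by apostrophes or hyphens) is an assumption that will not match Guiguts exactly.

```python
import re
from collections import Counter

def word_counts(text):
    """Count words; a word is a run of letters joined by ' or -."""
    return Counter(re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text))

def singletons(text):
    """Words that occur exactly once: candidates for misspellings."""
    return sorted(w for w, n in word_counts(text).items() if n == 1)

def bracket_balance(text):
    """Opening count minus closing count for each bracket pair."""
    chars = Counter(text)
    return {o + c: chars[o] - chars[c] for o, c in ("()", "[]", "{}")}
```

A non-zero value from bracket_balance corresponds to unequal counts of left and right brackets in the Character Cnts list.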

Apply Scanno Checks (1-3 hr.)

  • Use Tools>Stealth Scannos.
    • Start scanno searching based on en-commn.rc. Work through the list.
    • Apply scanno searching based on misspelled.rc. Work through the list.
    • Apply scanno searching based on regex.rc. Work through the list.
  • If you have installed Jeebies, use Tools>Run Jeebies. Examine its report of possible he/be errors.

Apply Bookloupe (10-45 min.)

Apply Spellcheck (30-90 min.)

Use Tools>Spell Check. Proceed through the document, correcting words or adding them to the project dictionary as appropriate.

Fix Sidenotes (0-? hr.)

Read the discussion. Step through sidenotes with: Search&Replace of [S, not regex, not whole word, ignore case. Click Search to find each Sidenote.

  • Compare to page image. Move note above paragraph if feasible.
  • Otherwise, position it above the sentence to which it applies, with blank lines to prevent rewrapping if you decide that is best.

Fix Footnotes (0-? hr.)

Use Tools>Footnote Fixup. This will help you validate and move any footnotes.

Fix Poetry Line Numbers (0-20 min.)

If the book has poetry that uses line numbers, read this page and align the line numbers consistently.

Check balanced markup

Use the Search menu

  • Find Orphaned Markup (which searches for the regular expression \<(\w+)>\n?[^<]+<(?!/\1>), that is, any markup starting in <..> that doesn't end in an identical closing markup).
Note: this regular expression sees <tb> as unbalanced, and shows the text from the <tb> to the next markup as an error. (If you can devise a better regex please do!)
Possible alternate that explicitly lists all current markup: \<(i|b|sc|g|f|u)>\n?[^<]+<(?!/\1>)
Because it includes a newline, the search may take several seconds to return the first result.
  • Correct the error and click search until no more are found.
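
To see how the two expressions behave, they can be tried on small samples. The sketch below uses Python's re module rather than Guiguts itself, so treat it as an approximation; the patterns are written without the leading backslash, which is assumed to be equivalent here, and the helper name orphans is hypothetical.

```python
import re

# The generic search: any <tag> whose text is not followed by </tag>.
GENERIC = re.compile(r"<(\w+)>\n?[^<]+<(?!/\1>)")
# The explicit-list alternate, which ignores <tb>.
EXPLICIT = re.compile(r"<(i|b|sc|g|f|u)>\n?[^<]+<(?!/\1>)")

def orphans(pattern, text):
    """Return the tag names the pattern reports as unbalanced."""
    return [m.group(1) for m in pattern.finditer(text)]
```

With these definitions, orphans(GENERIC, "An <i>unclosed phrase and <b>bold</b> text.") reports the i markup, and a <tb> followed by later markup is reported by GENERIC but not by EXPLICIT. Note that each match consumes the "<" of the following tag, so a single pass can miss a second problem directly after the first; this is one reason to repeat the search until nothing more is found.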

Consider page numbers and curly quotes

  • Consider whether you want visible page numbers in the text versions as well as the html version. This is not generally advisable, nor is it generally welcomed by PPV/WW, but there are exceptions. If you do want them in both versions, then now is the time to put them there, before splitting the files in the next step.
  • Search for remaining upright single quotation marks and replace them with either a ‘ or a ’.

Save Edited Markup (2 min.)

  • Save any unsaved changes in bookname.txt.
  • Use File>Save a Copy As to make bookname.html
This will be the starting file for the HTML version. You can also use it as fallback in case you mess up and need to start the following steps over.

Prepare the Plain Text Version

We now proceed to create a Plain Text Version of the book.

  • Re-open bookname.txt (if not still open).

Convert <tb>, Italic, Bold, and Smallcap (10 min.)

Fix ASCII Tables (0-? hr.)

  • Use Search>Find Next /**/ Block to step through all tabular material.
    • Compare to page image; reformat to best convey author intent.
    • For complex tables, use Txt>ASCII Table Effects to reformat.

Rewrap and Clear Rewrap Markers (10-30 min.)

  • Save the file if there are any unsaved changes.
  • Use Tools>Rewrap All. Wait while rewrap completes.
  • Page through entire text, looking for improper indentation. If found, re-open, clicking NO when asked if you want to save the edits. Find and fix broken rewrap markups. Repeat this step.
  • Under Tools>Footnote Fixup, use the Tidy Up Footnotes button.
  • Use Tools>Clean Up Rewrap Markers.
  • Use Tools>Remove End-of-line Spaces.
  • Rerun Bookloupe or pptext. Resolve any new issues.
  • Save the document.

Prepare the HTML Version

Finally, we create an HTML Version of the book.

Generate the HTML (4-? hr.)

  • Open bookname.html that was saved previously.
  • It is preferable for the source line-breaks to match the book; however HTML poetry markup won't work unless /P..P/ sections have been rewrapped. If the book has much poetry, rewrap it all; else select and rewrap poetry sections individually.
  • Don't remove the rewrap markers. These are needed for generation of proper HTML.
  • Open HTML>HTML Generator.
    • Set optional switches as desired.
    • Use the Autogenerate HTML button.
  • Save the file and open it in a browser by using the View in Browser button.
  • Scroll through looking for systematic errors. (Title pages, tables, etc. will look terrible; no matter). If automatic conversion messed up, delete the file and start this step over with the backup file.
  • Page through the book looking for text that was not handled well by automatic HTML generation, in particular:
  • Use HTML>HTML Markup to make improvements. Use regex replacements to make systematic changes.
  • Where you see a problem, make a correction in Guiguts, save the file, and click the "reload" button in the web browser.
  • Hyperlink page references in text, TOC, and index (discussed here and here).
  • Remove the Generated TOC if it is not needed.
  • Add "abbr" and "lang" tags as appropriate.
  • Decide what (if anything) you plan to do about non-Latin-1 characters in the document. You can use HTML escapes to insert any Unicode character. One high-runner is the oe-ligature, which is &oelig;. Greek works, too, but be aware that not everyone will have an extensive set of fonts loaded.
  • If there is a cover image for e-readers supplied with the project, or you are creating one yourself, you can find information on what is needed in your HTML in the Proofreaders' Guide to EPUB or the PP guide to cover pages.
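
If you decide to use HTML escapes for non-Latin-1 characters, the conversion can be sketched as follows. This is an illustration only, not part of Guiguts: the function name escape_non_latin1 is hypothetical, and it relies on Python's html.entities table, which covers the HTML 4 named entities and falls back to numeric escapes for everything else.

```python
from html.entities import codepoint2name

def escape_non_latin1(text):
    """Replace characters above U+00FF with named or numeric HTML escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFF:
            # Latin-1 characters are left untouched.
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append("&#%d;" % cp)
    return "".join(out)
```

For example, the oe-ligature in "cœur" becomes &oelig;, as described above. Do not run something like this over text that is already escaped without checking the result.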

Process Hi-resolution Images (? hr.)

If the project manager provided high-resolution scans of the images in the text, use an image processing program such as GIMP or Adobe Photoshop Elements to optimize them—see Guide to Image Processing. You can do this before, during, or after HTML conversion. For each image:

  • Load image from the originals folder (see the Initial Setup step).
  • Straighten it (almost all scanned images are off-perpendicular; some are trapezoidal owing to the page not being flat on the scan window).
  • Crop it to remove all redundant white space and borders (provide margins and borders with CSS styling of the <img> markup).
  • Correct the contrast (you must have calibrated your monitor, see this page).
  • Sharpen.
  • Correct any major scratches, freckles, dirt, etc.
  • Save in the subfolder images using appropriate type:
    • Line drawings in .png at 8 bits per pixel (not the default 24-bit RGB format).
    • Photographs as .jpg with an appropriate compression level such as (Photoshop) level 6.
  • Under HTML>HTML Generator, use the Auto Illus Search button. This will help add the images to the book.
  • Page through entire HTML book making sure that each image is being loaded correctly. Test each thumbnail if used.
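
One way to confirm that a saved line drawing really is 8 bits per pixel rather than 24-bit RGB is to read the header of the .png file. The sketch below does this with the Python standard library only; the function name is hypothetical, and the channel table follows the color types defined in the PNG specification.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Channels per pixel for each PNG color type:
# 0 grayscale, 2 RGB, 3 palette, 4 grayscale+alpha, 6 RGB+alpha.
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}

def png_bits_per_pixel(data):
    """Return bits per pixel from the IHDR chunk of PNG file bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width, height, bit depth, color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth * CHANNELS[color_type]
```

Pass it the bytes of a file from the images subfolder (for instance, read with open(path, "rb")); a result of 24 suggests the image was saved in the default RGB format.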

Validate HTML and CSS

Perform these validation steps before submitting your book. Validation is also helpful while customizing the HTML and CSS above.

Upload the Finished Project

  • Prepare a new folder with a short name. The name you choose doesn't really matter because you only need it to create the zip file. The zip file itself is renamed automatically during the upload process.
  • Move into it only the files to be uploaded:
    • the etext file bookname.txt.
    • the .bin files related to those (some PPVers use Guiguts too!)
    • the HTML file if one was made
    • the images folder if required by HTML
Do not include the original images or the page images; do not include any work files or scratch files or auto-backup editions. If you have been told to upload directly to the Gutenberg site for a whitewasher, do not include the .bin file(s). All filenames should contain lowercase letters only.
  • Mac OS X users: the Finder creates hidden files named .DS_Store in any folder you display as a window. Although harmless, these files are not wanted by PG. Get rid of them as follows: In a terminal window, cd into the project folder. Run this command, copying its arcane syntax precisely:
find . -name ".DS_Store" -ok rm '{}' \;
You will be asked for deletion confirmation.
  • Linux and Mac users: cd into this folder and use the command unix2dos *.txt; unix2dos *.html.
  • Use a zip utility to make a zip archive of the contents of this folder. Do not zip the folder itself, just its contents - the text file(s), HTML file and images folder should be at the top level of the zip file. This enables the automatic checking programs at PG to find the files. (OS X users: do not use the Finder command File> Create Archive of...; it creates a gzip file that PG cannot use. Use a zip command in a terminal window.)
  • Windows users: The "images" folder will often contain a hidden file called thumbs.db. This shouldn't be included in the upload. The easiest way to get rid of it is to open the finished zip-file, navigate to the "images"-folder and delete it from there if present.
  • Open the project page in your web browser and at the bottom, select Change Project State: Upload for Verification.
  • On the next page, write comments noting any unusual features of the book.
  • Use the Browse button to navigate to the zipped file. Wait while it uploads, which can take quite a while.
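
The zip-related steps above (contents at the top level, no .DS_Store or thumbs.db, lowercase filenames) can also be scripted. The following is a sketch under assumptions, not an official DP or PG tool: the function name build_upload_zip is hypothetical, and the zip file should be written outside the upload folder so that it does not get included in itself.

```python
import zipfile
from pathlib import Path

# Hidden system files that should not be uploaded (compared in lowercase).
UNWANTED = {".ds_store", "thumbs.db"}

def build_upload_zip(folder, zip_path):
    """Zip the contents of folder (not the folder itself).

    Returns (names, warnings): the archive member names and any
    filenames that contain uppercase letters.
    """
    folder = Path(folder)
    names, warnings = [], []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if not path.is_file() or path.name.lower() in UNWANTED:
                continue
            # Paths are stored relative to the folder, so files sit at the top level.
            arcname = path.relative_to(folder).as_posix()
            if arcname != arcname.lower():
                warnings.append("uppercase in filename: " + arcname)
            zf.write(path, arcname)
            names.append(arcname)
    return names, warnings
```

The script only warns about uppercase filenames; renaming them (and updating any links in the HTML) is still up to you.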

Ta-daaaa! Finished!!* Treat yourself to your favorite beverage! When refreshed, return to Step 1.

*Well, finished until you get the first PM from the PPVer listing the things you forgot to do...
