User:SF2001/Guiguts PP Process

From DPWiki


Shamelessly copied from Guiguts_PP_Process_Checklist on Nov 6, 2020.

This checklist is made for use with Guiguts version 1.1.1 with its standard menu.

It contains steps for all types of books including those with illustrations, poetry, footnotes, sidenotes and indexes; and it assumes you will create an HTML version. Not all projects need all steps.

(When editing this page, please use links to other wiki topics rather than repeating things covered elsewhere.)

Initial Setup (1 hr.)

  • Go to Project page
    • Read details and requirements.
    • Bookmark the project URL and note project ID number.
    • Read the project forum page and note any issues proofers raised.
  • Make a project folder, e.g. (Win) C:\dp\pp\bookname
    • Consider signing up for a free Dropbox account and putting your data in a folder that is automatically backed up.
  • Download the text and image files and unpack them into the new folder:
    • Text to bookname.txt.
    • Page images (nnn.png) in subfolder pages
    • Hi-res illustration scans (imagenn.png) in subfolder originals
    • Make an empty subfolder images
    • Make a plain text file for notes
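If you set up many projects, you can script the folder layout above. The Python sketch below is only an illustration. The function name make_project_folder and the notes file name notes.txt are hypothetical, and the sketch does not unpack the downloaded files for you.

```python
from pathlib import Path


def make_project_folder(root: Path, bookname: str) -> Path:
    """Create root/bookname with the subfolders used in this checklist."""
    project = root / bookname
    for sub in ("pages", "originals", "images"):
        # parents=True also creates the project folder; exist_ok allows re-running
        (project / sub).mkdir(parents=True, exist_ok=True)
    # Hypothetical name for the plain-text notes file
    (project / "notes.txt").touch(exist_ok=True)
    return project
```

For example, make_project_folder(Path(r"C:\dp\pp"), "bookname") creates C:\dp\pp\bookname with pages, originals and images inside it.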

Sequential Inspection of Text (4-20 hr.)

This is the only step in which you will examine the whole ASCII text in sequence; hereafter you navigate with searches.

Some post-processors read the book carefully while others skim the text comparing it to the page images and double-checking format.

Open the original text file in Guiguts and save it as a version with the suffix 00, as another backup.

Save to version 01 so autosave and manual incremental saves stay away from the previous checkpoint.

Be sure to turn on automatic scanno highlighting before starting this pass.

Check for:

  • Proper markup of <i>italic</i> and <b>bold</b>.
Watch for punctuation wrongly contained in markups, such as <i>(ibid.</i> or <b>Subtopic.</b>.
  • Proper markup of Greek and other transliterations (content check later)
  • Block material all marked in some fashion:
    • /*  each line within the block will end with <br /> to preserve individual line breaks
    • /#  the block will be given a css class of "blockquot"
    • /$  is like /*, but the block will not be indented. This can be useful in marking tables in advance
    • /P  the block will be formatted as poetry
    • /i  the block will be formatted as an index (that letter is 'eye', not 'one' or 'ell'). The HTML Auto Index option can do this, but using the markup is simpler
    • /F  the block will be formatted as front matter: centered; lines will not end with <br /> so text can reflow to fit the margins
    • /L  the block will be formatted as an unordered list, beginning with <ul> and with each line beginning with <li> and ending with </li>. The block will end with </ul>
    • /x  the block will be enclosed with <pre> and will be subject to limited formatting. At DP, the only normal use of this is with genealogy charts, unless you are using "HTML Auto Index" rather than /i.
    • Note if there are any block markups crossing page boundaries so you know to keep an eye out for them in the next step
  • Figures properly in [Illustration: caption]
    • check: caption text agrees with List of Illustrations (if any)
    • consistent spelling, abbreviation, capitalization in captions
  • Fix footnotes and illustrations that are still inside a paragraph.
    • move outside paragraph to next or prior page as appropriate
    • don't worry about duplicate footnote number/symbol now
    • sidenotes handled later
    • Move illustrations to a more appropriate place if working on an older book where the illustration is on its own page due to limitations of putting 'plates' into books.
  • Make notes of things that will need attention in the HTML:
    • Author cross-references like "(p. 150)" and "see page 222" that should become links.
    • How the editor laid out special sections such as tables and sidebars.
  • Configure Page Labels to agree with the page numbers in the image.

Fix Block Markups and Proofer Notes (15-60 min.)

Save to version 02 as a checkpoint.

  • Use the Search menu to step through all /* */ blocks.
    • check for a blank line before and after markup
    • make sure correct type of markup used
    • close-up where broken at page boundaries
    • apply specific indent value if desired
    • mark poetry as /P..P/
    • make sure poetry line numbers are at least two spaces to the right of the line.
  • Use the Search menu to step through all /#..#/ blocks
    • check for a blank line before and after markup
    • make sure correct type of markup used
    • close-up where broken at page boundaries
    • check consistent indentation of block text
    • apply specific margin values if desired
  • Use Tools>Check Orphaned Brackets... Find Orphaned Brackets and correct orphans of each type in turn.
Do not omit the lowly parenthesis, which is often mis-scanned as a curly brace.
  • Use Search for Asterisks w/o slash on the Search menu, and keep clicking "Search" to check every asterisk in the document.
    • Look for malformed thought-breaks (5 stars)
    • Resolve proofer's notes, which are indicated by asterisks [** this is a proofer's note]
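The idea behind an orphaned-bracket check, taking one bracket type at a time, can be sketched with a simple stack. This is an illustration in Python, not Guiguts' own code. The function name find_orphans is hypothetical, and the function reports character offsets rather than jumping to them in the editor.

```python
def find_orphans(text: str, open_ch: str, close_ch: str) -> list[int]:
    """Return the offsets of every open_ch or close_ch that has no partner."""
    stack: list[int] = []    # offsets of opens still waiting for a close
    orphans: list[int] = []  # closes that arrived with nothing open
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch:
            if stack:
                stack.pop()
            else:
                orphans.append(i)
    # Opens left on the stack were never closed
    return sorted(orphans + stack)
```

For example, find_orphans("(ibid.}", "(", ")") returns [0]. That offset points to the parenthesis whose partner was mis-scanned as a curly brace. Running the same text with "{" and "}" returns [6].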

Basic Fixup (10 min.)

Save to version 03.

Format Front Matter (15 min.)

Save to version 04.

  • Format the title page, preserving as much of the original material as possible.
    • Protect it in either /X...X/ (no rewrap, no indent) or /F...F/ (no rewrap, centered in HTML).
  • Edit the TOC.
    • Protect TOC with /X...X/. Note that your TOC will probably need to be indented to prevent rewrapping, particularly if you use multiple spaces to align page numbers.
  • If the book has illustrations, edit or create a List of Illustrations (this is optional). Make sure it is 1:1 with the [Illustration] captions. Protect it with /X...X/.

Edit Transliterations (0-? hr.)

Save to version 05.

  • Tools>Character Tools
    • Find Transliterations, which uses the regular expression \[[^FIS] to search for a left bracket followed by anything other than F, I or S. Check the content of each transliteration.
    • For Greek, use the Greek Transliteration Tool.
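To see what the Find Transliterations regular expression catches, here is a Python sketch that applies the same pattern line by line. The function name is hypothetical. The pattern also matches proofer's notes ([**...) and [Blank Page]. It is also case-sensitive, so it would report a lowercase [footnote as well.

```python
import re

# The pattern quoted in the checklist: "[" followed by any character except F, I or S
TRANSLIT = re.compile(r"\[[^FIS]")


def find_transliterations(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for each line containing a possible transliteration."""
    return [
        (n, line)
        for n, line in enumerate(text.splitlines(), start=1)
        if TRANSLIT.search(line)
    ]
```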

Fixup Page Separators (10-30 min.)

Save to version 06.

  • Run Fix Page Separators to remove visible page separators
    • Tools>Fixup Page Separators...
  • Search for [Blank Page] and delete them.

Apply Word-Frequency Checks (10-60 min.)

Save to version 07.

Open the Word Frequency report (Tools>Word Frequency...). Double-click a word to search for it.

  • Set the Frq switch; click All Words.
The list is now sorted by word frequency. Scroll to the end and skim up the list of words that appear only once, looking for oddities and obvious misspellings.
  • Click Character Cnts. (middle-right)
    • Note characters that appear only once, check usage.
    • Check for equal counts of left and right parentheses (), brackets [] and <>, braces {}, etc.
  • Click Emdashes.
This shows words with emdashes in them as well as similar words without emdashes (aka: suspects) marked with ****. Check suspects against the text and page images. Preserve the author's intent even when inconsistent. Hint: enable the Suspects flag and click Emdashes again to see only the suspect words.
  • Click Hyphens. Similar to Emdashes above.
  • Click Alpha/num. Scan list for one/ell and oh/zero errors.
  • Click ALL CAPS. Scan list looking for oddities.
  • Click MiXeD CasE. Scan the list for letters such as o that OCR sometimes wrongly renders as uppercase. Oh/zero errors can show up here, too.
  • Click Check Accents. Scan list looking for mistakes, inconsistent usages.
  • Click Check , Upper. Scan list for comma-for-period errors.
  • Click Check . Lower. Scan list for period-for-comma errors.
  • Click Ital/Bold/SC. Scan list for incorrect or inconsistent use of italics, bold face, and small caps.
  • Click Ligatures. Scan list for incorrect or inconsistent use of ae and oe ligatures.
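Several of these reports boil down to simple counting and pattern matching. The Python sketch below approximates a few of them so you can see what each one looks for: rare words, character-pair counts, Alpha/num, Check , Upper and Check . Lower. It is not Guiguts' implementation. The word definition, the function names and the exact patterns are assumptions, and Guiguts' own results will differ in detail.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters/digits, optionally joined by
# apostrophes or hyphens. Guiguts' own tokenizer may differ.
WORD = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")


def rare_words(text: str, max_count: int = 1) -> list[str]:
    """Words appearing at most max_count times (the tail of the Frq-sorted list)."""
    counts = Counter(WORD.findall(text))
    return sorted(w for w, n in counts.items() if n <= max_count)


def unbalanced_pairs(text: str) -> dict[str, tuple[int, int]]:
    """(left count, right count) for each character pair whose counts differ."""
    counts = Counter(text)
    return {
        pair: (counts[pair[0]], counts[pair[1]])
        for pair in ("()", "[]", "{}", "<>")
        if counts[pair[0]] != counts[pair[1]]
    }


def alpha_num(text: str) -> list[str]:
    """Words mixing letters and digits, e.g. 'l0ve' (zero for oh) or 'ca11'."""
    return sorted({
        w for w in WORD.findall(text)
        if re.search(r"\d", w) and re.search(r"[^\W\d_]", w)
    })


def comma_upper(text: str) -> list[str]:
    """Comma followed by a capital: a possible comma-for-period scanno."""
    return re.findall(r"\w+, [A-Z]\w*", text)


def period_lower(text: str) -> list[str]:
    """Period followed by a lowercase word: a possible period-for-comma scanno."""
    return re.findall(r"\w+\. [a-z]\w*", text)
```

For example, comma_upper("He said, The end. then") returns ["said, The"], and period_lower on the same text returns ["end. then"]. Both hits need checking against the page image, because each can be correct in context.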

Apply Scanno Checks (1-3 hr.)

Save to version 08.

See this topic for usage of the scanno checks.

  • Use Tools>Stealth Scannos...
    • Start scanno searching based on eng-common.rc. Work through the list.
    • Apply scanno searching based on misspelled.rc. Work through the list.
    • Apply scanno searching based on regex.rc. Work through the list.

Apply Bookloupe (10-45 min.)

Save to version 09.

Start the Gutcheck Process.


Work through the list, correcting as appropriate.

Apply Spellcheck (30-90 min.)

Save to version 10.

Start the spellcheck process.

Be sure to add the Good Words List to the dictionary.

Proceed through the document, correcting words or adding them to the project dictionary as appropriate.
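Conceptually, adding the Good Words List enlarges the set of words the checker accepts. Below is a minimal Python sketch of that idea. It assumes you already have the dictionary and the good words as collections of strings. The function name is hypothetical, and the case-insensitive comparison is an assumption, not a description of how Guiguts' spellchecker works. Because the comparison ignores case, this approach would hide capitalization problems.

```python
import re
from collections.abc import Iterable


def unknown_words(text: str, dictionary: Iterable[str], good_words: Iterable[str]) -> list[str]:
    """Words in text found in neither the dictionary nor the project's good words."""
    known = {w.lower() for w in dictionary} | {w.lower() for w in good_words}
    words = re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text)
    return sorted({w for w in words if w.lower() not in known})
```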

Fix Sidenotes (0-? hr.)

Save to version 11.

Read the discussion. Step through sidenotes using Search&Replace for [S (not regex, not whole word, ignore case). Click Search to find each sidenote.

  • Compare to page image. Move note above paragraph if feasible.
  • Otherwise, position it above the sentence to which it applies, with blank lines to prevent rewrapping if you decide that is best.

Fix Footnotes (0-? hr.)

Save to version 12.

Read the discussion and follow the steps on this page.

Fix Poetry Line Numbers (0-20 min.)

Save to version 13.

If the book has poetry that uses line numbers, read this page and align the line numbers consistently.

Check balanced markup

Save to version 14.

  • Find orphaned markup with Search>Find Orphaned DP Markup...
  • Correct the error and click search until no more are found.
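You can picture the balance check as a stack of open tags. The Python sketch below covers only inline tags. The tag set (i, b, sc, u, g, f) is an assumption about which DP markup your project uses. The sketch does not handle block markup such as /*..*/. It reports the offsets of unmatched or crossed tags.

```python
import re

# Assumed set of DP inline tags; adjust to the markup your project uses
TAG = re.compile(r"<(/?)(i|b|sc|u|g|f)>")


def orphaned_markup(text: str) -> list[tuple[int, str]]:
    """Return (offset, tag) for every unmatched or crossed inline tag."""
    stack: list[tuple[int, str]] = []  # open tags not yet closed
    orphans: list[tuple[int, str]] = []
    for m in TAG.finditer(text):
        is_close, name = m.group(1) == "/", m.group(2)
        if not is_close:
            stack.append((m.start(), name))
        elif stack and stack[-1][1] == name:
            stack.pop()  # properly nested close
        else:
            orphans.append((m.start(), f"</{name}>"))  # close with no matching open
    # Anything still open at the end never got closed
    orphans.extend((pos, f"<{name}>") for pos, name in stack)
    return sorted(orphans)
```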

Consider page numbers

Save to version 15.

Consider whether you want visible page numbers in the text versions as well as the html version. This is not generally advisable, nor is it generally welcomed by PPV/WW, but there are exceptions. If you do want them in both versions, then now is the time to put them there, before forking the files below.

Curly quotes

Save to version 16.

If you want curly quotes in both the TXT and HTML versions, now is the time to put them in, before the split between TXT and HTML. There is advice about how to do this in PPTools/Curly_Quotes_Versus_Straight_Quotes.

Curly quote conversion is built in under the Txt menu.

  • Txt > Convert to Curly Quotes

It generally does a good job. Where it is unsure, it leaves an @ to draw your attention.

  • Txt > Curly Quote Corrections and tear off the menu
    • Select Next @ Line and work through the document

In the same Curly Quote Corrections menu, also work through Select Next Straight Single Quote.

The Curly Quote Correction does not try to convert leading single quotes. It's too hard to differentiate the beginning of a quote and an apostrophe.

When done, search for "\s‘" in case you made a mistake and put an opening quote where an apostrophe was appropriate.
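If you prefer to review these hits outside Guiguts, you can script the same search. This Python sketch (function name hypothetical) lists each ‘ that follows whitespace, including a line break, with its line number. Genuine opening quotes also match, so you still need to look at each hit. An elision such as ’tis or ’em should use the apostrophe ’, not ‘.

```python
import re


def open_quotes_after_space(text: str) -> list[tuple[int, str]]:
    """Return (line number, rest of line) for each ‘ preceded by whitespace."""
    hits = []
    for m in re.finditer(r"\s‘", text):
        quote_at = m.start() + 1  # index of the ‘ itself
        line_no = text.count("\n", 0, quote_at) + 1
        rest = text[quote_at:].split("\n", 1)[0]
        hits.append((line_no, rest))
    return hits
```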

Online PPTxt

Save to version 17.

The online PPTxt, available at the PP Workbench, is very helpful at finding variations in spelling and hyphenation.
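One kind of hyphenation variation worth catching is a word that appears both hyphenated and closed up, such as to-day and today. The Python sketch below covers only that single check and does not reproduce PPTxt's own logic. The function name is hypothetical, and the sketch ignores spaced forms such as to day.

```python
import re
from collections import Counter


def hyphen_variants(text: str) -> dict[str, tuple[int, int]]:
    """Map each hyphenated word that also appears closed up to (hyphenated, closed) counts."""
    words = Counter(
        w.lower() for w in re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text)
    )
    report = {}
    for word, count in words.items():
        if "-" in word:
            closed = word.replace("-", "")
            if closed in words:
                report[word] = (count, words[closed])
    return report
```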

Save Edited Markup and fork for text/HTML (2 min.)

Save to version 20. It's a nice round number.

  • Save any unsaved changes in bookname.txt.
  • Use File>Save a Copy As to make bookname.html
This will be the starting file for the HTML version. You can also use it as a fallback in case you mess up and need to start the following steps over.
  • Re-open bookname.txt (if not still open).

Text processing

User:SF2001/Guiguts PP Process/Text

HTML processing

User:SF2001/Guiguts PP Process/HTML

PPComp HTML versus Text

Run PPComp to compare HTML and text. Are all differences explainable?
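PPComp does much more than this. The Python sketch below only illustrates the basic idea of comparing the two versions word by word, using difflib from the standard library. It assumes italics are marked with underscores in the text version. It strips all HTML tags crudely: it would not ignore the contents of <style> or <script> elements, so pass it only the body. The function names are hypothetical.

```python
import difflib
import html
import re


def html_words(markup: str) -> list[str]:
    """Crude: replace every tag with a space, decode entities, split on whitespace."""
    return html.unescape(re.sub(r"<[^>]+>", " ", markup)).split()


def text_words(plain: str) -> list[str]:
    """Assumes italics are marked _like this_ in the text version."""
    return plain.replace("_", " ").split()


def differences(plain: str, markup: str) -> list[tuple[str, str, str]]:
    """Return (operation, text words, HTML words) for each differing run."""
    a, b = text_words(plain), html_words(markup)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (op, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

An empty result means the words agree. Anything else is a difference to explain, as the step above asks.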

Smooth Reading

When you are reasonably satisfied with the way your project looks, submit it for smooth reading if you think that is appropriate.

More information is here: Smooth Reading for Post Processors

It is suggested that you upload at least the text and HTML versions, as some smooth readers work only with text and some only with HTML.

It would be a good idea to also include the MOBI and EPUB. This may attract more smooth readers, and it forces you to generate those files and give them at least a cursory check.

Prepare for PPV

Some clues from my mentor



Upload the Finished Project

  • Prepare a new folder with a short name. The name you choose doesn't matter because it is only needed to create a zip file. The zip file itself is renamed automatically during upload.
  • Move into it only the files to be uploaded:
    • All filenames should contain lowercase letters only.
    • text file(s) bookname.txt
    • bookname.html
    • the images folder if there are images
    • Do not include
      • original images or page images
      • any work files or scratch files or auto-backup editions.
  • Use a zip utility to make a zip archive of the contents of this folder. Do not zip the folder itself, just its contents: the text, HTML and bin files and the images folder should be at the top level of the zip file. This enables the automatic checking programs to find the files.
  • The images folder will often contain a hidden file called thumbs.db, which shouldn't be included in the upload. The easiest way to get rid of it is to open the finished zip file, navigate to the images folder and delete it from there if present.
  • Open the project page in your web browser, you will need info on that page.
  • Open the directions for DU at User:SF2001/DU.
    • On the next page, write comments noting any unusual features of the book.
    • Use the Browse button to navigate to the zipped file. Wait while it uploads, which can take quite a while.
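Any zip utility will do for the packaging step. The Python sketch below, built on the standard zipfile module, shows how to meet the requirements listed above. It stores the folder's contents at the top level of the archive, skips Thumbs.db, and refuses to build the archive if any name is not lowercase. The function name is hypothetical. Keep the zip file itself outside the folder being zipped.

```python
import zipfile
from pathlib import Path


def make_upload_zip(folder: Path, zip_path: Path) -> list[str]:
    """Zip the contents of folder (not the folder itself) into zip_path.

    zip_path should be outside folder. Thumbs.db files are skipped.
    """
    files = [
        p for p in sorted(folder.rglob("*"))
        if p.is_file() and p.name.lower() != "thumbs.db"
    ]
    # Archive names are relative to folder, so files sit at the top level of the zip
    names = [p.relative_to(folder).as_posix() for p in files]
    not_lower = [n for n in names if n != n.lower()]
    if not_lower:
        raise ValueError(f"filenames should be lowercase: {not_lower}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, name in zip(files, names):
            zf.write(path, arcname=name)
    return names
```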

Related Pages