Basic PPGEN Checklist for use with Guiguts
Prepared by Linda Hamilton, 7 October 2014
Setup
Activity Details
Download the text and images files and unpack in new folder:  images (nnn.png) in subfolder pngs and hi-res illustration scans (imagenn.png) in subfolder originals
Set image directory "Prefs" -> "Set File Paths" -> "Set Images Directory"
Set Gutcheck Options "Fixup" -> "Gutcheck Options" -> Check "-v Enable verbose mode", "-p Report ALL unbalanced double quotes" & "-s Report ALL unbalanced single quotes." Set spell dictionary
Comments Read Project and Forum Comments
   
General Cleanup and Checking
Activity Details
Page Through book Check for end-of-page hyphens, missed paragraph starts at start of page, missed bold/italics, and blank pages, and note the types of formatting you'll need so you can start building ppgen macros. As you go, remove blank pages and any extra continuation markers that cross pages. Mark poems.
Illustrations/tables Move Illustration tags & tables to paragraph breaks (and to appropriate spots)
Blank pages Check that they're really blank and remove [blank page] text
Resolve *'s, project comments left by proofers and hyphens Search for * and try to resolve. REMEMBER TO UNCHECK the "WHOLE WORD" setting in Search/Replace. When you find a hyphenated word with a *, try search-replace to see if the text uses the version with or without the hyphen elsewhere in the text. Resolve *s that are at page or line endings. When you make a change, add a <target id='tn004'> specific to the page it's on and create the related TN Note on a separate page (#"dectective" changed to "detective" on Page 660:tn661#)
Continuation Markers (this can be done with other checks) Remove any continuation markers */*, #/#  -- (Check that /* */, /$ $/, /p p/ & /# #/ markups match & are correct and on own lines.) - Delete cross-page /* /# /P
Find Orphaned Brackets & Markup Using Search & Replace/Find Orphaned Brackets - Click the "/* */" & Hit "Search", correcting any mistakes the search routine finds. Correct broken /* markups; leave all other asterisks, they'll be fixed up later. Also try (?<!/)\*(?!/) to search for orphaned asterisks. Do the same with /p and /# (You want to make sure that they're all in order before you do the following search and replace)
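The orphaned-asterisk regex in the step above can also be tried outside Guiguts. This is only a sketch in Python: the pattern is the one quoted in the checklist, while the function name and the sample text are made up for illustration.

```python
import re

# Pattern from the checklist: an asterisk with no "/" directly before or after it.
ORPHAN_ASTERISK = re.compile(r"(?<!/)\*(?!/)")

def find_orphan_asterisks(text):
    """Return (line_number, line) pairs for lines containing a stray asterisk."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ORPHAN_ASTERISK.search(line):
            hits.append((number, line))
    return hits

if __name__ == "__main__":
    # Hypothetical sample: a /* */ block plus one end-of-line hyphen marker.
    sample = "/*\nA poem line\n*/\nThe detec-*\ntive said so.\n"
    for number, line in find_orphan_asterisks(sample):
        print(number, line)
```

The /* and */ lines are skipped because of the lookbehind and lookahead, so only the proofers' asterisks are listed.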
Remove End of Line spaces Fixup/Remove End of Line Spaces -- do this periodically
First Pass "Fixup" -> "Footnote Fixup" - "First Pass" button. Guiguts moves thru identifying anything it thinks is a Footnote and its associated anchor. The pass will finish with the first footnote anchor highlighted in orange, the actual associated footnote in green. At the top of the Footnote pop-up, it shows how many footnotes are in the text. It may help you to select "Unlimited Anchor Search" if your footnote markers are a long way away from the actual footnote text.
Check Footnotes Hit the "Check Footnotes" button. Pop-up lists all footnotes found. Footnotes marked in yellow have duplicate anchors, footnotes marked in pink have no anchors. Use the "Go to - #" footnote dropdown in the Footnote tool to fix any errors, then re-run First Pass & Check Footnotes to make sure all errors have been fixed.
Check Footnote Count "Fixup" -> "Run Word Frequency Routine" -> "Sort Alpha" -> "Re Run". Look for how many times the word "Footnote" occurs in the text, it should match the number of Footnotes found by the Footnote tool, if it doesn't you have a missing Footnote somewhere, use the features of Word Frequency lists mentioned in 2.6.3 to fix any errors. Run the Footnote "First Pass" step if you fix any errors. 
Step through all Footnotes "Next FN -->" & "<-- Last FN" buttons to step your way through the Footnotes guiguts has found, fixing any errors and joining any multi-page footnotes. You can use the "See Anchor" & "See Footnote" buttons to make sure that the anchor is associated with the correct footnote text. If you fix a missing ending bracket problem, hit the "Adjust Bounds" button and guiguts will find the new closing bracket. For each footnote, select the appropriate symbol style to use with the "Number", "Letter" or "Roman" buttons. You can use the "IMAGE" button to check the original symbol style if needed. Numbers are recommended if you have large numbers of footnotes. Don't worry about removing duplicate footnote symbols, this will be handled auto-magically in the next step. Guiguts depends on Footnotes being in the right format, [?] for markers and [Footnote ?: text] for the footnote. Make sure you correct all footnotes to this format. You can set manual anchors if you have lots of footnotes in [Footnote: ? text] format, see the guiguts manual on how to handle those.
Index Footnotes Once you have stepped through and checked, adjusted, fixed and anchored all of your footnotes, hit the "Re Index" button. For out-of-line footnotes, this will renumber all of the footnotes using the same family of symbol that it had originally or a number if it had no anchor marker. This will close up any gaps in the numbers and remove duplicates. You can make changes and re index as often as you like.
Check for inconsistent punctuation after markup =. Versus .=, .</i> and </i>. 'I search for the following regular expressions to check punctuation:    =[^\w\s]    and    [^\w\s]=  (Try it with _ for italics and \* to see the asterisks. I think I saw some inconsistencies with the asterisks too.) Also brackets and punctuation.
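To get a quick tally of both punctuation styles before searching by hand, the two regexes above can be counted in a script. A rough sketch, assuming the text is already loaded as a string; the function name and the sample sentence are illustrative only.

```python
import re
from collections import Counter

def markup_punctuation_counts(text, marker="="):
    """Count punctuation found just after and just before a markup marker."""
    m = re.escape(marker)
    after = Counter(re.findall(m + r"[^\w\s]", text))    # e.g. "=."
    before = Counter(re.findall(r"[^\w\s]" + m, text))   # e.g. ".="
    return after, before

if __name__ == "__main__":
    after, before = markup_punctuation_counts("=Note.= and =Note=.")
    print("after marker:", dict(after))
    print("before marker:", dict(before))
```

Pass marker="_" or marker="*" to repeat the tally for italics or asterisks. A large count on one side and a small count on the other usually points at the inconsistent cases worth checking.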
HTML Markup problems "Fixup" -> "HTML Markup" -> "Find orphaned Markup" You can also use the Character Count feature of the Word Frequency function to see numbers of <'s & >'s and ( ) etc. 
Check for Tabs Using Character Counts of Word Frequency
Word Frequency - Em Dashes Rarely find stuff - Hit "Emdashes" - you will see a list of words that contain a hyphen, all words that are identical except that they DON'T have a hyphen and the words that are identical except that they contain an emdash (two hyphens). Fix any words which are marked with em-dashes, but should just be hyphens. This check will not find all occurrences of this error as it doesn't list all em-dashes. See the manual for more details.
Word Frequency - Hyphens VERY USEFUL!!! - Select "Alph"(abetical sort) & then Hit "Hyphens" You will see a list of hyphenated words from the project in alphabetical order. The number to the left represents how many times the word appears. If you scroll down the list you'll see non-hyphenated versions of the same word marked with *, resolve any conflicts by looking at page image, hit the 'See Image' button, if required. 
Word Frequency - Alpha-nums Hit "Alpha/num" - will catch 1ine etc. Check consistency of dates.
Word Frequency - Spelling Hit "Spelling"     You'll see a list of words not in the dictionary in Alpha order which occur in the text. This is a useful check for resolving inconsistencies with proper names.
    Work through the list fixing any errors. This is just an initial check. A full spell check will be run shortly. 
Word Frequency - Bold/Italic Hit "Ital/Bold Words" - You'll see a list of words and phrases which appear in <i> & <b> markers in the text & those which do not. This is a useful check for resolving inconsistencies with italics markers for abbreviations & journal titles.
Word Frequency - Caps Not so useful - Hit "ALL CAPS" - useful for checking you have all the CHAPTER headings.  If you need to change the case of any text, just select it & use "Selection" -> "UPPERCASE Selection". The text will change to ALL CAPS. Other case changing features which are available from the "Selection" menu are "lowercase Selection", "Sentence case selection" and "Title Case Selection." 
Word Frequency - Mixed Caps Not so useful - Hit "MiXeD CasE" - Useful for checking Chapter headings
Word Frequency - Initial Caps Hit "Initial Caps" This is useful for checking Proper Names.
Word Frequency - Check Char Counts Hit "Character Cnts" This is useful for weird characters & spotting missing brackets, (, ) counts should be the same for example. Mark-up mismatches for [] & <> are also easy to spot here too. Good way to catch tab characters that snuck through.
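The same bracket-balance and tab check can be approximated in a few lines of Python. This is a sketch, not a replacement for the Guiguts "Character Cnts" list; the function names are made up.

```python
from collections import Counter

# Opening/closing characters whose totals should normally match.
PAIRS = [("(", ")"), ("[", "]"), ("{", "}"), ("<", ">")]

def unbalanced_pairs(text):
    """Return {pair: (open_count, close_count)} for pairs whose counts differ."""
    counts = Counter(text)
    return {
        opener + closer: (counts[opener], counts[closer])
        for opener, closer in PAIRS
        if counts[opener] != counts[closer]
    }

def tab_lines(text):
    """Return the line numbers that contain a tab character."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if "\t" in line]

if __name__ == "__main__":
    sample = "(fine) [Footnote 1: missing bracket\n\tindented with a tab\n"
    print(unbalanced_pairs(sample))
    print(tab_lines(sample))
```

Equal counts do not prove the brackets are correctly nested, so this only narrows down where to look.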
Word Frequency - Check Upper Hit "Check , Upper" This is useful for checking for , -> . scannos.  If there's a lot of dialogue then this generates so many false positives it's not really much help
Word Frequency - Check Lower Hit "Check . Lower" This is useful for checking for . -> , scannos.  Generally yields a manageable number of things to check (usually abbreviations in the middle of a sentence)
Word Frequency - Accents Also good for catching cases where there are tabs
Hit "Check Accents" - Useful for resolving inconsistencies with accented characters. Works like the hyphen check. The Latin-1 chart available from "Help" -> "Latin-1 Chart" pops up a window which allows you to click on accented characters to add them to the text file at the cursor. I also use the Search & Replace popup at this stage to check for ae & oe ligatures, and add them back into the .txt if I find they have been removed by the proofing rounds. I proof the oe ligature as [oe] so I can substitute an HTML entity for it later on in the HTML version. Also use the "Check Accents" button in the Word Frequency window to see if downscaling the accents might cause problems; if you spot any potential problems, generate an ASCII file by hand and upload both ASCII and Latin-1. (The sort of problems are things like "cañon", which will end up as "canon" instead of "canyon"; "coöperate", which will end up as "cooeperate"; or an aligned table containing "Cæsar" whose alignment breaks when it is expanded to "Caesar".) For ASCII/Latin-1 text we normally just put "coeur"; for HTML use "co&oelig;ur". If there were lots of oe ligatures in the book, we might also generate a Unicode text version (I never have, and a lone oe isn't important enough to justify a whole separate version).
Check " and commas or periods Check for ." followed by a space or a word -->,'\w or \.'\w
Check commas at end of paragraphs ,\n\n -- also : or ; (Using Search & Replace set for regex)
T all by itself Scanno for I
Mr. & Mrs. Dr. Check that Mr. and Mrs. have periods: Mr[^.,^s] and Mrs[^.] work. ALSO Mr\n and Mrs\n. Do the same for Dr. (Using Search & Replace set for regex)
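A scripted version of this check might look like the sketch below. Note that it uses one combined pattern with a lookahead rather than the separate regexes quoted above, so it also flags "Mr," (which the checklist's Mr pattern skips); the names are illustrative.

```python
import re

# A title not followed by a period or by more letters (so "Drive" is ignored).
TITLE_WITHOUT_PERIOD = re.compile(r"\b(Mr|Mrs|Dr)(?![.\w])")

def titles_missing_period(text):
    """Return (line_number, title) for each title that lacks its period."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in TITLE_WITHOUT_PERIOD.finditer(line):
            hits.append((number, match.group(1)))
    return hits

if __name__ == "__main__":
    sample = "Mr. Smith met Mrs Jones\nand Dr\nDrive on, Mr. and Mrs. Hall"
    print(titles_missing_period(sample))
```

Because each line is checked separately, a title at the very end of a line is caught without a separate Mr\n search.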
Regex Checks "Search" -> "Stealth Scannos" -> use the Browser to go to the scannos directory in your winguts or guiguts folder. Select regex.rc (regex) and then later do en-commn (whole words), misspelled (whole words),  others_w (not whole words), others_s (not regex), others_r (regex), scanno_w  (full-words) , scanno_s (not whole word) and scanno_r (regex). 
Run Jeebies From Fixup Menu
Check quotes and commas space"space and space'space and try out start/end of line combos too
Search regex ^'space and space'$ and same for "
comma with no spaces, [a-zA-Z],[a-zA-Z] or [a-zA-Z]\.[a-zA-Z]
Check for spaces on first lines ^(space) using Search and Replace regex
Check initials space between or not? [A-Z]\. [A-Z]\. Versus [A-Z]\.[A-Z]\.
Check for double spaces  
Dashes and ellipses  Check space-- and --space [A-Z,a-z] -- etc. This is good for TOC-type things and for proper names -- I think the program is mixing things up a bit on the proper names
Check ellipses For spacing
Subscripts and Superscripts Subscripts - an underline character _ and surrounding the text with curly braces { and } / Superscripts by inserting a single caret (^) followed by the superscripted text. If the superscript continues for more than one character, then surround the text with curly braces { and } as well. Check use throughout
Check for no period at end of paragraph [a-zA-Z]\n\n
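The [a-zA-Z]\n\n search above can be run over the whole file to list every candidate at once. A minimal sketch, assuming the text uses plain \n line endings; the function name is hypothetical.

```python
import re

# Regex from the checklist: a letter immediately followed by a blank line.
LETTER_BEFORE_BLANK_LINE = re.compile(r"[a-zA-Z]\n\n")

def paragraphs_without_end_punctuation(text):
    """Return line numbers of paragraph-final lines that end in a letter."""
    hits = []
    for match in LETTER_BEFORE_BLANK_LINE.finditer(text):
        # Count the newlines before the match to turn an offset into a line number.
        hits.append(text.count("\n", 0, match.start()) + 1)
    return hits

if __name__ == "__main__":
    sample = "First paragraph ends here.\n\nSecond one stops\n\nThird.\n"
    print(paragraphs_without_end_punctuation(sample))
```

Headings and poetry lines will show up too, so expect some false positives.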
Check for spaced hyphens hyphen followed by a space
OE / AE Check oe and ae diphthongs to make sure they didn't get mixed up
Check i.e. i. e.  
Check for Greek use And correct
Spellcheck "Search" -> "Spell Check"
Check diphthongs If book has oe and ae diphthongs double-check that they're used correctly (they sometimes mix them up)
Move to Landing Zones Once you have set landing zones for all your footnotes, hit "Move Footnotes To Landing Zone(s)" & your footnotes will be moved. You can now re-run the "First Pass" step using "Last FN" & "Next FN" buttons with the "See Anchor" & "See Footnote" buttons to make sure all are correct.
Replace Page PNG info ^---*File: (\d\d\d\.png).*$  replaced by // $1  to get something like this // 019.png (try it out on one at a time until you're sure it works OK)
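The page-separator replacement above can be tested on a copy of the text before running it in Guiguts. A sketch in Python, where \1 plays the role of $1; the function name and the sample separator line are illustrative.

```python
import re

# Regex from the checklist; MULTILINE makes ^ and $ match at each line.
PAGE_SEPARATOR = re.compile(r"^---*File: (\d\d\d\.png).*$", re.MULTILINE)

def separators_to_comments(text):
    """Turn each page-separator line into a '// nnn.png' comment line."""
    return PAGE_SEPARATOR.sub(r"// \1", text)

if __name__ == "__main__":
    sample = "-----File: 019.png---\\proofer1\\proofer2---\nSome page text\n"
    print(separators_to_comments(sample))
```

As written, the pattern only matches three-digit page image names, the same as the regex in the checklist.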
Gutcheck to catch common errors in the file, such as unbalanced quotes.  "Fixup" -> "Run Gutcheck" (selecting paragraphs of text & using the "Search" -> "Highlight double quotes in selection" or "Highlight single quotes in selection"  is very useful in tracking down mismatched & wrongspaced quotes.). Don't look at line length. Make sure to try check lower.
Check for triple hyphens Make sure ---- isn’t --- (can search for regex [^-]---[^-])
Check for dashes with spaces after unclothed hyphens -- space-- or --space . Good to check all dashes anyway since some are words too and should have spaces, AND Guiguts seems to be cutting spaces before them at some point. Check for space-- and --space and --endofline and start of line
Run pptext  
Fix Sidenotes Step through sidenotes with: Search&Replace of [S, not regex, not whole word, ignore case. Click Search to find each Sidenote. Compare to page image. Move note above paragraph if feasible. Otherwise, position it above the sentence to which it applies, with blank lines to prevent rewrapping if you decide that is best. Remember in HTML to use span if more than one per paragraph
Sidenote fix Check *[Sidenotes to make sure I didn't delete any I shouldn't have.]
   
   
FPN Formatting
Activity Details
Save again With a new filename (do this regularly -- just incrementing the number -- e.g. school4-src.txt)
Add comment re: book and date edited e.g. // ppgen source school-src.txt for Knots Untied
// last edit: 10 June 2014
Put in Book/Author near top of file .dt The Project Gutenberg Book of Knots Untied: Or, Ways and By-ways in the Hidden Life of American Detectives, by George S. McWatters
Use Caps for first letters of major words -- check this -- and edited by if edited and don't make it horribly long
Set up Macros Name and write up the formatting for the macros you identified that you'd need when you first paged through the book
Page through the text and the images Page through the book again in Guiguts, adding appropriate markup as you go. Don't worry about the front pages or ads at this point or even the TOC (unless you want to)
Link up internal links Add links to correct areas from within pages (eg. See Page 45 or In Chapter 5 etc.)
Set for Not showing Page numbers  To mark page numbers as comments and not have them show, put .pn off
.pn link
near top of document
Put in the Page advancement codes Search and replace (regex): (// \d+\.png)
with -- $1\n.pn +1
You'll need to remove the extra .pn +1s that appear for blank pages that are not to be numbered and set start pages for pages that move from roman to arabic etc. or skip pages for some reason
Set start for page numbers Add .pn 1 or .pn v and .pn 340 etc. in the right spots where numbering starts or restarts (sometimes books have gaps -- pngs that are blank but aren't counted as page numbers). In those spots, get rid of the .pn +1
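The page-advancement search and replace described above can be sketched in Python to preview the result on a copy of the source file. The regex and replacement are the ones from the checklist (\1 standing in for $1); the function name and sample are assumptions.

```python
import re

# Regex from the checklist: a page comment such as "// 019.png".
PAGE_COMMENT = re.compile(r"(// \d+\.png)")

def add_page_advances(text):
    """Insert a '.pn +1' line after every page comment."""
    return PAGE_COMMENT.sub(r"\1\n.pn +1", text)

if __name__ == "__main__":
    sample = "// 019.png\nFirst page text\n// 020.png\nSecond page text\n"
    print(add_page_advances(sample))
```

The manual cleanup still applies afterwards: delete the .pn +1 on unnumbered blank pages and swap in an explicit .pn value wherever numbering starts or restarts.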
Chapter Starts Add smcaps if needed for words at start of chapters
Check page numbering before and add newpg before
Link Plate/Page Numbers to correct areas In text. In Hans (page ###) -- usually a ### followed by a ) or page ### or search the text in Firefox for Page and then look for Chapter and Figures etc.
Check poems, closings and check quotes and indents etc. around and in letters and closings and italics and whether a paragraph starts right after (or doesn't)
Front Pages Format them
TOC Do TOC and link page numbers to chapters
Illustration TOC Prepare TOC
Illustration IDs Put in id=i_586 or whatever into illustrations if you've got a TOC for illustrations. You can also use your regex to add a few other codes to get ready for finalizing the illustration, which you'll do when it's in smoothreading
Do illustration TOC Adding links etc.
Run GG Fixup - HTML Markup Find Some Orphaned Markup (it may act odd). And check using Word Frequency for number of > vs <
Frontispiece Spacing Should be four blank lines between frontispiece and title page (not two).
Do Index (if needed) http://www.pgdp.net/wiki/CSS_Cookbook/TOC_and_Index#Index_HTML
http://www.pgdp.net/wiki/CSS_Cookbook/TOC/IX_regex
   
After FPN Generation
Activity Details
Run Generator python d:\dp\tools\ppgen.py -i knots13-src.txt (if you're going to use the online generator, then make sure to call it school.zip)
For a text-only version use python d:\dp\tools\fpn.py -i knots10-src.txt -o t 
For a line by line output try python d:\dp\tools\ppgen.py -i knots12-src.txt -d a
Check Spacing and look (book specific) and sidenotes for headings Throughout book  -- especially around letters for text and HTML versions
Check HTML versus HTML Tidy And fix source file, regenerating as needed
Run ppvtxt tool On text version run ppvtxt on the Fixup Menu
Run pphtml On html version from HTML Generator menu
Check Grammar and Spelling Use Word on a separate version of the text or html file so there's no chance of having it change something inadvertently
Remove end of line spaces  
Re-run Gutcheck on text version to check for line length
Smooth Reading Submit for Smooth Reading
Prepare Illustrations  Illustrations should be 700px (see http://www.pgdp.net/wiki/Guide_to_Image_Processing#Image_Display_Dimensions:_General_Guidelines guidelines). 
put in Caption Stylesheet info if needed near top of file --
.de .figcenter p { font-family:sans-serif;\
   font-size:smaller; }
Check footnote/sidenote consistency periods, commas, etc.
TN Fixup Fix up TN with UL and <li>s for html and - for text and add text about the page numbers being in the html source. You can get the HTML part of the TN (to add <li> etc. after generating the HTML)
Specific to book Change [=i] etc. for html if needed. Fix Beehive Hut image ALT
Cover Add cover to final copy of HTML plus update link to it.
.if h
.il fn=cover.jpg w=500px alt="Book Cover"
.ca "Transcriber's Note: The cover image was created by the transcriber and is placed in the public domain."
.pb
..
Optimize JPGs using jpegoptim
Final Checks
Activity Details
Check Smoothreading results And fix source, updating TN as needed
Links Manually check all links including footnotes and TN (and for TN, check that the changes were actually made!)
Check HTML Look at HTML -- check margins stay same throughout and that all the main formatting is as expected (page through book)
Check Text Look at Text -- check that all the main formatting is as expected (page through book). I'm updating text only now (and renamed file so if I do regen I won't copy over)
Fix up Odds and Ends In HTML and Text. Correct Tables etc. And run the check for long and short lines again for text and end of line spaces
Check pptext One more time
Check pphtml One more time
Check links "Fixup" -> "HTML Fixup" -> "Link Checker" (on HTML Version)
CHECK SPACING IN TEXT VERSION
Rerun previous checks HTML tidy, etc. 
No CSS Ensure HTML is basically readable without CSS enabled
Final check on HTML http://validator.w3.org/
Double-check links http://validator.w3.org/checklink
Check and validate css http://jigsaw.w3.org/css-validator/
Epub Check epub and correct issues (after zipping the html file and images) http://epubmaker.pglaf.org/index.php
   
Final Cleanup
Category Activity
Check for surplus images Use Guiguts .2.4 - HTML - Link Check
Check Page #s Check that Page Numbers in Comments match real ones
TOC Double-check TOC for alignment and punctuation in Text and HTML
Images Check image sizes using ppvimage.pl - 700 max inline, 1200 max otherwise (100k max for inline, 200k max otherwise). And check using Firefox that nothing has been incorrectly resized
Final Checks on TXT Spaces at end of lines and long and short lines  Regex ^.{75,}. Check all blockquoted text such as footnotes are consistent. AND check \n\n\n etc.
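Part of the final TXT check above can be scripted. The sketch below covers long lines (using the ^.{75,} regex from the checklist), end-of-line spaces and runs of blank lines; it does not look for short lines or blockquote consistency, and the function name and report keys are made up.

```python
import re

LONG_LINE = re.compile(r"^.{75,}")   # regex quoted in the checklist
BLANK_RUN = re.compile(r"\n{3,}")    # two or more blank lines in a row

def final_text_report(text):
    """Collect line numbers for long lines, trailing spaces and blank-line runs."""
    report = {"long": [], "trailing": [], "blank_runs": []}
    for number, line in enumerate(text.split("\n"), start=1):
        if LONG_LINE.match(line):
            report["long"].append(number)
        if line.endswith(" "):
            report["trailing"].append(number)
    for match in BLANK_RUN.finditer(text):
        # Line holding the last text before the run, and how many blank lines follow it.
        last_text_line = text.count("\n", 0, match.start()) + 1
        blank_lines = len(match.group()) - 1
        report["blank_runs"].append((last_text_line, blank_lines))
    return report

if __name__ == "__main__":
    sample = "x" * 75 + "\nshort \n\n\nend"
    print(final_text_report(sample))
```

The blank-run list is only a prompt to compare the spacing against what the book uses before chapters and sections, since some runs are intentional.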
Final Checks on HTML Look over in Firefox and IE and rerun checkers
Name files correctly  
Make Zip files .txt, and html and images directory. All lower case. NO .bins and NO other files from image directory (include bin file if you don't have DU yet)
Upload Zips via PP Page If you don't have DU