Guiguts 2 PP Process Checklist
Preamble
This checklist is an ongoing effort to update the PP Process Checklist for use with Guiguts 2.0.
Current status: alpha.
Edited sections:
- Overview
- Project Analysis and Basic Formatting
- Prepare the Plain Text Version
- Prepare the HTML Version (in progress)
- Related Pages
Even though this is an open wiki page and anyone can make changes, while the first pass is in progress, please contact tjeffress with any suggestions.
Overview
Guiguts 2 is a software package for post processing projects so they can be uploaded to Project Gutenberg. The process goes something like this:
- Download the project files to your computer.
- Perform rigorous rounds of analysis to find any remaining inconsistencies or scanning errors.
- Format a plain text version of the book.
- Process any images that need to be included in the final book.
- Create and format an HTML version of the book.
- Perform more rigorous checks that your text and HTML meet the Project Gutenberg standards.
- Create e-book versions of your project and upload for Smooth Reading.
- Incorporate the Smooth Reading feedback.
- Create a .zip file for either Post-Processing Verification or Direct Upload.
This checklist contains steps for all types of books including those with illustrations, poetry, footnotes, sidenotes, and indexes. Not all projects need all the listed steps, so skip the ones that don't apply.
Note that this checklist tries to provide a logical and systematic order to the steps. As you gain experience with Guiguts and post processing, however, you will probably develop your own preferred sequence. Also, every project is unique; a book of poetry will have different needs than a scholarly work with hundreds of sidenotes, footnotes, and illustrations. The ultimate goal is getting projects uploaded to Project Gutenberg, and no one will be offended if you achieve the end result in a different way.
An alternative approach to using Guiguts is to create a single source file in ppgen format.
When editing this page, please use links to other wiki topics rather than repeating things covered elsewhere.
Project Analysis and Basic Formatting
Our first set of activities walks you through creating a folder for your project, downloading the project files, and using Guiguts to begin addressing proofer comments, errors, and inconsistencies. In this phase you also perform some basic formatting that applies to both your text and HTML versions.
Initial Setup
- Go to Project page
- Read details and requirements.
- Bookmark the project URL.
- Read the project forum page, note any issues proofers raised.
- Make a project folder, e.g. (Win) C:\dp\pp\bookname or (Mac/Linux) /dp/pp/bookname.
- Within your project folder, create the following subfolders: images, originals, pngs, and src.
- On the project page, find the Post-Processing Files section and download the text and images zip files.
- Extract the text files to the src folder.
- Extract the images to the originals folder.
- Move the nnn.png files from the originals folder to the pngs folder.
- Copy the project text file from the src directory to the base project directory and name it bookname.txt.
- Choose File>Open to open bookname.txt.
- Choose File>Project>Configure Page Labels. This allows the page labels in bookname.txt to match the page images (and, ultimately, the displayed page numbers in the HTML version to match the original).
- If your project has them, review src/good-words.txt and src/bad-words.txt files.
- If you want to use these files, move them to the base project folder.
- Choose File>Project>Add Good/Bad Words to Project Dictionary.
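The folder layout described above can also be created with a short script. The following is a minimal sketch in Python, not part of Guiguts: the function name set_up_project and its arguments are hypothetical, it assumes the two zip files have already been extracted by hand, and it assumes page scans are the .png files whose names consist of digits only.

```python
from pathlib import Path
import re
import shutil

SUBFOLDERS = ("images", "originals", "pngs", "src")


def set_up_project(base: Path, bookname: str, project_text: str) -> Path:
    """Create the subfolders, sort page scans, and copy the working text file.

    base          the project folder, e.g. /dp/pp/bookname
    project_text  name of the downloaded text file inside src (varies by project)
    """
    for name in SUBFOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)

    # Assumption: page scans ("nnn.png") have all-digit names. Anything else
    # in originals is treated as an illustration and left where it is.
    for png in sorted((base / "originals").glob("*.png")):
        if re.fullmatch(r"\d+", png.stem):
            shutil.move(str(png), str(base / "pngs" / png.name))

    # Work on a copy so the downloaded file in src stays untouched.
    working = base / f"{bookname}.txt"
    source = base / "src" / project_text
    if source.exists() and not working.exists():
        shutil.copyfile(source, working)
    return working
```

Calling set_up_project(Path("/dp/pp/bookname"), "bookname", "projectID.txt") would leave bookname.txt in the base folder ready to open with File>Open. The page-label and good/bad word steps still happen inside Guiguts.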
TIP: Version Control. Some post processing steps make major changes to your entire file. If something goes wrong, it's often useful to have a fallback version of your file, just in case. Anyone who frequently uses computer documents knows that you should save and save often. But with post processing, you should also save backup versions at various stages as a precaution. Some post processors use computer based version control like git, but you can achieve the same peace of mind by adding version numbers to your file name (bookname-v01.txt) and doing a File>Save As to keep creating new versions of your file. The checklist doesn't remind you to make backups, but if you just finished a step and said, "Whew, I'm glad that's done," you should probably make a backup.
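If you prefer not to type version numbers by hand, the numbered-backup idea can be scripted. This is only a sketch under the naming scheme mentioned in the tip (bookname-v01.txt, bookname-v02.txt, and so on); the helper name save_backup is hypothetical.

```python
from pathlib import Path
import re
import shutil


def save_backup(working: Path) -> Path:
    """Copy the working file to the next free bookname-vNN.txt beside it."""
    pattern = re.compile(
        re.escape(working.stem) + r"-v(\d+)" + re.escape(working.suffix)
    )
    # Collect the version numbers already used in the same folder.
    used = [
        int(match.group(1))
        for path in working.parent.iterdir()
        if (match := pattern.fullmatch(path.name))
    ]
    next_version = max(used, default=0) + 1
    backup = working.with_name(
        f"{working.stem}-v{next_version:02d}{working.suffix}"
    )
    shutil.copy2(working, backup)
    return backup
```

Run it after saving in Guiguts; it copies the file on disk and does not see unsaved changes in the editor.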
Sequential Inspection of Text
This is the only step in which you will examine the whole text in sequence; hereafter you navigate with the built in tools or searches. Some post-processors still read the book carefully word for word. Others skim the text comparing it to the page images and double-check the formatting.
Either way, be sure to turn on Search>Highlight Proofer Comments and Search>Highlight HTML tags before starting this pass.
TIP: Go to the end of the document and create a new chapter and label it "Transcriber's Notes". Guiguts 2 has the ability to set multiple bookmarks, so you may want to use one of the bookmarks so you can easily jump to the Transcriber's Notes.
Check for:
- Proper HTML markup of font changes <i>italic</i> and <b>bold</b>.
- Watch for punctuation wrongly contained in markups, such as <i>(ibid.</i> or <b>Subtopic.</b>.
- Thought breaks marked with <tb>.
- Block material all marked.
- Poetry, tabular, and other no-wrap content marked in /* */.
- Block quotes in /# #/.
- You can fix block markups that cross page boundaries now or in the next step.
- Illustrations properly enclosed in [Illustration: caption] tags.
- Check: caption text agrees with List of Illustrations (if any).
- Watch for consistent spelling, abbreviation, capitalization in captions.
- Fix Footnotes, Illustrations, Sidenotes still inside a paragraph.
- Move these blocks outside the paragraph as appropriate.
- Don't worry about duplicate footnote numbers/symbols now.
- Make notes of things that will need attention in the HTML version.
- Author cross-references like "(p. 150)" and "see page 222" that should become links.
- How the book designer laid out special sections such as tables and sidebars.
- Other things you might work on during this first pass, but will also be addressed in detail in later sections.
- Proofer comments.
- Starred hyphenation/word-joining from the proofing rounds.
- Hyphenation across page boundaries.
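To build the list of author cross-references that should later become links, a simple pattern search can supplement reading. The sketch below is an illustration only: the pattern covers forms like "(p. 150)" and "see page 222" from the checklist, and a given book may use other wording that it will miss.

```python
import re

# Matches "p. 150", "pp. 12", "page 222", "pages 40" (case-insensitive).
PAGE_REF = re.compile(r"\b(?:pp?\.|pages?)\s*(\d+)", re.IGNORECASE)


def find_page_refs(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for each likely page reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PAGE_REF.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

The line numbers refer to the text file as saved, so they can be used with the editor's go-to-line feature when you return to these references in the HTML stage.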
Basic Fixup
- Choose Tools>Basic Fixup with all options checked. (If the tool finds a lot of issues, you may want to select the specific options you want to fix now from the View Options feature, or use the Next Option button to step through each tool option.)
- Remove any instances of the [Blank Page] tag appearing on pages with no text or images. (If the pages aren't blank, then change these to illustrations as needed.)
Format Front Matter
Before you start formatting the front matter and block quotations (the next section), you might want to review the various Rewrap Markers offered by Guiguts 2, which provide much more control than just the no-wrap (/* */) and rewrap (/# #/) markers.
- Format the title page, preserving as much of the original formatting as possible. Protect in /X...X/ (no rewrap, no indent) or /F...F/ (the same, except that it will be centered in the HTML version).
- Make choices about the block formatting of any other front matter, such as copyright, dedication, or other miscellaneous pages that probably have formatting exceptions.
- Format the table of contents (TOC). Find each matching chapter head; make sure heads are 1:1 with the TOC. Protect the TOC with /X...X/. Note that your TOC will probably need to be indented (Text>Indent 1 space) to prevent rewrapping, particularly if you use multiple spaces to align page numbers, hanging indents, or other formatting that needs to be preserved.
- If the book has a List of Illustrations, format it following the instructions for a TOC. If the book has illustrations but no List of Illustrations, you might create one depending on how useful you think that will be to future readers, but this is not a requirement.
Inspect Block Markup
Next, use the search tool to step through all the block markup. You can either do separate passes for no-wrap (/* */) and re-wrap (/# #/) blocks, or do them both at the same time.
- Check for a blank line before and after each block
- If useful, change the markers to one of the custom Rewrap Markers.
- Remove the unneeded close and open markers across page boundaries.
- Apply specific indent values if needed.
- Convert poetry from /*..*/ to /P..P/
- Make sure poetry line numbers are at least two spaces to the right of the line and aligned consistently.
- Mark the index(es) with /I..I/.
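The blank-line check in the list above is easy to miss by eye, so a quick script can flag candidates. This is a rough sketch, not a Guiguts feature: it assumes each marker sits alone on its line, optionally followed by bracketed settings, and it only knows the marker letters mentioned in this checklist (*, #, P, I, X, F).

```python
import re

# Opening markers such as /*, /#, /P, /I, /X, /F and the matching closers.
OPEN = re.compile(r"^/[*#PIXF](\[.*\])?$")
CLOSE = re.compile(r"^[*#PIXF]/$")


def missing_blank_lines(text: str) -> list[int]:
    """Return line numbers of block markers lacking an adjacent blank line."""
    lines = text.splitlines()
    problems = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        # An opening marker needs a blank line before it.
        if OPEN.match(stripped) and i > 0 and lines[i - 1].strip():
            problems.append(i + 1)
        # A closing marker needs a blank line after it.
        elif CLOSE.match(stripped) and i + 1 < len(lines) and lines[i + 1].strip():
            problems.append(i + 1)
    return problems
```

Directly nested blocks (for example a no-wrap block opened on the line after a block quote marker) will also be reported, so treat the output as a list of places to look rather than a list of errors.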
Address Proofer Comments
- Choose Search>Find Proofer Comments and resolve all proofer's notes.
- Choose Search>Find Asterisks w/o Slash and resolve all word join and page break issues. (And anything else that might have a stray asterisk. Of course, leave any asterisks that are part of the project's text.)
Fix Unmatched or Orphaned Markup
- Choose Tools>Unmatched>Block Markup to find any errors in block markup.
- Choose Tools>Unmatched>DP Markup to find any errors in our faux HTML markup for italics, bold, gespert, fonts, etc.
- Choose Tools>Unmatched>Brackets to find any issues with missing or incorrectly formatted brackets on Footnotes, Sidenotes, or Illustrations.
Apply Word-Frequency Checks
Choose Tools>Word Frequency. Click on a word to search for it in the text. Clicking multiple times will find subsequent instances if they exist. Several of the reports can limit the list to Suspects Only for a list of the most likely errors.
- Choose All Words and sort by Freq. The word list is now sorted by word frequency. Scroll to the end of the list until you reach the words that only occur once each. Scan through the single-use words and verify any oddities and fix obvious misspellings. (Don't forget to track changes in your Transcriber's Notes if needed.)
- Choose ALL CAPITALS. Depending on the project, sort alphabetically or by frequency and check.
- Choose Emdashes. Review usage, length, and spacing of em-dashes.
- Choose Alpha/Num. Check for possible misspellings or scannos.
- Choose Character Counts. Sort by frequency and look for single-use characters that seem out of place. Also compare the counts of opening and closing parentheses, brackets, and braces, since they most likely should be the same.
- Choose Diacritics/æ/œ and check for spelling and consistency.
- Choose MiXeD CasE. Scan list looking for letters such as o that sometimes OCR wrongly capitalizes. Oh/zero errors can show up here, too.
- Choose Hyphens. Review usage, consistency, and spacing.
- Choose Ligatures. Correct any missing or inconsistent usage.
- Choose Initial Capitals and review.
- Choose Ital/Bold/SC/etc. Scan list for incorrect or inconsistent use of italics, bold face, and small caps.
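The comparison of opening and closing parentheses, brackets, and braces from the Character Counts step can be illustrated with a few lines of Python. This sketch only compares totals, as that step does; it does not check nesting or order.

```python
from collections import Counter

PAIRS = [("(", ")"), ("[", "]"), ("{", "}")]


def unbalanced_pairs(text: str) -> dict[str, tuple[int, int]]:
    """Return {pair: (open count, close count)} for pairs whose counts differ."""
    counts = Counter(text)
    return {
        opener + closer: (counts[opener], counts[closer])
        for opener, closer in PAIRS
        if counts[opener] != counts[closer]
    }
```

For example, unbalanced_pairs("(a) [b (c]") returns {"()": (2, 1)}: two opening parentheses against one closing, while the square brackets balance.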
Apply Spellcheck
Choose Tools>Spelling. Proceed through the list, correcting words or adding them to the project dictionary as appropriate.
Apply Jeebies
Choose Tools>Jeebies. Proceed through the list, correcting words as appropriate.
Apply Scanno Checks
Choose Tools>Stealth Scannos.
- Start scanno searching based on en-common.json. Work through the list.
- Choose misspelled.json. Work through the list.
- Choose regex.json. Work through the list.
Apply Word Distance Check
Choose Tools>Word Distance Check. Work through the list looking for possible typos, scannos, or inconsistencies.
Inspect, Renumber, and Relocate Footnotes
Use Tools>Footnote Fixup.
- Resolve any unattributed footnotes.
- Resolve any duplicate footnote anchors.
- Resolve any unformatted footnotes or missing anchors.
- Join any footnotes that are split across multiple pages.
- Once you have resolved the errors in the list of footnotes, choose All to Number and then Reindex. This will number the footnotes consecutively from 1 on through the entire document.
- Decide where you want to locate your footnotes and set landing zones.
- For books with relatively small numbers of footnotes, you should probably use Move FNs to Paragraphs and then skip the move to landing zones step.
- For books with more footnotes, you could either create a footnotes block at the end of each chapter (Autoset Chap LZ) or the end of the book (Autoset End LZ).
- Once you have landing zones set, choose Move FNs to Landing Zone(s).
Note: Do not use Tidy Footnotes at this stage. This removes the "Footnote" label from the footnotes which the HTML converter needs to properly format your footnotes.
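For a quick overview before opening Footnote Fixup, the kinds of problems listed above can be approximated with pattern matching. The sketch below assumes anchors look like [1] or [A] and footnotes begin with "[Footnote 1:"; it is a rough cross-check under those assumptions, not a description of how Footnote Fixup works.

```python
import re
from collections import Counter

ANCHOR = re.compile(r"\[(\d+|[A-Z])\]")
FOOTNOTE = re.compile(r"\[Footnote (\d+|[A-Z]):")


def footnote_report(text: str) -> dict[str, list[str]]:
    """Summarise duplicate labels and anchors/footnotes without a partner."""
    anchors = Counter(ANCHOR.findall(text))
    notes = Counter(FOOTNOTE.findall(text))
    return {
        "duplicate_footnotes": sorted(k for k, n in notes.items() if n > 1),
        "anchors_without_footnote": sorted(set(anchors) - set(notes)),
        "footnotes_without_anchor": sorted(set(notes) - set(anchors)),
    }
```

Duplicate labels are normal before renumbering, because numbering usually restarts on each page of the original. After Reindex, all three lists should ideally be empty.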
Inspect and Relocate Sidenotes
If your project has sidenotes, choose Tools>Sidenote Fixup.
- Work through the report of sidenotes and resolve any sidenotes that are still mid-paragraph.
- Compare sidenotes to the page image. Move each sidenote above the paragraph if feasible with Move Selection Up.
- Otherwise, position it before the sentence to which it applies. Leave blank lines around sidenotes if you want to prevent rewrapping.
Inspect and Relocate Illustrations
At this stage, you just want to make sure that the Illustration tags are formatted properly and placed at a paragraph break.
- Use the Tools>Illustration Fixup tool to inspect the Illustrations in the project.
- Look for and relocate illustrations still mid-paragraph or placed at the top of a page with an asterisk.
- Check that any captions match the text in the list of illustrations if the book has one.
- Check that any odd formatting is properly marked with re-wrap (/# #/) or no-wrap (/* */) tags, and that blank lines surround these blocks.
- While formatters may have moved illustrations to a nearby paragraph break, the illustration may fit better in another location based on the project content.
- If you move the illustrations or make corrections to the caption or list of illustrations, don't forget to add notes to your Transcriber's Notes.
Convert to Curly Quotes
Choose Tools>Curly Quotes>Convert to Curly Quotes.
- Work through the report of conversion issues and resolve any mismatched or unconverted quotes.
- If you have too many errors to deal with, you may want to use the View Options button to choose which errors you see, or you can choose Next Option to work through each type of error one at a time.
Convert Unicode-compatible Fractions
If your project has only fractions that are available in Unicode and will look good in a text version, you can choose Tools>Convert Fractions>Unicode Fractions Only. Otherwise, wait until you are formatting the HTML version to do fancier fraction conversion.
Remove Visible Page Breaks
Choose Tools>Page Separator Fixup to remove visible page separators.
Apply Bookloupe
Choose Tools>Bookloupe.
- Work through the list, correcting as appropriate.
- If you have too many errors to deal with, you may want to use the View Options button to choose which errors you see, or you can choose Next Option to work through each type of error one at a time.
Apply PPtxt
Choose Text>PPtxt.
- Work through the report, correcting as appropriate.
- If you have too many errors to deal with, you may want to use the Checks options to choose which sections of the report you see.
Create the HTML File
At this point, you have made all the changes that you can before applying specific formatting to the text or the HTML versions of the project. Remember, after you save the HTML file, any change you subsequently make to the content (not the formatting) of either file must also be made in the other version.
- Save any unsaved changes in bookname.txt (File>Save).
- Choose File>Save a Copy As to create bookname.html.
- This will be the starting file for the HTML version. (And as a worst case scenario, you can also use it as fallback in case you mess up the text formatting and need to start the following steps over.)
Prepare the Plain Text Version
We now proceed to complete the formatting of the plain text version of the book.
- Make sure you are working on bookname.txt (and not the HTML file).
Convert Markup
Choose Text>Convert Markup.
- For each of the marker types in your document (italic, bold, small caps, gespert, and fonts), choose a text marker and apply each conversion.
- If you want to use the standard five spaced asterisks for thought breaks, apply the <tb> conversion. (Otherwise, you'll need to do a search and replace for the thought break tags and address them separately.)
- For Small Caps choose between:
- Converting to ALL CAPS.
- Converting to text with a surrounding marker.
- Converting manually with Text>Convert <sc> Manually. This opens a custom search/replace dialog box to walk through common options for small caps. Work through the entire document until you have addressed all the small caps tags.
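To make the effect of these conversions concrete, here is a small sketch of what they do to the text. It is not the Guiguts implementation: the marker characters (_ for italic, = for bold), the ALL CAPS choice for small caps, and the exact spacing of the asterisk line are assumptions chosen for illustration.

```python
import re

# Five asterisks separated by spaces; the exact spacing is an assumption.
THOUGHT_BREAK = "       *       *       *       *       *"


def convert_markup(text: str, italic: str = "_", bold: str = "=") -> str:
    """Replace DP-style inline markup with plain-text markers."""
    text = re.sub(r"</?i>", italic, text)
    text = re.sub(r"</?b>", bold, text)
    # Small caps: drop the tags and upper-case the enclosed text.
    text = re.sub(
        r"<sc>(.*?)</sc>", lambda m: m.group(1).upper(), text, flags=re.DOTALL
    )
    # Thought breaks: only a <tb> alone on its line is converted.
    return re.sub(r"^<tb>$", THOUGHT_BREAK, text, flags=re.MULTILINE)
```

With these defaults, "<i>Ibid.</i> by <sc>John Smith</sc>" becomes "_Ibid._ by JOHN SMITH". Whatever markers you choose in Guiguts, describe them in your Transcriber's Notes.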
Format ASCII Tables
Use the search tool to look for no-wrap blocks (/* */) and inspect all tabular material.
- Compare tables to page images; reformat to best convey author intent.
- For complex tables, use Text>ASCII Table Effects to reformat.
Rewrap and Clear Rewrap Markers
- Save the file if it has any unsaved changes.
- Choose Tools>Footnote Fixup, and choose Tidy Footnotes.
- Choose Tools>Rewrap All.
- Scroll through entire text, looking for improper indentation.
- Choose Tools>Clean Up Rewrap Markers.
- Rerun Bookloupe and PPtxt. Resolve any new issues. (Update the HTML version of the file for any content changes.)
- Save the document.
Prepare the HTML Version
Finally, we format the HTML Version of the book.
Generate the HTML
- Open bookname.html.
- Choose HTML>HTML Generator.
- Ensure that the Title shows the correct version of the book title.
- Set optional switches as needed for your project.
- Choose Auto-generate HTML.
- Save the HTML file and open it in your web browser by choosing Custom>View HTML in browser.
- Scroll through the generated document looking for systematic errors. (Title pages, tables, etc. will look terrible at this point.) If automatic conversion messed up, delete the HTML file and start this step over with the backup file, addressing whatever formatting led to conversion issues.
Tweak HTML/CSS
While the HTML generator makes a passably readable HTML file, it will probably be far from the beautiful document you want to upload to Project Gutenberg. The following are pointers to creating a beautiful HTML document, but it is far from an exhaustive list. (Don't worry about inserting the illustrations at this point. From the text processing stage, the illustrations should already be in the correct location, and the next section goes over using the Auto Illustrations tool.)
- Page through the book looking for HTML formatting that could be improved, and apply fixes to the HTML and CSS as needed. In particular look at:
- Title pages. Ensure that the book title is encoded with <h1> tags and that it matches the <title> declaration in the HTML header. (See the Post-Processing FAQ.)
- Tables and Tables of Contents. The HTML>Auto Table tool can help format tables. Remove the auto-generated TOC if not needed. (Or use the hyperlinks for the book's native TOC.)
- Chapter and section headlines. Check that the headlines have the proper h2, h3, etc., hierarchy.
- Lists. The HTML>Auto List tool can help format lists.
- Blockquotes. Ensure that the CSS indents for the blockquotes follow the book's style. Especially check the formatting on blockquotes that had nested wrap or no-wrap markers.
- Hanging Indents. Add CSS and the necessary classes to your <p> tags as needed.
- Footnotes. The Generate HTML command should have properly formatted Footnotes, but review that they are properly linked and formatted.
- Sidenotes. Check the floating placement of the sidenotes and adjust as needed. (If you relocate a sidenote relative to the text, remember to make the corresponding change to your text file.)
- Indexes. If at the text stage, you placed /I I/ tags around the Index, the Generate HTML tool should have formatted any indexes correctly, but you may need to apply some tweaks if the indent levels or page references do anything unconventional.
- Transcriber's notes. Put <div class="transnote"> </div> around your notes.
- Make hypertext improvements:
- Use the HTML>HTML Markup tool as needed for adding HTML code. Use Search>Search & Replace to make global changes.
- Hyperlink page references in text, TOC, list of illustrations, and index (see linking page numbers and linking page numbers in indexes).
- Also hyperlink any pages or corrections referenced from the Transcriber's Notes to the relevant page or location in the text.
- Add <abbr> tags and "lang" attributes as appropriate.
- Convert hyphens to en-dashes where appropriate.
- Consider converting <i> tags to <em> or <cite> tags.
- If you haven't already, use Tools>Convert Fractions to convert any remaining fractions into Unicode or HTML super/subscripts.
- Fancy formatting (not required but can be a nice touch):
- Configure Drop Caps.
- Indenting the first line of paragraphs.
As you make corrections in Guiguts, save the HTML file and use the browser's reload function to refresh the book view in the browser. Continue editing and checking until you have a pretty HTML document.
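One of the checks above, that the <h1> book title agrees with the <title> declaration, can be done by pulling both strings out of the file and comparing them side by side. The following is a sketch using Python's standard html.parser; the class and function names are hypothetical, and it simply returns both strings so you can judge whether they match according to the Post-Processing FAQ.

```python
from html.parser import HTMLParser


class TitleAndH1(HTMLParser):
    """Collect the text inside <title> and inside the first <h1>."""

    def __init__(self) -> None:
        super().__init__()
        self.texts = {"title": "", "h1": ""}
        self._current = None
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag in self.texts and tag not in self._seen:
            self._current = tag
        elif tag == "br" and self._current:
            # Treat a line break inside the heading as a space.
            self.texts[self._current] += " "

    def handle_endtag(self, tag):
        if tag == self._current:
            self._seen.add(tag)
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.texts[self._current] += data


def title_and_h1(html: str) -> tuple[str, str]:
    """Return (title text, h1 text) with runs of whitespace collapsed."""
    parser = TitleAndH1()
    parser.feed(html)
    parser.close()
    return tuple(" ".join(parser.texts[key].split()) for key in ("title", "h1"))
```

Read bookname.html as UTF-8 text and pass the contents to title_and_h1 to see both values at once.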
Process and Insert Images
If the project contains illustrations, use an image processing program such as GIMP or Adobe Photoshop Elements to optimize them. (See the Guide to Image Processing.) You can process images at any stage before, during, or after HTML conversion.
If there is a cover image supplied with the project (or you are creating one yourself), you can find information in the Proofreaders' Guide to EPUB or the PP guide to cover pages.
For each illustration:
- Load the image from the originals folder (see the Initial Setup step).
- Straighten the image (almost all scanned images are off-perpendicular; some are trapezoidal owing to the page not being flat on the scan window).
- Crop to remove all redundant white space and borders (use CSS to provide margins and borders if needed).
- If a black and white image, convert to grayscale.
- Correct the contrast.
- Sharpen.
- Correct any major scratches, freckles, dirt, etc.
- Save in the subfolder images using appropriate type:
- Line drawings in .png at 8 bits per pixel (not the default 24-bit RGB format).
- Photographs as .jpg with an appropriate compression level.
After you have processed all the images, insert them into your HTML document:
- Choose HTML>Auto-Illustrations and work through placing each illustration into your document.
- Search the HTML document to ensure that all the Illustrations have been converted to an HTML figure and image.
- In the browser, scan through entire document making sure that each image loads correctly.
- If you used thumbnails to link to larger images, test that each thumbnail when clicked loads the correct image.
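Scanning in the browser catches most broken images, but a script can confirm that every referenced file actually exists on disk. This sketch assumes images are referenced with relative paths (such as images/i_001.jpg) from <img src> attributes and from links whose href starts with "images/"; the names ImageRefs and missing_images are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class ImageRefs(HTMLParser):
    """Collect image paths from <img src> and from links into images/."""

    def __init__(self) -> None:
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.refs.append(attrs["src"])
        elif tag == "a" and (attrs.get("href") or "").startswith("images/"):
            # Thumbnails that link to a larger version of the image.
            self.refs.append(attrs["href"])


def missing_images(html_path: Path) -> list[str]:
    """Return referenced image paths that do not exist beside the HTML file."""
    parser = ImageRefs()
    parser.feed(html_path.read_text(encoding="utf-8"))
    parser.close()
    return sorted(
        {ref for ref in parser.refs if not (html_path.parent / ref).is_file()}
    )
```

An empty list means every reference resolved. It does not prove that each image is the right one, so the visual pass in the browser is still needed.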
Validate HTML and CSS
Perform these validation steps before submitting your book. Validation is also helpful while customizing the HTML and CSS above.
- Use HTML>HTML Tidy. Fix any reported problems.
- If you have the HTML Validator locally installed, use HTML>HTML Validator. Otherwise, use W3C Markup Validation Service. Fix any reported problems.
- Remove unused CSS. HTML>PPhtml can help with this. Alternatively, check manually or use a tool such as the Firefox addons Firebug (with CSS Usage extension) or Dust-Me Selectors.
- Use HTML>CSS Validator. Alternatively, use W3C CSS Validation Service. Fix any reported problems.
- Use HTML>HTML Link Checker. Fix any reported problems.
- Use HTML>PPVimage to check for image-related errors. Fix any reported problems.
- Examine the generated EPUB and Kindle books for correctness. If you have EbookMaker installed, use HTML>EBookMaker epub/mobi Generation. Otherwise, use Project Gutenberg Online Ebookmaker.
Upload for Smooth Reading
Getting smooth reading feedback isn't required but it is highly recommended.
- Choose HTML>Ebookmaker. If you have a local version installed, you can use your local code. Otherwise, if your editing computer has an internet connection, you can use the online version.
- Create a .zip file including the text document, HTML document, epub files, and the images folder.
- Upload for Smooth Reading.
- Wait.
- Incorporate feedback from smooth readers (into both the text and HTML versions of your project).
Upload the Finished Project
- Prepare a new folder with a short name. The name you choose doesn't really matter because you only need it to create the zip file. The zip file itself is renamed automatically during the upload process.
- Move into it only the files to be uploaded:
- the etext file bookname.txt.
- the .bin files related to those (some PPVers use Guiguts too!)
- the HTML file if one was made
- the images folder if required by HTML
- Do not include the original images or the page images; do not include any work files or scratch files or auto-backup editions. If you have been told to upload directly to the Gutenberg site for a whitewasher, do not include the .bin file(s). All filenames should contain lowercase letters only.
- Mac OS X users: the Finder creates hidden files named .DS_Store in any folder you display as a window. Although harmless, these files are not wanted by PG. Get rid of them as follows: In a terminal window, cd into the project folder. Run this command, copying its arcane syntax precisely:
find . -name ".DS_Store" -ok rm '{}' \;
- You will be asked for deletion confirmation.
- Linux and Mac users: cd into this folder and use the command unix2dos *.txt; unix2dos *.html.
- Use a zip utility to make a zip archive of the contents of this folder. Do not zip the folder itself, just its contents - the text file(s), HTML file and images folder should be at the top level of the zip file. This enables the automatic checking programs at PG to find the files. (OS X users: do not use the Finder command File> Create Archive of...; it creates a gzip file that PG cannot use. Use a zip command in a terminal window.)
- Windows users: The "images" folder will often contain a hidden file called thumbs.db. This shouldn't be included in the upload. The easiest way to get rid of it is to open the finished zip-file, navigate to the "images"-folder and delete it from there if present.
- Open the project page in your web browser and at the bottom, select Change Project State: Upload for Verification.
- On the next page, write comments noting any unusual features of the book.
- Use the Browse button to navigate to the zipped file. Wait while it uploads, which can take quite a while.
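The packaging rules above (zip the contents rather than the folder, leave out .DS_Store and thumbs.db, lowercase file names) can be applied in one pass with Python's standard zipfile module. This is a minimal sketch with a hypothetical function name; it does not convert line endings and it does not decide which files belong in the upload folder.

```python
from pathlib import Path
import zipfile

UNWANTED = {".ds_store", "thumbs.db"}


def build_upload_zip(folder: Path, zip_path: Path) -> list[str]:
    """Zip the contents of folder (not the folder itself); return stored names."""
    stored = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if not path.is_file() or path.name.lower() in UNWANTED:
                continue
            # Skip the archive itself if it is being written inside folder.
            if path.resolve() == zip_path.resolve():
                continue
            # Names are relative to folder, so files sit at the zip's top level.
            name = path.relative_to(folder).as_posix()
            if name != name.lower():
                raise ValueError(f"file name is not all lowercase: {name}")
            archive.write(path, arcname=name)
            stored.append(name)
    return stored
```

The returned list shows exactly what went into the archive, which makes it easy to confirm that the text file, HTML file, and images folder are at the top level before uploading.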
Ta-daaaa! Finished!!* Treat yourself to your favorite beverage! When refreshed, return to Step 1.
*Well, finished until you get the first PM from the PPVer listing the things you forgot to do...