Guiguts 2 PP Process Checklist
Guiguts 2 is a software package for post processing projects so they can be uploaded to Project Gutenberg. Guiguts automates or simplifies many of the steps post processors need to perform to create upload-ready files. This checklist provides a guide to walk you through the process from beginning to end.
>>> NOTE <<<
This checklist is an update to the PP Process Checklist for use with Guiguts 2.0.x.
Current status: beta.
The instructions are complete. Even though this is an open wiki page and anyone can make changes, please contact tjeffress with any suggestions.
Overview
The process for using Guiguts to create an upload-ready set of files goes something like this:
- Download the project files to your computer.
- Perform rigorous rounds of analysis to find any remaining inconsistencies or scanning errors.
- Format a plain text version of the book.
- Process any images that need to be included in the final book.
- Create and format an HTML version of the book.
- Perform more rigorous checks that your text and HTML meet the Project Gutenberg standards.
- Create e-book versions of your project and upload for Smooth Reading.
- Incorporate the Smooth Reading feedback.
- Create a .zip file for either Post-Processing Verification or Direct Upload.
This checklist contains steps for all types of books including those with illustrations, poetry, footnotes, sidenotes, and indexes. Not all projects need all the listed steps, so skip the ones that don't apply.
Note that this checklist tries to provide a logical and systematic order to the steps. But, as you gain experience with guiguts and post processing, you will probably develop your own preferred sequence. Also, every project is unique; a book of poetry will have different needs than a scholarly work with hundreds of sidenotes, footnotes, and illustrations. The ultimate goal is getting projects uploaded to Project Gutenberg, and no one will be offended if you achieve the end result in a different way.
An alternative approach to using guiguts is to create a single source file in ppgen format.
When editing this page, please use links to other wiki topics rather than repeating things covered elsewhere.
Project Analysis and Basic Formatting
Our first set of activities walks you through creating a folder for your project, downloading the project files, and using guiguts to begin addressing proofer comments, errors, and inconsistencies. In this phase you also perform some basic formatting that applies to both your text and HTML versions.
Initial Setup
- Go to Project page
- Read details and requirements.
- Bookmark the project URL.
- Read the project forum page, note any issues proofers raised.
- Make a project folder, e.g. (Win) C:\dp\pp\bookname or (Mac/Linux) /dp/pp/bookname.
- Within your project folder, create the following subfolders: images, originals, pngs, and src.
- On the project page, find the Post-Processing Files section and download the text and images zip files.
- Extract the text files to the src folder.
- Extract the images to the originals folder.
- Move the nnn.png files from the originals folder to the pngs folder.
- Copy the project text file from the src directory to the base project directory and name it bookname.txt.
- Launch Guiguts 2 and choose File>Open to open bookname.txt.
- Choose File>Project>Configure Page Labels. This allows the page labels in bookname.txt to match the page images (and, ultimately, the page numbers displayed in the HTML version to match the original).
- If your project has them, review the src/good-words.txt and src/bad-words.txt files.
- If you want to use these files, move them to the base project folder.
- Choose File>Project>Add Good/Bad Words to Project Dictionary.
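If you set up projects often, the folder and file steps above can be scripted. The following is only a sketch, not part of Guiguts: the function name and arguments are made up for illustration, it assumes page scans have purely numeric names such as 001.png, and it assumes the project text is the first .txt file in src that is not a word list.

```python
import shutil
import zipfile
from pathlib import Path

SUBFOLDERS = ("images", "originals", "pngs", "src")
WORD_LISTS = ("good-words.txt", "bad-words.txt")


def set_up_project(base: Path, text_zip: Path, images_zip: Path, bookname: str) -> Path:
    """Create the project layout and return the path to bookname.txt."""
    for name in SUBFOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)

    # Extract the text files to src and the images to originals.
    with zipfile.ZipFile(text_zip) as zf:
        zf.extractall(base / "src")
    with zipfile.ZipFile(images_zip) as zf:
        zf.extractall(base / "originals")

    # Assumption: page scans are named with digits only (001.png, 002.png, ...).
    # Everything else stays in originals as illustration source material.
    for png in sorted((base / "originals").glob("*.png")):
        if png.stem.isdigit():
            shutil.move(str(png), str(base / "pngs" / png.name))

    # Assumption: the first non-word-list .txt file in src is the project text.
    candidates = [p for p in sorted((base / "src").glob("*.txt"))
                  if p.name not in WORD_LISTS]
    if not candidates:
        raise FileNotFoundError("no project text file found in src")
    target = base / f"{bookname}.txt"
    shutil.copy(candidates[0], target)
    return target
```

For example, set_up_project(Path("/dp/pp/bookname"), Path("text.zip"), Path("images.zip"), "bookname") would leave bookname.txt in the base folder, ready to open in Guiguts 2. The zip file names here are placeholders for whatever you downloaded.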
TIP: Version Control. Some post processing steps make major changes to your entire file. If something goes wrong, it's often useful to have a fallback version of your file, just in case. Anyone who frequently uses computer documents knows that you should save and save often. But with post processing, you should also save backup versions at various stages as a precaution. Some post processors use computer based version control like git, but you can achieve the same peace of mind by adding version numbers to your file name (bookname-v01.txt) and doing a File>Save As to keep creating new versions of your file. The checklist doesn't remind you to make backups, but if you just finished a step and said, "Whew, I'm glad that's done," you should probably make a backup.
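If you would rather not do File>Save As by hand, a small helper can make the numbered copies for you. This is a minimal sketch, assuming the bookname-v01.txt naming pattern from the tip; save_backup is a hypothetical name, not a Guiguts feature.

```python
import re
import shutil
from pathlib import Path


def save_backup(path: Path) -> Path:
    """Copy path to the next free name of the form stem-vNN.suffix."""
    pattern = re.compile(
        re.escape(path.stem) + r"-v(\d+)" + re.escape(path.suffix) + r"$"
    )
    # Collect the version numbers already used in the same folder.
    used = [int(m.group(1)) for p in path.parent.iterdir()
            if (m := pattern.match(p.name))]
    next_version = max(used, default=0) + 1
    backup = path.with_name(f"{path.stem}-v{next_version:02d}{path.suffix}")
    shutil.copy2(path, backup)
    return backup
```

Save your work in Guiguts first, then call save_backup(Path("bookname.txt")) from the project folder; the working file itself is left untouched.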
Sequential Inspection of Text
This is the only step in which you will examine the whole text in sequence; hereafter you navigate with the built in tools or searches. Some post-processors still read the book carefully word for word. Others skim the text comparing it to the page images and double-check the formatting.
Either way, be sure to turn on Search>Highlight Proofer Comments and Search>Highlight HTML tags before starting this pass.
TIP: Go to the end of the document and create a new chapter and label it "Transcriber's Notes". Guiguts 2 has the ability to set multiple bookmarks, so you may want to use one of the bookmarks so you can easily jump to the Transcriber's Notes.
Check for:
- Proper HTML markup of font changes <i>italic</i> and <b>bold</b>.
- Watch for punctuation wrongly contained in markups, such as <i>(ibid.</i> or <b>Subtopic.</b>.
- Thought breaks marked with <tb>.
- Block material all marked.
- Poetry, tabular, and other no-wrap content marked in /* */.
- Block quotes in /# #/.
- You can fix block markups that cross page boundaries now or in the next step.
- Illustrations properly enclosed in [Illustration: caption] tags.
- Check: caption text agrees with List of Illustrations (if any).
- Watch for consistent spelling, abbreviation, capitalization in captions.
- Fix Footnotes, Illustrations, Sidenotes still inside a paragraph.
- Move these blocks outside the paragraph as appropriate.
- Don't worry about duplicate footnote numbers/symbols now.
- Make notes of things that will need attention in the HTML version.
- Author cross-references like "(p. 150)" and "see page 222" that should become links.
- How the book designer laid out special sections such as tables and sidebars.
- Other things you might work on during this first pass, but will also be addressed in detail in later sections.
- Proofer comments.
- Starred hyphenation/word-joining from the proofing rounds.
- Hyphenation across page boundaries.
Basic Fixup
- Choose Tools>Basic Fixup with all options checked. (If the tool finds a lot of issues, you may want to select the specific options you want to fix now from the View Options feature, or use the Next Option button to step through each tool option.)
- Remove any instances of the [Blank Page] tag appearing on pages with no text or images. (If the pages aren't blank, then change these to illustrations as needed.)
Format Front Matter
Before you start formatting the front matter and block quotations (the next section), you might want to review the various Rewrap Markers offered by Guiguts 2, which provide much more control than just the no-wrap (/* */) and rewrap (/# #/) markers.
- Format the title page, preserving as much of the original formatting as possible. Protect in /X...X/ (no rewrap, no indent) or /F...F/ (the same, except that it will be centered in the HTML version).
- Make choices about the block formatting of any other front matter, such as copyright, dedication, or other miscellaneous pages that probably have formatting exceptions.
- Format the table of contents (TOC). Find each matching chapter head; make sure heads are 1:1 with the TOC. Protect the TOC with /X...X/. Note that your TOC will probably need to be indented (Text>Indent 1 space) to prevent rewrapping, particularly if you use multiple spaces to align page numbers, hanging indents, or other formatting that needs to be preserved.
- If the book has a List of Illustrations, format it following the instructions for a TOC. If the book has illustrations but no List of Illustrations, you might create one depending on how useful you think that will be to future readers, but this is not a requirement.
Inspect Block Markup
Next, use the search tool to step through all the block markup. You can either do separate passes for no-wrap (/* */) and re-wrap (/# #/) blocks, or do them both at the same time.
- Check for a blank line before and after each block
- If useful, change the markers to one of the custom Rewrap Markers.
- Remove the unneeded close and open markers across page boundaries.
- Apply specific indent values if needed.
- Convert poetry from /*..*/ to /P..P/.
- Make sure poetry line numbers are at least two spaces to the right of the line and aligned consistently.
- Mark the index(es) with /I..I/.
Address Proofer Comments
- Choose Search>Find Proofer Comments and resolve all proofer's notes.
- Choose Search>Find Asterisks w/o Slash and resolve all word join and page break issues. (And anything else that might have a stray asterisk. Of course, leave any asterisks that are part of the project's text.)
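To get a feel for what an "asterisk without slash" search matches, here is a rough stand-alone equivalent. It is a sketch of the idea only and may not match Guiguts' own search exactly: it reports every asterisk that is not next to a slash, so /* and */ block markers are skipped.

```python
import re

# An asterisk that is neither preceded nor followed by a slash.
STRAY_ASTERISK = re.compile(r"(?<!/)\*(?!/)")


def find_stray_asterisks(text):
    """Return (line, column) pairs, both 1-based, for each stray asterisk."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STRAY_ASTERISK.finditer(line):
            hits.append((lineno, match.start() + 1))
    return hits
```

Like the menu command, this also flags asterisks that legitimately belong to the book, so each hit still needs a human decision.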
Fix Unmatched or Orphaned Markup
- Choose Tools>Unmatched>Block Markup to find any errors in block markup.
- Choose Tools>Unmatched>DP Markup to find any errors in our faux HTML markup for italics, bold, gespert, fonts, etc.
- Choose Tools>Unmatched>Brackets to find any issues with missing or incorrectly formatted brackets on Footnotes, Sidenotes, or Illustrations.
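The block markup check boils down to matching every opening marker with its closing marker. A minimal sketch of that idea follows, assuming markers sit on their own lines and covering only the markers mentioned on this page (/* */, /# #/, /X X/, /F F/, /P P/, /I I/); Guiguts' own tool may behave differently, and the function name is hypothetical.

```python
import re

# Opening marker, optionally followed by bracketed arguments, alone on a line.
OPEN = re.compile(r"^/([*#XFPI])(\[[^\]]*\])?\s*$")
# Closing marker alone on a line.
CLOSE = re.compile(r"^([*#XFPI])/\s*$")


def unmatched_block_markup(text):
    """Return (line number, message) pairs for unbalanced block markers."""
    problems = []
    stack = []  # (marker character, line number) for each open block
    for lineno, line in enumerate(text.splitlines(), start=1):
        if m := CLOSE.match(line):
            if stack and stack[-1][0] == m.group(1):
                stack.pop()
            else:
                problems.append((lineno, f"close {m.group(1)}/ without matching open"))
        elif m := OPEN.match(line):
            stack.append((m.group(1), lineno))
    # Anything still on the stack was opened but never closed.
    problems.extend((lineno, f"open /{ch} never closed") for ch, lineno in stack)
    return problems
```

The stack is what lets a no-wrap block nested inside a block quote pass the check while a close marker in the wrong order is reported.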
Remove Visible Page Separators
- Choose Tools>Page Separator Fixup. Guiguts will highlight the first page separator in the file.
- For most projects you will want to choose the Auto Fix option and then choose Delete to delete the first separator and start the automated removal.
- Guiguts will pause at any separator removal that needs your input. At each pause, choose the removal option needed until all the separators have been removed.
After you have removed the page separators, you can always see the page breaks with File>Project>Add Page Marker Flags. Page markers aren't the same as page separators. Page markers are inserted, highlighted text that shows where Guiguts considers the breaks between pages to be. (You can edit the page markers if you need to move them around, or even add or remove any if they somehow got corrupted.) Be sure to turn them off again when you're done: File>Project>Remove Page Marker Flags.
Apply Text Checking Tools
The following tools are presented in the order they appear on the Guiguts Tools menu, but the order you use them is entirely up to you. Remember that the goal is to find any scannos or inconsistencies and decide how you need to address those issues. Your particular project or your individual preferences as you gain experience with the tools will determine the order you employ them.
Apply Word-Frequency Checks
Choose Tools>Word Frequency. Click on a word to search for it in the text. Clicking multiple times will find subsequent instances if they exist. Several of the reports can limit the list to Suspects Only for a list of the most likely errors.
- Choose All Words and sort by Freq. The word list is now sorted by word frequency. Scroll to the end of the list until you reach the words that only occur once each. Scan through the single-use words and verify any oddities and fix obvious misspellings. (Don't forget to track changes in your Transcriber's Notes if needed.)
- Choose ALL CAPITALS. Depending on the project, sort alphabetically or by frequency and check.
- Choose Emdashes. Review usage, length, and spacing of em-dashes.
- Choose Alpha/Num. Check for possible misspellings or scannos.
- Choose Character Counts. Sort by frequency and look for single-use characters that seem out of place. Also compare the counts of opening and closing parentheses, brackets, and braces, since they most likely should be the same.
- Choose Diacritics/æ/œ and check for spelling and consistency.
- Choose MiXeD CasE. Scan list looking for letters such as o that sometimes OCR wrongly capitalizes. Oh/zero errors can show up here, too.
- Choose Hyphens. Review usage, consistency, and spacing.
- Choose Ligatures. Correct any missing or inconsistent usage.
- Choose Initial Capitals and review.
- Choose Ital/Bold/SC/etc. Scan list for incorrect or inconsistent use of italics, bold face, and small caps.
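Two of the reports above are easy to picture in code: the list of words that occur only once, and the comparison of opening and closing bracket counts. The sketch below uses a simplified definition of a word (letters, optionally joined by apostrophes or hyphens), so its counts will not necessarily agree with the Word Frequency tool; both function names are made up for illustration.

```python
import re
from collections import Counter

# Runs of letters, optionally joined by apostrophes or hyphens.
WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")
PAIRS = (("(", ")"), ("[", "]"), ("{", "}"))


def single_use_words(text):
    """Return the words that occur exactly once, ignoring case."""
    counts = Counter(word.lower() for word in WORD.findall(text))
    return sorted(word for word, n in counts.items() if n == 1)


def unbalanced_pairs(text):
    """Return {pair: (open count, close count)} for pairs whose counts differ."""
    counts = Counter(text)
    return {o + c: (counts[o], counts[c])
            for o, c in PAIRS if counts[o] != counts[c]}
```

An empty result from unbalanced_pairs does not prove the brackets are correctly placed, only that none is obviously missing.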
Apply Bookloupe
Choose Tools>Bookloupe.
- Work through the list, correcting as appropriate.
- If you have too many errors to deal with, you may want to use the View Options button to choose which errors you see, or you can choose Next Option to work through each type of error one at a time.
Apply Spelling Checks
Choose Tools>Spelling. Proceed through the list, correcting words or adding them to the project dictionary as appropriate.
Apply Jeebies
Choose Tools>Jeebies. Proceed through the list, correcting words as appropriate.
Apply Stealth Scannos Checks
Choose Tools>Stealth Scannos.
- Start scanno searching based on en-common.json. Work through the list.
- Choose misspelled.json. Work through the list.
- Choose regex.json. Work through the list.
Apply Word Distance Check
Choose Tools>Word Distance Check. Work through the list looking for possible typos, scannos, or inconsistencies.
Inspect, Renumber, and Relocate Footnotes
Use Tools>Footnote Fixup.
- Resolve any unattributed footnotes.
- Resolve any duplicate footnote anchors.
- Resolve any unformatted footnotes or missing anchors.
- Join any footnotes that are split across multiple pages.
- Once you have resolved the errors in the list of footnotes, choose All to Number and then Reindex. This will number the footnotes consecutively from 1 on through the entire document.
- Decide where you want to locate your footnotes and set landing zones.
- For books with relatively small numbers of footnotes, you should probably use Move FNs to Paragraphs and then skip the move to landing zones step.
- For books with more footnotes, you could either create a footnotes block at the end of each chapter (Autoset Chap LZ) or the end of the book (Autoset End LZ).
- Once you have landing zones set, choose Move FNs to Landing Zone(s).
Note: Do not use Tidy Footnotes at this stage. This removes the "Footnote" label from the footnotes which the HTML converter needs to properly format your footnotes.
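For a quick look at the kinds of problems Footnote Fixup reports, the following sketch lists duplicate labels and footnotes or anchors that have no partner. It assumes the usual proofing format of [1] or [A] anchors in the text and [Footnote 1: ...] blocks starting at the beginning of a line; it does not renumber or move anything, and the function name is hypothetical.

```python
import re
from collections import Counter

# "[Footnote 12: ..." at the start of a line; captures the label.
FOOTNOTE = re.compile(r"^\[Footnote ([^:\]]+):", re.MULTILINE)
# "[12]" or "[A]" anywhere in the text; captures the label.
ANCHOR = re.compile(r"\[(\d+|[A-Z])\]")


def footnote_report(text):
    """Summarise duplicate and unpaired footnote labels."""
    notes = Counter(FOOTNOTE.findall(text))
    anchors = Counter(ANCHOR.findall(text))
    return {
        "duplicate_footnotes": sorted(l for l, n in notes.items() if n > 1),
        "duplicate_anchors": sorted(l for l, n in anchors.items() if n > 1),
        "footnotes_without_anchor": sorted(l for l in notes if l not in anchors),
        "anchors_without_footnote": sorted(l for l in anchors if l not in notes),
    }
```

Duplicate labels are normal before renumbering, because the proofing rounds restart numbering on each page; the report just shows how much work All to Number and Reindex will be doing.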
Inspect and Relocate Sidenotes
If your project has sidenotes, choose Tools>Sidenote Fixup.
- Work through the report of sidenotes and resolve any sidenotes that are still mid-paragraph.
- Compare sidenotes to the page image. Move each sidenote above paragraph if feasible with Move Selection Up.
- Otherwise, position it before the sentence to which it applies. Leave blank lines around sidenotes if you want to prevent rewrapping.
Inspect and Relocate Illustrations
At this stage, you just want to make sure that the Illustration tags are formatted properly and placed at a paragraph break.
- Use the Tools>Illustration Fixup tool to inspect the Illustrations in the project.
- Look for and relocate illustrations still mid-paragraph or placed at the top of a page with an asterisk.
- Check that any captions match the text in the list of illustrations if the book has one.
- Check that any odd formatting is properly marked with re-wrap (/# #/) or no-wrap (/* */) tags, and that blank lines surround these blocks.
- While formatters may have moved illustrations to a nearby paragraph break, the illustration may fit better in another location based on the project content.
- If you move the illustrations or make corrections to the caption or list of illustrations, don't forget to add notes to your Transcriber's Notes.
Convert to Curly Quotes
Choose Tools>Curly Quotes>Convert to Curly Quotes.
- Work through the report of conversion issues and resolve any mismatched or unconverted quotes.
- If you have too many errors to deal with, you may want to use the View Options button to choose which errors you see, or you can choose Next Option to work through each type of error one at a time.
Convert Unicode-compatible Fractions
If the only fractions in your project are ones that exist as Unicode characters and will look good in a text version, you can choose Tools>Convert Fractions>Unicode Fractions Only. Otherwise, wait until you are formatting the HTML version to do fancier fraction conversion.
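To illustrate what "Unicode fractions only" means, here is a small sketch that replaces simple single-digit fractions with their Unicode characters and leaves everything else alone. The table is only a subset, mixed numbers such as 2-1/2 are not specially handled, and this is not a description of how Guiguts does the conversion.

```python
import re

UNICODE_FRACTIONS = {
    "1/4": "¼", "1/2": "½", "3/4": "¾",
    "1/3": "⅓", "2/3": "⅔",
    "1/5": "⅕", "2/5": "⅖", "3/5": "⅗", "4/5": "⅘",
    "1/6": "⅙", "5/6": "⅚",
    "1/8": "⅛", "3/8": "⅜", "5/8": "⅝", "7/8": "⅞",
}
# A single digit, a slash, a single digit, not part of a longer number or date.
FRACTION = re.compile(r"(?<![\d/])\d/\d(?![\d/])")


def convert_fractions(text):
    """Replace fractions found in the table; leave all others unchanged."""
    return FRACTION.sub(
        lambda m: UNICODE_FRACTIONS.get(m.group(0), m.group(0)), text
    )
```

A fraction such as 5/7 has no single Unicode character, which is exactly the case where waiting for the HTML stage makes sense.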
Apply PPtxt
Choose Text>PPtxt.
- Work through the report, correcting as appropriate.
- If you have too many errors to deal with, you may want to use the Checks options to choose which sections of the report you see.
Create the HTML File
At this point, you have made all the changes that you can before applying specific formatting of the text or the HTML versions of the project. Remember, after saving the HTML file, any change that you subsequently make to the content (not the formatting) of either file must also be made in the other version. So before splitting your text and HTML versions, you may want to re-run Bookloupe or other text checking tools to make sure that you have caught everything.
- Save any unsaved changes in bookname.txt (File>Save).
- Choose File>Save a Copy As to create bookname.html.
- This will be the starting file for the HTML version. (And as a worst case scenario, you can also use it as fallback in case you mess up the text formatting and need to start the following steps over.)
Prepare the Plain Text Version
We now proceed to complete the formatting of the plain text version of the book.
- Make sure you are working on bookname.txt (and not the HTML file).
Convert Markup
Choose Text>Convert Markup.
- For each of the marker types in your document (italic, bold, small caps, gespert, and fonts), choose a text marker and apply each conversion.
- If you want to use the standard five spaced asterisks for thought breaks, apply the <tb> conversion. (Otherwise, you'll need to do a search and replace for the thought break tags and address them separately.)
- For Small Caps choose between:
- Converting to ALL CAPS.
- Converting to text with a surrounding marker.
- Converting manually with Text>Convert <sc> Manually. This opens a custom search/replace dialog box to walk through common options for small caps. Work through the entire document until you have addressed all the small caps tags.
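The conversions above amount to a handful of search-and-replace rules. The sketch below shows one possible set of choices: underscores for italics, equals signs for bold, ALL CAPS for small caps, and a row of spaced asterisks for thought breaks. The marker characters and the asterisk spacing are assumptions for illustration; pick whatever suits your project in the Convert Markup dialog.

```python
import re


def convert_markup(text, italic="_", bold="=",
                   thought_break="*       *       *       *       *"):
    """Convert proofing markup to plain-text markers."""
    text = re.sub(r"</?i>", italic, text)
    text = re.sub(r"</?b>", bold, text)
    # Small caps: this sketch always uses the ALL CAPS option.
    text = re.sub(r"<sc>(.*?)</sc>", lambda m: m.group(1).upper(),
                  text, flags=re.DOTALL)
    # Thought breaks are expected on a line of their own.
    text = re.sub(r"^<tb>$", thought_break, text, flags=re.MULTILINE)
    return text
```

Blindly upper-casing small caps is often wrong for things like "A.M." or mixed-case names, which is why the manual option exists.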
Format ASCII Tables
Use the search tool to look for no-wrap blocks (/* */) and inspect all tabular material.
- Compare tables to page images; reformat to best convey author intent.
- For complex tables, use Text>ASCII Table Effects to reformat.
Rewrap and Clear Rewrap Markers
- Save the file if it has any unsaved changes.
- Choose Tools>Footnote Fixup, and choose Tidy Footnotes.
- Choose Tools>Rewrap All.
- Scroll through entire text, looking for improper indentation.
- Choose Tools>Clean Up Rewrap Markers.
- Rerun Bookloupe and PPtxt. Resolve any new issues. (Update the HTML version of the file for any content changes.)
- Save the document.
Prepare the HTML Version
Finally, we format the HTML Version of the book.
Generate the HTML
- Open bookname.html.
- Choose HTML>HTML Generator.
- Ensure that the Title shows the correct version of the book title.
- Set optional switches as needed for your project.
- Choose Auto-generate HTML.
- Save the HTML file and open it in your web browser by choosing Custom>View HTML in browser.
- Scroll through the generated document looking for systematic errors. (Title pages, tables, etc. will look terrible at this point.) If automatic conversion messed up, delete the HTML file and start this step over with the backup file, addressing whatever formatting led to conversion issues.
Tweak HTML/CSS
While the HTML generator makes a passably readable HTML file, it will probably be far from the beautiful document you want to upload to Project Gutenberg. The following are pointers to creating a beautiful HTML document, but it is far from an exhaustive list. (Don't worry about inserting the illustrations at this point. From the text processing stage, the illustrations should already be in the correct location, and the next section goes over using the Auto Illustrations tool.)
- Page through the book looking for HTML formatting that could be improved, and apply fixes to the HTML and CSS as needed. In particular look at:
- Title pages. Ensure that the book title is encoded with <h1> tags and that it matches the <title> declaration in the HTML header. (See the Post-Processing FAQ.)
- Tables and Tables of Contents. The HTML>Auto Table tool can help format tables. Remove the auto-generated TOC if not needed. (Or use the hyperlinks for the book's native TOC.)
- Chapter and section headlines. Check that the headlines have the proper h2, h3, etc., hierarchy.
- Lists. The HTML>Auto List tool can help format lists.
- Blockquotes. Ensure that the CSS indents for the blockquotes follow the book's style. Especially check the formatting on blockquotes that had nested wrap or no-wrap markers.
- Hanging Indents. Add CSS and the necessary classes to your <p> tags as needed.
- Footnotes. The Generate HTML command should have properly formatted Footnotes, but review that they are properly linked and formatted.
- Sidenotes. Check the floating placement of the sidenotes and adjust as needed. (If you relocate a sidenote relative to the text, remember to make the corresponding change to your text file.)
- Indexes. If at the text stage, you placed /I I/ tags around the Index, the Generate HTML tool should have formatted any indexes correctly, but you may need to apply some tweaks if the indent levels or page references do anything unconventional.
- Transcriber's notes. Put <div class="transnote"> </div> around your notes.
- Make hypertext improvements:
- Use the HTML>HTML Markup tool as needed for adding HTML code. Use Search>Search & Replace to make global changes.
- Hyperlink page references in text, TOC, list of illustrations, and index (see linking page numbers and linking page numbers in indexes).
- Also hyperlink any pages or corrections referenced from the Transcriber's Notes to the relevant page or location in the text.
- Add <abbr> tags and "lang" attributes as appropriate.
- Convert hyphens to en-dashes where appropriate.
- Consider converting <i> tags to <em> or <cite> tags.
- If you haven't already, use Tools>Convert Fractions to convert any remaining fractions into Unicode or HTML super/subscripts.
- Fancy formatting (not required but can be a nice touch):
- Configure Drop Caps.
- Indenting the first line of paragraphs.
As you make corrections in guiguts, save the HTML file and use the browser's reload function to refresh the book view in the browser. Continue editing and checking until you have a pretty HTML document.
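Hyperlinking page references such as "(p. 150)" or "see page 222" is repetitive enough that a regular expression can do the first pass. The sketch below assumes the page anchors in your HTML have ids of the form Page_150; check your generated file and change id_prefix if yours differ. It is deliberately naive: it does not skip references that are already linked, so run it once and review the result.

```python
import re

# "p. 150" or "page 222", case-insensitive; captures the page number.
PAGE_REF = re.compile(r"\b(?:p\.|page)\s+(\d+)\b", re.IGNORECASE)


def link_page_refs(html, id_prefix="Page_"):
    """Wrap each page reference in a link to the matching page anchor."""
    def replace(match):
        return f'<a href="#{id_prefix}{match.group(1)}">{match.group(0)}</a>'
    return PAGE_REF.sub(replace, html)
```

After a pass like this, the HTML Link Checker will tell you about any links whose target page anchor does not exist.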
Process and Insert Images
If the project contains illustrations, use an image processing program such as GIMP or Adobe Photoshop Elements to optimize them. (See the Guide to Image Processing.) You can process images at any stage before, during, or after HTML conversion.
If there is a cover image supplied with the project (or you are creating one yourself), you can find information in the Proofreaders' Guide to EPUB or the PP guide to cover pages.
For each illustration:
- Load the image from the originals folder (see the Initial Setup step).
- Straighten the image (almost all scanned images are off-perpendicular; some are trapezoidal owing to the page not being flat on the scan window).
- Crop to remove all redundant white space and borders (use CSS to provide margins and borders if needed).
- If a black and white image, convert to grayscale.
- Correct the contrast.
- Sharpen.
- Correct any major scratches, freckles, dirt, etc.
- Save in the subfolder images using appropriate type:
- Line drawings in .png at 8 bits per pixel (not the default 24-bit RGB format).
- Photographs as .jpg with an appropriate compression level.
After you have processed all the images, insert them into your HTML document:
- Choose HTML>Auto-Illustrations and work through placing each illustration into your document.
- Search the HTML document to ensure that all the Illustrations have been converted to an HTML figure and image.
- In the browser, scan through entire document making sure that each image loads correctly.
- If you used thumbnails to link to larger images, test that each thumbnail when clicked loads the correct image.
Compare the Text and HTML Files
The ppcomp tool on the Post-Processing Workbench will help you find any differences between the text and HTML files that may have subtly slipped in as you were working.
- Choose Tools>PP Workbench and use the ppcomp tool to resolve any differences between your two formatted files.
Validate HTML and CSS
Once you have formatted the entire HTML file, you run various checks to ensure that the file conforms to various HTML, CSS, DP, and Project Gutenberg standards. (While you only need to run these checks as a final step, several of the tools are useful to run periodically as you format your HTML file.)
- Choose HTML>Unmatched HTML Tags. Correct any mismatched tags in your document until the report shows no issues.
- Choose HTML>HTML Link Checker. Correct any internal references that don't have targets or references to files that don't exist. The list of anchors without links doesn't need to be empty. It's fine to have targets that you haven't linked to, although if any of these are your chapter titles, you might want to verify the links in your table of contents.
- Choose HTML>PPhtml.
- Image Checks. Ensure that all illustrations meet the size recommendations. Also check for any missing files or if your cover image is too small.
- Link Checks. Resolve any link issues. Some legitimate links can cause warnings, such as having a footnote that is referenced more than once.
- DP PPV Tests and Project Gutenberg Tests.
- Ensure your HTML title and H1 title match.
- Ensure you don't have any <pre> tags.
- Check that the document structure is logical and looks like a properly formatted outline. You should only have one <h1> tag. You will likely have many <h2> tags (probably your chapters), and you should have at least as many class="chapter" attributes as you have <h2> tags. You may have <h3> or even <h4> tags if the author liked sections and sub-sections. Ensure that no lower numbered headlines appear under higher numbered headlines.
- CSS Tests. Create classes for "Classes used but not defined" and delete any "Classes defined but not used."
- Choose HTML>HTML5 Validator (online). Correct any reported issues.
- Choose HTML>CSS Validator (online). Correct any reported issues.
- Choose HTML>Ebookmaker. Resolve any Errors and if needed any of the Warnings. Once you generate eBook files without issues, open the files in an eBook reader application and look for any formatting issues unique to eBooks.
If you make changes to your HTML file, you should re-run all the above checks to ensure that you haven't introduced any errors.
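The outline check described under PPhtml can be approximated with Python's built-in HTML parser. This sketch reports two things only: whether there is exactly one <h1>, and whether any heading skips a level (for example an <h4> directly after an <h2>), which is one reading of the outline rule. It is not a substitute for PPhtml, and the names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1 to h6 start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human-readable problems with the heading outline."""
    parser = HeadingCollector()
    parser.feed(html)
    levels = parser.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one <h1>, found {levels.count(1)}")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"<h{current}> follows <h{previous}> and skips a level")
    return problems
```

An empty list means the outline is plausible, not that every heading is at the right level for the book.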
Upload for Smooth Reading
Getting smooth reading feedback isn't required but it is highly recommended.
- Choose HTML>Ebookmaker. If you have a local version installed, you can use your local code. Otherwise, if your editing computer has an internet connection, you can use the online version.
- Create a .zip file that includes the text document, HTML document, epub files, and the entire images folder.
- Upload for Smooth Reading.
- Wait.
- Incorporate feedback from smooth readers (into both the text and HTML versions of your project).
- (Don't forget to rerun all of the text and HTML checks to ensure you haven't created any problems.)
Upload the Finished Project
Your next steps vary depending on whether you have Direct Upload status. If you don't have Direct Upload status, follow the instructions for uploading for Post-Processing Verification.
Uploading for Post-Processing Verification
After following the instructions in Getting your PP Project Ready for PPV, create a .zip file with the following files from your project folder:
- bookname.txt
- bookname.txt.json
- bookname.html
- bookname.html.json
- The images folder and its contents.
- project_dict.json
We include the .json files so the PPVer can easily see the page images as they work through your document. You don't need to include the .png files, the original images, or any Ebookmaker files. If the PPVer needs these, they can download those from the project page or generate them.
Go to the project page and upload your .zip file for PPV.
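If you prefer to build the PPV .zip with a script rather than your file manager, something like the following would do it. This is a sketch under the assumption that your project folder is laid out as in Initial Setup; the function name and the archive name (bookname.zip) are made up for illustration.

```python
import zipfile
from pathlib import Path


def build_ppv_zip(base: Path, bookname: str) -> Path:
    """Zip the files listed for PPV and return the path of the archive."""
    files = [
        f"{bookname}.txt",
        f"{bookname}.txt.json",
        f"{bookname}.html",
        f"{bookname}.html.json",
        "project_dict.json",
    ]
    archive = base / f"{bookname}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            # Raises FileNotFoundError if an expected file is missing.
            zf.write(base / name, arcname=name)
        # Add the images folder and everything inside it.
        for image in sorted((base / "images").rglob("*")):
            if image.is_file():
                zf.write(image, arcname=image.relative_to(base).as_posix())
    return archive
```

Because a missing file stops the script with an error, it doubles as a quick check that nothing on the list was forgotten.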
Direct Uploading
If you have Direct Upload status, you probably don't need to be told what to include in your .zip file, but we'll list the contents for completeness. Create a .zip file with the following:
- bookname.txt
- bookname.html
- The images folder and its contents.
Follow the instructions on Guide to Direct Uploading (DU) and Posting to PG to submit your .zip to PG.
Rinse and Repeat
Ta-daaaa! Finished!!* Treat yourself to your favorite beverage! When refreshed, return to Step 1.
*Well, finished until you get the first PM from the PPVer listing the things that in spite of scanning the files 100s of times, you somehow missed...