User:Monasi/my PP checklist

From DPWiki

1. Initial Setup

  • Make a project folder, e.g. d:\dp\pp\bookname
  • Download the text and images files and unpack:
    1. Text to projectId.txt, copy the file also to projectName.txt.
    2. page images (nnn.png) in subfolder pngs
    3. empty subfolder images.
    4. Copy the hi-res illustration scans from the pngs subfolder into the images subfolder.
  • Create notes.txt file for keeping 'to-do' notes.
  • Read the project comments and forum discussion, list issues proofers raised in notes.txt.

2. Sequential Inspection of Text

Check for:

  • Page labels table, mark empty pages, non counter pages, skipped page numbers, etc.
  • Proper markup of <i>italic</i> and <b>bold</b>.
Watch for punctuation wrongly contained in markups, such as <i>(ibid.</i> or <b>Subtopic.</b>.
  • Proper markup of Greek and other transliterations (content check later)
  • Block material all marked in some fashion:
    • block quotes in /# #/
    • tabular in /* */
    • convert poetry from /*..*/ to /P..P/
    • check for a blank line before and after markup
    • make sure correct type of markup used
    • close-up where broken at page boundaries
    • apply specific indent value if desired
    • make sure poetry line numbers are at least two spaces to the right of the line.
    • Fix block markups that cross page boundaries
    • check consistent indentation of block text
    • apply specific margin values if desired
  • Figures properly in [Illustration: caption]
    • check: caption text agrees with List of Illustrations (if any)
    • consistent spelling, abbreviation, capitalization in captions
  • Fix Footnotes, Illustrations still inside a paragraph.
    • move outside paragraph to next or prior page as appropriate
    • don't worry about duplicate footnote number/symbol now
    • sidenotes handled later
  • Make notes of things that will need attention in the HTML:
    • Author cross-references like "(p. 150)" and "see page 222" that should become links.
    • How the editor laid out special sections such as tables and sidebars.
  • Change spaces to non-breaking spaces where applicable (abbr., initials, etc.). For example, replace ([^ ]\.) ([^ ]\.) with $1 $2, where the character between $1 and $2 is a non-breaking space.
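
The non-breaking-space replacement in the last item can be tried outside the editor. This is a minimal Python sketch, assuming the text is held in a string. It uses lookarounds instead of the $1/$2 groups so that runs of initials such as "J. R. R." are handled in one pass, which is a variation on the pattern above and not the checklist's own method. The function name is hypothetical.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def protect_initials(text):
    """Replace a space that sits between two 'x.' tokens with a non-breaking space."""
    # (?<=[^ ]\.) : the space is preceded by a non-space character and a period
    # (?=[^ ]\.)  : the space is followed by a non-space character and a period
    return re.sub(r"(?<=[^ ]\.) (?=[^ ]\.)", NBSP, text)

sample = "By J. R. R. Tolkien, i. e. the author."
# Show the non-breaking spaces as ~ so they are visible on screen.
print(protect_initials(sample).replace(NBSP, "~"))
```

Like the original pattern, this also fires on a sentence end followed by an initial, so the results still need a visual check.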

3. Fix Orphan Markups and Proofer Notes

  • Use Orphaned Markup dialog to check and correct orphans of each type in turn. Do not omit the lowly parenthesis, often mis-scanned as curly-brace.
  • Search&Replace: text: (?<!/)\*(?!/) (a literal asterisk, but one neither preceded nor followed by a slash), regex; keep clicking "Search" to check all asterisks in document.
    • look for malformed thought-breaks (5 stars)
    • resolve proofer's notes, which are indicated by asterisk
  • Search for the new <tb> thought break mark.
    • Replace with Fixup/Add a Thought Break.
    • Or replace string in Search/Replace popup:
             *       *       *       *       *
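
The asterisk search and the thought-break replacement above can be sketched in Python for use on a copy of the text. The function names are hypothetical, and the five-star line is built on the assumption of 7 leading spaces with 7 spaces between stars.

```python
import re

# A literal asterisk that is neither preceded nor followed by a slash.
LONE_ASTERISK = re.compile(r"(?<!/)\*(?!/)")

# Assumed layout: 7 spaces, then 5 stars separated by 7 spaces each.
THOUGHT_BREAK = " " * 7 + (" " * 7).join("*" * 5)

def lone_asterisk_lines(text):
    """Return (line number, line) pairs for lines that contain a lone asterisk."""
    return [(n, line) for n, line in enumerate(text.splitlines(), start=1)
            if LONE_ASTERISK.search(line)]

def replace_thought_breaks(text):
    """Swap each <tb> mark for the five-star line."""
    return text.replace("<tb>", THOUGHT_BREAK)
```

The /* and */ block markers are skipped by the lookarounds, so only proofer notes and stray stars are reported.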

4. Basic Fixup

5. Format Front Matter

  • Format the title page, preserving as much of the original material as possible. Protect in /F...F/ (no rewrap, no indent, centered in the html version).
  • Edit the TOC. Find each matching chapter head; make sure heads are 1:1 with TOC. Protect TOC with /X...X/ (no rewrap, no indent, uses pre in html).
  • If book has illustrations, edit or create List of Illustrations (Note: this is not a requirement). Make sure it is 1:1 with [Illustration] captions. Protect with /X...X/.

6. Edit Transliterations

  • Search&Replace: text: \[[^FIS] (left-bracket followed by anything other than F, I or S), regex. Check content of each transliteration. For Greek, use the Greek Transliteration Tool.
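
To get an overview of the transliterations before stepping through them, the same search can be run over the whole text. This is a rough Python sketch with a hypothetical function name. The pattern starts with the checklist's \[[^FIS] and is extended here to capture the rest of the tag for display.

```python
import re

# Left bracket followed by anything other than F, I or S,
# extended to run up to the closing bracket on the same line.
OTHER_TAG = re.compile(r"\[[^FIS][^\]\n]*\]?")

def other_bracket_tags(text):
    """List bracketed items that are not [Footnote, [Illustration or [Sidenote."""
    return OTHER_TAG.findall(text)
```

Anything beginning with F, I or S is skipped, so a transliteration whose tag happens to start with one of those letters would be missed, exactly as with the original search.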

7. Remove Visible Page Breaks

  • Run Fix Page Separators to remove visible page separators
  • Use Adjust Page Markers tool to make sure page markers are at top of pages.

8. Apply Word-Frequency Checks

Open the Word Frequency report.

  • Set the Frq switch; click All Words. List is now sorted by word frequency; scroll to the end and skim up the list of words that only appear 1 time looking for oddities and obvious misspellings.
  • Click Character Cnts.
    • Note characters that appear only once, check usage.
    • Check for equal counts of left & right parens and brackets.
  • Set the Alph switch; click All Words. Scroll to the word Footnote and write down count for later use. (If the count is large, click once on Footnote and click 1st Harm. The harmonic window shows you any of the common misspellings of "Footnote" that occur.)
  • Click Emdashes. Conflicting usages are marked with asterisks; check against text and page images. Preserve author's intent even when inconsistent.
  • Click Hyphens. Resolve conflicts as above.
  • Click Alpha/num. Scan list for one/ell and oh/zero errors.
  • Click ALL CAPS. Scan list looking for oddities.
  • Click MiXeD CasE. Scan list looking for letters such as o that sometimes OCR wrongly as uppercase. Oh/zero errors can show up here, too.
  • Click Check Accents. Scan list looking for mistakes, inconsistent usages.
  • Click Check , Upper. Scan list for comma-for-period errors.
  • Click Check . Lower. Scan list for period-for-comma errors.
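
Some of these checks (words that occur once, characters that occur once, paired bracket counts) are easy to reproduce outside the Word Frequency report. The following Python sketch only illustrates the idea and is not how the report itself works. The function names and the word pattern are assumptions.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")  # letters, with inner ' or -

def words_once(text):
    """Words that appear exactly one time, sorted."""
    counts = Counter(WORD.findall(text))
    return sorted(w for w, n in counts.items() if n == 1)

def chars_once(text):
    """Non-space characters that appear exactly one time, sorted."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return sorted(ch for ch, n in counts.items() if n == 1)

def bracket_balance(text):
    """Map each bracket pair to its (left count, right count)."""
    counts = Counter(text)
    return {pair: (counts[pair[0]], counts[pair[1]]) for pair in ("()", "[]", "{}")}
```

Unequal counts in bracket_balance point to a missing or mis-scanned bracket somewhere in the text; they do not say where.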

9. Apply Scanno Checks

  • Turn on automatic scanno highlighting and go through the entire text.
  • Use Fixup> Run Jeebies and examine its report of possible he/be errors.
  • Start scanno searching based on eng-common.rc. Work through the list.
  • Apply scanno searching based on misspelled.rc. Work through the list.
  • Apply scanno searching based on more-misspelled.rc. Work through the list.
  • Apply scanno searching based on regex.rc. Work through the list.

10. Apply Gutcheck

11. Apply Spellcheck

  • Start the spellcheck process. Proceed through the document, correcting words or adding them to the project dictionary as appropriate.

12. Fix Sidenotes (0-? hr.)

Read the discussion. Step through sidenotes with: Search&Replace of [S, not regex, not whole word, ignore case. Click Search to find each Sidenote.

  • Compare to page image. Move note above paragraph if feasible.
  • Otherwise, position it above the sentence to which it applies, with blank lines to prevent rewrapping if you decide that is best.

13. Fix Footnotes (0-? hr.)

Read the discussion and follow the steps on this page.

14. Fix Poetry Line Numbers (0-20 min.)

If the book has poetry that uses line numbers, read this page and align the line numbers consistently.

15. Check balanced markup

Search&Replace for \<(\w+)>\n?[^<]+<(?!/\1>) (any starting markup in <..> that doesn't end in an identical closing markup). Because it includes a newline, the search may take several seconds to return the first result.

  • Correct the error and click search until no more are found.
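
The same pattern can be used to list every suspect location at once. This is a small Python sketch with a hypothetical function name. Note that the pattern also reports correctly nested markup (for example bold inside italic), so every hit still needs a look.

```python
import re

# Opening markup <x>, then text, then a "<" that does not begin </x>.
UNBALANCED = re.compile(r"<(\w+)>\n?[^<]+<(?!/\1>)")

def unbalanced_tags(text):
    """Return (tag name, line number) for each place the pattern matches."""
    return [(m.group(1), text.count("\n", 0, m.start()) + 1)
            for m in UNBALANCED.finditer(text)]
```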

16. Save Edited Markup

  • Save any unsaved changes in projectId.txt. This will be the fallback if something goes wrong and you need to start the following steps over.
  • Use File>Save As to make bookname.html. This will be the starting file for the HTML version.
  • Use File>Save As to make bookname.txt.

17. Convert Italic, Bold, and Smallcap

  • Fix italics: use Search&Replace, text </?i> (<i> or </i>), regex. ignore case. Replacement: underscore. Click Replace All. Italic markup is replaced with underscores.
  • Fix bold. Decide if you want to mark bold with $, or =, or by all uppercase.
    • For $, use Search&Replace, text </?b> (<b> or </b>), regex. Replacement: $. Click Replace All.
    • For =, use Search&Replace, text </?b> (<b> or </b>), regex. Replacement: =. Click Replace All.
    • For uppercase, use a regex search for <b>(\n?[^<]+)</b> (<b> then anything including newline up to the first </b>). Replacement: \U$1\E.
Click Search, then Replace until you are confident it works; then Replace All. Afterward, search for b> and hand-edit any remaining bold.
  • Fix Small-cap, which proofers have changed to <sc>Title-Cased-Text</sc>. regex find <sc>(\n?[^<]+)</sc> (<sc> then anything including newlines up to </sc>; note this will not find small-cap that spans other markup such as italic.) Replacement \U$1\E.
  • Save the document.

18. Fix ASCII Tables (0-? hr.)

  • Use Search>Find Next /**/ Block to step through all tabular material.
    • Compare to page image; reformat to best convey author intent.
    • For complex tables, use Table Special Effects to reformat.

19. Rewrap and Clear Rewrap Markers (10-30 min.)

  • Save the file if any unsaved changes.
  • Use Edit>Select All then Tools>Rewrap Selection. Wait while rewrap completes.
  • Page through entire text, looking for improper indentation. If found, re-open, clicking NO when asked if you want to save the edits. Find and fix broken rewrap markups. Repeat this step.
  • Open Fixup>Footnote Fixup; tidy up footnotes. See this discussion.
  • Remove all rewrap markers: see this page.
  • Use Fixup>Remove End-of-line Spaces.
  • Use Fixup>Run Gutcheck and resolve any new issues.
  • Save the document.

20. Determine Character Coding (5-60 min.)

Character codes are described here. You need to be certain which coding your etext uses.

Search&Replace, text [\x7f-\xff], regex. If nothing is found, the book contains only characters from the 7-bit ASCII set and you are done.

If 8-bit characters are found, use Fixup> Run Word Frequency Routine. In the report window, click the Unicode>FF button. Words containing a multi-byte (Unicode) character are listed. If none are shown, the text is probably, but not certainly, Latin-1; it is possible that you have inserted Unicode punctuation that is not part of a word. But you should be aware if you have used the Unicode menu or pasted a Unicode symbol.

If your text has symbols from Latin-1 or Unicode, read or re-read this item of the Gutenberg FAQ. Decide whether you will upload a single version or split the text into ASCII and high-bit versions. If you split it, then:

  • Use File>Save As to "fork" your single document into versions:
bookname.asc for a pure-ASCII version;
bookname.lt1 for a version with Latin-1 accented characters;
and/or bookname.utf for a version that has Unicode characters.
  • Open bookname.asc.
  • Search with the regex \P{IsAscii} (note uppercase P) to step through each character not 7-bit ASCII
  • Replace each, using some consistent substitution scheme (for example, ['e] for é, etc.).
  • Add a "Transcriber's Note" to the head of the text to document your substitution scheme.
  • In a similar manner, search bookname.lt1 for Unicode characters and replace them with Latin-1 equivalents. Add a "Transcriber's Note" to document the substitutions.

Pure-ASCII etext bookname.asc and optional Latin-1 bookname.lt1 and bookname.utf are ready to upload!
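
The coding decision above can be cross-checked with a short script. This Python sketch classifies a text by its highest code point and applies a substitution table for the pure-ASCII version. The function names are hypothetical, and the table contains only the single ['e] example from the list above; a real one would be extended per book.

```python
def classify_encoding(text):
    """Return 'ascii', 'latin-1' or 'unicode' based on the highest code point."""
    highest = max(map(ord, text), default=0)
    if highest < 0x80:
        return "ascii"
    if highest <= 0xFF:
        return "latin-1"
    return "unicode"

# Hypothetical substitution scheme; document it in the Transcriber's Note.
SUBSTITUTIONS = {"é": "['e]"}

def to_ascii(text, table=SUBSTITUTIONS):
    """Apply the table; also return any non-ASCII characters still present."""
    out = "".join(table.get(ch, ch) for ch in text)
    leftovers = sorted({ch for ch in out if ord(ch) > 0x7F})
    return out, leftovers
```

A non-empty leftovers list means the substitution scheme does not yet cover every character in the book.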

21. Prepare HTML Edition (4-? hr.)

You saved bookname.html before you rewrapped because it is handy to have the HTML source lines match the original text. However, the HTML generated for /P..P/ poetry sections assumes that these have been rewrapped. If the book contains poetry, select these sections and rewrap them so the poetry is properly indented.

Make a duplicate of bookname.html for fallback.

  • Open bookname.html.
  • If you will insert visible page numbers or anchors at page boundaries, then configure the page labels before proceeding
  • Open the HTML Palette and set optional switches as desired.
  • Apply Automatic HTML conversion and wait while it completes.
  • Save the file and open it in a browser.
  • Scroll through looking for systematic errors. (Title pages, tables, etc. will look terrible; no matter). If automatic conversion messed up, delete the file and start this step over with the backup file.
  • Page through the book looking for text that was not handled well by automatic HTML generation, in particular:
    • Title pages.
    • Tables.
    • Tables of Contents and Indexes, which are best formatted using unordered lists, rather than the markup Guiguts generates for /$..$/.
    • Illustrations.
  • Use the element-markup buttons in the HTML Palette to mark up these areas. Use regex replacements to make systematic changes.
  • Open the file in one or more web browsers (Internet Explorer and at least one other such as Firefox or Netscape). Page through the entire book.
    • Where you see a problem, make a correction in Guiguts, save the file, and click the "reload" button in each browser.
  • Hyperlink page references in text, TOC, and index (discussed here).
  • Apply the Link Checker and correct all issues found.
  • Optionally, apply Tidy.
  • Open the W3C Validator, upload the file, and correct the nits it picks.

22. Process Hi-resolution Images (? hr.)

If the project manager provided high-resolution scans of the images in the text, use an image-processing program such as The Gimp or Adobe Photoshop Elements to optimize them (see the wiki topic). You can do this before, during, or after the HTML step (step 21). For each image:

  • Load image from the originals folder (see step 1)
  • Straighten it (almost all scanned images are off-perpendicular; some are trapezoidal owing to the page not being flat on the scan window).
  • Crop it to remove all redundant white space and borders (provide margins and borders with CSS styling of the <img> markup).
  • Correct the contrast (you must have calibrated your monitor, see this page).
  • Sharpen.
  • Correct any major scratches, freckles, dirt, etc.
  • Save in the subfolder images using appropriate type:
    • Line drawings in .png at 8 bits per pixel (not the default 24-bit RGB format).
    • Photographs as .jpg with an appropriate compression level such as (Photoshop) level 6.
  • Page through entire HTML book making sure that each image is being loaded correctly. Test each thumbnail if used.

23. Upload the Finished Project

  • Prepare a new folder whose name contains the full project ID, e.g. projectID40213e6231ac4
  • Move into it only the files to be uploaded:
    • the etext file(s) bookname.asc, bookname.lt1, and/or bookname.utf.
    • the .bin files related to those (some PPVers use Guiguts too!)
    • the HTML file if one was made
    • the images folder if required by HTML
Do not include the original images or the page images; do not include any work files or scratch files or auto-backup editions. If you have been told to upload directly to the Gutenberg site for a whitewasher, do not include the .bin file(s).
  • Mac OS X users: the Finder creates hidden files named .DS_Store in any folder you display as a window. Although harmless, these files are not wanted by PG. Get rid of them as follows: In a terminal window, cd into the project folder. Run this command, copying its arcane syntax precisely:
find . -name ".DS_Store" -ok rm '{}' \;

You will be asked for deletion confirmation.

  • Linux and Mac users: cd into this folder and use the command unix2dos *.txt; unix2dos *.html.
  • Use a zip utility to make a zip archive of this folder. (OS X users: do not use the Finder command File> Create Archive of...; it creates a gzip file that PG cannot use. Use a zip command in a terminal window.)
  • Open the project page in your web browser and at the bottom, select Change Project State: Upload for Verification.
  • On the next page, write comments noting any unusual features of the book.
Especially note the character code (7-bit, Latin-1, or Unicode) of the single .txt file, or the differences between multiple etext files.
  • Use the Browse button to navigate to the zipped file. Wait while it uploads, which can take quite a while.
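
For those who prefer a script to the terminal commands above, the packaging steps can be sketched in Python. This is an illustration under stated assumptions and not part of the checklist. The function name is hypothetical, the zip file is assumed to be written outside the project folder, and CRLF line endings are applied only to the copies stored in the archive instead of changing the files on disk as unix2dos does.

```python
import zipfile
from pathlib import Path

TEXT_SUFFIXES = {".txt", ".html"}  # files that get CRLF line endings

def build_upload_zip(folder, zip_path):
    """Zip a project folder, skipping .DS_Store files; return the archived names."""
    folder = Path(folder)
    # Collect the file list first so the archive itself is never picked up.
    files = [p for p in sorted(folder.rglob("*"))
             if p.is_file() and p.name != ".DS_Store"]
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            data = path.read_bytes()
            if path.suffix.lower() in TEXT_SUFFIXES:
                # Normalise to LF first so existing CRLF is not doubled.
                data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
            arcname = path.relative_to(folder).as_posix()
            zf.writestr(arcname, data)
            names.append(arcname)
    return names
```

The returned list is a convenient last check that only the files meant for upload went into the archive.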

Related Pages

First pass check

Check through the text page by page, opening the corresponding page scan in your image viewer. You'll quickly notice unmarked poems or block quotes this way. Check for missing pages (rare, but it does happen) and illustrations. If the project has problems like missing pages, it would be nice if you could go to the project page and state the issue in the comments in order to make others aware of it.

Check for asterisks * left by proofreaders, making you aware of questions/problems/markup

Run a search for * to find notes left by proofreaders/formatters to make you aware of their questions/solutions and potential problems.

Check the markup

Make sure that the /* */, /# #/, etc. tags are balanced. Be sure that any poetry is in the correct markup to save messes later. This is a good time to check each poem is indented correctly or has the relative indents correctly added. Every tag needs a closing and properly placed tag and so on. You may wish to change some formatting tags to markup specific for your post-processing tools (e.g. /p p/, /f f/); check your tool's manual for details. Some PMs may request particular markup in the rounds. Also, check any markup that ranges over a page break and make sure it will still result in the desired formatting (usually by deleting all but the first and last markers for a particular section).

Markup                              Rewrap?   Indent?
No special markup, the default      yes       no
/* */  poetry, etc.                 no        no
/# #/  block quotes, etc.           yes       yes

Not officially adopted, but in use by various post-processing tools:
/$ $/  tables, etc.                 no        no
/p p/  poetry, etc.                 no        yes

All of these markups normally should have a blank line before the opening tag and a blank line after the closing tag. They should be on a new line with no other text, unless your post-processing tool allows it.

Straighten up the title page, table of contents, and list of illustrations

When formatting the title page, you have a bit of leeway. You can adjust the pieces a bit if you like: for example, you could move the author's name directly under the "by". Relative indenting is not required, but can be added if you wish.

Do block indent a consistent amount (from one to four spaces) if there are consecutive lines that should not be rejoined later in the process. The space is a flag in many text readers to preserve the given line endings should the general text need to be rewrapped to different margins.

For the table of contents and list of illustrations, please retain the page numbers. Line up the chapter titles and page numbers to make it look neat and easy to read. Copying the original format of the table of contents usually works fairly well. Leave all the original information on the title page, including the edition, year of publication and any copyright notice (unless this is a reprint; check with the project manager if in doubt). It is better to keep as much information as possible than to try to find it once the book has been posted for years.

Footnotes

You will need to rejoin footnotes split across pages. Then, in the plain-text version, you can put the footnote after the paragraph it refers to or at the end of the chapter or section. Make sure that the number/letter/symbol in the text matches the tag in the note itself. In-line footnotes (footnote within a line of text) are discouraged even when extremely short.

Consider using end-of-paragraph footnotes if the footnotes are short, unique, and not common. Use end-of-section or chapter footnotes for longer footnotes (such as those that have poetry or block quotes), or those that have multiple references in the text for one footnote. Whichever you choose, be consistent within the work. Use all end-of-paragraph footnotes or end-of-section/chapter footnotes within one work. Don't switch back and forth.

For the HTML version, they can be moved to the end of the chapter or section or to the end of the project. They also need to be hyperlinked. Most of the post-processing tools will do this automatically. Refer to the tutorials, guides, or manual for your software, to find out how.

The preferred method is to renumber the footnotes so that each one in the book has a unique number, alphabetic letter, or Roman numeral to make it easier for the reader to search the text. Alphabetic letters and Roman numerals are not recommended for more than 20–30 footnotes as they become hard to read/distinguish. There may be some projects where you may prefer to retain the numbering as in the original publication.

Check the text for problems

   * end-of-line spaces need to be removed to prevent double spaces when text is rewrapped
   * inconsistent line spacing around chapter and section headings
   * spaces around hyphens
   * spaces before punctuation . ! ? ; : ,
   * spaces around quotes in English and/or LOTE
   * mis-matched quotation marks
   * he/be errors
   * spaces around ( ){ }[ ]
   * spaces within abbreviations
   * multiple spaces in non-marked text (skip /* poetry */ )
   * incorrectly formatted thought breaks
   * incorrectly formatted ellipses (according to the rules of the text's language, or ensuring they all match the original if that is what you prefer)
   * dashes with three hyphens (---) instead of two (--) for an em-dash
   * appropriate spacing of em-dashes (-- and ----)
   * incorrect paragraph breaks
   * sort out any asterisks/stars/daisies/comments left in the text by proofreaders and formatters
   * compare hyphenated words throughout the text and decide whether to standardize, and whether to mention in a transcriber's note—for example, if there are 20 occurrences of "to-morrow", but only one of "tomorrow" you can decide whether to change the irregular one, but if there is not a clear majority, you will have to decide whether to leave them as proofread, or make a judgment on which way to change the odd one out (and then, whether you note this in a Transcriber's Note or not)
   * As the post-processor, you are responsible for resolving problems noted by the proofreaders. If you need advice or a second opinion, try any of the methods listed in the Help section of this document.

Handle any illustrations

Move each illustration tag to an appropriate paragraph break. Some post-processors like to have them just before, or after, the text they illustrate. Others prefer to place them at the end of the chapter, not wishing to interrupt the flow of the text. Do whatever you think is right for your book.

Note: Keep illustration markers in the plain-text version, in case people want to refer to the HTML version later. Please do not delete them unless requested in the project comments and/or discussion.

If you do not want to produce an HTML version, but your book has pictures, post in the HTML pool, where you can enlist someone to generate the HTML and pass it back to you for uploading. HTML versions are required for every book produced at DP with pictures (even if the project manager does not request it).

Rejoin pages

Remove page separators, checking either side of them to see if the next page requires a blank line, is a section or chapter, or needs to be continuous text. You can rejoin words split across pages at this point if you haven't done so earlier.

Spellcheck

Even if it looks like it's going to be a pain, spellchecking is always needed. Texts written before spelling was regularized might be the only reasonable exception, but even for those spellchecking is often useful. Even books with dialect or other deliberate non-standard spelling can be spellchecked. You may want to leave this step until later in your checklist, and/or repeat the spellcheck whenever you type in new information, including a transcriber's note.

Paranoid text checks (stealth scannos, etc.)

These may be run by separate tools or by your main post-processing program. Refer to the manual or tutorial for the toolset you are using, or ask in the Post-Processing Forum.

Examples include "smart" programs which can check for he/be irregularities, or regexes (a form of search) which flag unusual letter combinations, such as "tb" (possible scanno for "th") or "rn" (for "m").
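
As a small illustration of such a regex check, the Python sketch below lists the distinct words containing "tb". The function name is hypothetical, and a real scanno list covers many more patterns. Legitimate words are reported too, so the output is a list of candidates to compare against the page image, not a list of errors.

```python
import re

# Any word that contains the letter pair "tb" (possible scanno for "th").
SUSPECT = re.compile(r"\b\w*tb\w*\b")

def suspect_words(text):
    """Return the distinct words containing 'tb', sorted."""
    return sorted(set(SUSPECT.findall(text)))

print(suspect_words("Then tbe man had an outburst and tbe dog ran."))
```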

Various regex-searches are available and some tools will run these as a set, through your usual search-and-replace box—again, check the manual/tutorial for the software you're using. Otherwise, have a look at the Regular Expression Clinic for more information and help.

A great formatting check to run is the regex \n\n\n, which catches all chapter and section spacing, allowing you to confirm their consistency, as well as finding any extra line breaks between paragraphs (especially common after block quotes or poetry). It's a good idea to run this again on the text version, after you've removed markup such as /**/ and /##/. (See below.)

Rewrap the text

Time to rewrap. PG advises to keep the HTML version as close as possible to the text version, so some post-processors will use the rewrapped text version for the basis of their HTML version of the ebook. Some however prefer to use the text before rewrapping to avoid having to adjust the text version a second time after formatting tags have been converted—see Creating a plain text version.

Did you see any poetry, tables, etc.? If not, rewrapping the lines should be easy. You will need to rewrap the lines to around 65–75 characters in length. (See PG's recommendation for line length.) Each program has a different way of doing this, and you will have to find the way that works best for you. Read the manual or instruction book for your utility.

If you found poetry, tables, etc. care needs to be taken when rewrapping that line endings are preserved as intended, and that they are block indented from at least one to four character spaces to prevent rewrapping in future versions of your texts.

If worst comes to worst and you cannot find an easy way to rewrap the lines, find and replace all line breaks with spaces, count any line to find approximately where 60–72 characters falls, and insert line breaks manually at this point. It's painful, but it works. (Be grateful that you chose a book with a low page count!) Alternatively, type a line like this:


at the top of your text and use this as your guide. However, manually rewrapping in this way should not be necessary.
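
If no post-processing tool is at hand, a few lines of script are less painful than rewrapping by hand. This is a minimal Python sketch, assuming paragraphs are separated by blank lines and that any block containing an indented line (poetry, tables, and so on) must keep its line endings. The function name and the default width of 72 are assumptions.

```python
import re
import textwrap

def rewrap(text, width=72):
    """Rewrap unindented paragraphs; leave indented blocks untouched."""
    # Splitting with a capturing group keeps the blank-line runs:
    # even items are paragraphs, odd items are the separators.
    parts = re.split(r"(\n{2,})", text)
    for i in range(0, len(parts), 2):
        lines = parts[i].split("\n")
        if any(line.startswith(" ") for line in lines):
            continue  # indented block: keep the line endings as they are
        parts[i] = textwrap.fill(" ".join(lines), width=width)
    return "".join(parts)
```

Because the blank-line runs are kept as they are, chapter and section spacing survives the rewrap.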

Once your text is suitably rewrapped, remove any end of line spaces. (Again, use the post-processing software wherever possible! All current tools include this task.)

Gutcheck

The Gutcheck tool was written specifically to pick out many of the most common problems with PG texts. It is probably the single most important check you will perform. Follow the instructions with your post-processing software. If you are not using a post-processing-specific tool, you can download Gutcheck from here, and run it according to the instructions given there. Either run the check initially with all options turned on, or run each check individually, but make sure not to skip any. Check every potential problem that it brings to your attention. Not all Gutcheck "flags" are genuine errors (for example, it may report short lines where the text contains poetry or a table), but each must be looked into and corrected if necessary. Continue to run Gutcheck after each series of corrections until it doesn't flag any more "true" errors.

If you do not want to download Gutcheck, use Project Gutenberg's online gutcheck.

Some common things to watch for

   * Footnote markers are falsely flagged as "Wrongly spaced brackets". Check them anyway.
   * Lengthy hyphenated words often cause short lines above or below. Try rewrapping just that paragraph a few spaces shorter to rearrange the words sufficiently to cure this error. Short lines for the table of contents, lines of poetry, etc. are okay.
   * PG's standard line length is 60–70 characters for regular text and should be no more than 75 characters wide. If a threshold of 72 is used, with the random lengths of words, most lines will end up being 70 or less. The PG posting team also makes allowance for Chinese text, which has double-wide characters, to have line lengths around 40 characters. There may be justification for 80 characters for tables or other essentials (long line poetry might be another example). If there's absolutely no way to shorten a feature such as a family tree, you can leave it as is. It is often worth posting in the Post-Processing Forum though as others may see a sensible way to condense or reformat the feature.
   * Unless you are checking a deliberately-ASCII version of your text, you do not need to worry about characters flagged by "Non-ASCII character".
   * Wrongly spaced/missing quotes often appear where characters' quoted speech runs through several paragraphs. Check these, but if they are right according to the proofreading guidelines, that's good enough for Gutenberg posting.
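
The line-length point above can be checked independently of Gutcheck. This Python sketch lists lines longer than a limit, using the 75-character ceiling mentioned above as the default. The function name is hypothetical.

```python
def long_lines(text, limit=75):
    """Return (line number, length) for every line longer than the limit."""
    return [(n, len(line))
            for n, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]
```

Tables and long-line poetry will show up here as well; those are judgment calls, as described above.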

Create a Transcriber's Note

If you make any changes to the text it is a good idea to include a Transcriber's Note. Sometimes these are quite simple:

Transcriber's Note: Punctuation has been normalized.

Writing the Transcriber's Note is not always as straightforward as it might seem. Some suggest using wording such as "obvious errors have been corrected" but others say that what is obvious to one person may not be obvious to another. Also, what someone might think is an obvious error may in fact be correct spelling/phrasing for the time the book was written.

A useful general one, especially for older, less regular texts, is:

Transcriber's Note: All apparent printer's errors retained.

This one stops the PG whitewashers from getting long errata requests to "fix" your text. It is not, however, an excuse for leaving in bad OCR, scannos, or similar detectable problems that are wrong in comparison to the page scan.

Sometimes the notes can be quite lengthy.

Transcriber's Notes:

Page 13, "10,00 troops" changed to "10,000 troops." (We fought 10,000 troops at St Germaine.)

Page 27, "Faw-cett" changed to "Fawcett". (Major Fawcett dictated the memo.)

etc. etc.

While we don't retain the individual page numbers in the text version, this gives the reader an idea of where it is in the book. The reader can search for the text you have included in the parentheses to find the exact location of your edit.

In the HTML version, the use of "hover" or "inserted" tags is a good way to shrink your list of changes while still maintaining the integrity of the original. Check the post-processing forum for ways of doing this, or follow the instructions here: CSS Cookbook—Corrections.

Many post-processors do fix what appear to be printer's "errers" (such as changing "errers" to "errors"). Do not modernize or switch the spelling from British English to American English or the other way around however. We are preserving history, not improving it.

Some put shorter notes, or ones that apply to the whole text in a general way, at the start of the book (before the title page), and longer lists at the end of the book (after any index or footnotes).

Transcriber's Notes are optional, but can help the reader's understanding of how you've processed the text. It's up to you how much or how little you note. If in doubt, talk to other post-processors in the forums, Jabber, or by PM about how they've handled various situations.

Back-up including HTML tags

At the end of the above process, you have a processed book which contains HTML markup, as well as DP tags like [Footnote] or <tb>. Save a copy of this "dual" purpose file, calling it something like <name-backup.txt>.

Creating a plain text version

   * Take the file you've been working on and name it something like: <funnyname.txt> Make sure you still have a version of the file containing the markup and call that <funnyname.html>. If you are going to produce different types of text files, call another copy <funnyname-ltn1.txt> and <funnyname-utf8.txt>.
   * Note that all file and folder names need to be in lower-case characters to ensure there is no upper/lower-case conflict later in the process at post-processing verification (PPV) and/or at PG. PG does ask for lower case and though it is technically not necessary this practice does prevent potential future linking problems. Also helpful if you are going to be working on many books is to give the file a name that can easily be associated with the book you are working on, and keep the name short but at least four characters long (files of three characters have been known to cause PG problems).
   * Use a monospaced font to enable alignment in display items such as tables and verse.
   * Now that you have various copies of your master file, you need to change them all if you find and correct any further errors.
   * For the plain text version(s), <i> and </i> need to be changed to _, and <b> and </b> to your preferred bold markup such as =; see the bold-markup thread for discussion of options. <f> and <g> tags can be handled similarly to bold, or stripped out if you prefer.
   * If your project contains any <sc> markup, refer to the Guide to Small Caps to find out how to handle them.
   * Do a quick search for the < and > characters to make sure none have slipped through.
   * Determine how you want to handle [oe] ligatures. Some post-processors will convert them to just oe in the plain text version. If the brackets are retained, mention this in a transcriber's note.
   * If you want to tidy your footnotes (that is, make them read [1] text rather than [Footnote 1: text]), do it now.
   * If you haven't yet rewrapped your text to around 72 characters, do that now—see time to rewrap.
   * Remove markup from the funnyname.txt file. Rewrap markers and [Blank Page] tags need to go. Make sure there are no queries or notes left in the text.
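
The "quick search" steps in the list above can also be scripted. This is only a rough sketch, not something Guiguts provides: it scans the text for stray angle brackets, [Blank Page] tags and [** queries. The name find_leftovers is made up for illustration.

```python
import re

# Things that should not survive into the plain text version.
LEFTOVER = re.compile(r"[<>]|\[Blank Page\]|\[\*\*")

def find_leftovers(text):
    """Return (line_number, line) pairs that still contain markup debris."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if LEFTOVER.search(line):
            hits.append((number, line))
    return hits

sample = "A clean line.\nAn <i>italic</i> slip.\n[Blank Page]\nIs this right?[**typo?]"
for number, line in find_leftovers(sample):
    print(number, line)
```

A legitimate < or > in the book itself (in a formula, say) will be reported too, so treat the output as a list of places to look.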

Check formatting—text version

PG will accept alternatives to the following. The important thing is to be consistent throughout your book.

   * Chapters should have four blank lines above them, one between lines of the chapter heading, and two blank lines after, but before the main text of the chapter starts.
   * Sections should have two blank lines above, and one blank line after. This is all as per DP Formatting Guidelines.
   * <tb> should be replaced with a line of asterisks—that is, 7 spaces, followed by 5 stars, each spaced by 7 from the next, like this:
            *       *       *       *       *
   * Poetry should be indented from one to four spaces (this is a PG requirement, to prevent rewrapping in future versions of your text). Indents within the poem, i.e. relative indents, should be added on to your chosen indent. (For example, if a line is indented by 2 spaces from the line above, and you are using a 4-space indent for poetry, in your final version this line will be indented 6 spaces altogether.)
   * Block quotes should also be indented to show their separation from the rest of the text. If blocks in the book are not separated from the rest of the text, i.e. they appear as regular paragraphs, there is no need to indent them.
   * Tables, including tables of contents and lists of illustrations, also need to be indented to avoid rewrap/respacing.
   * Unusual features—tables, Greek, poetry, etc. see the Help! section.
   * Do a final Gutcheck, to make sure that there are no remaining problems, and that no issues have been introduced during the tidy-up process (such as short lines being left after the removal of HTML markup).
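
The poetry indent arithmetic in the list above (relative indent plus your chosen base indent) could be sketched like this. The function name and the 4-space default are only for illustration; pick whatever base indent from one to four spaces you prefer.

```python
def indent_poem(lines, base_indent=4):
    """Add base_indent spaces to every non-blank line, keeping relative indents."""
    prefix = " " * base_indent
    return [prefix + line if line.strip() else "" for line in lines]

poem = [
    "The first line of verse,",
    "  the second, indented by two,",
    "",
    "A new stanza.",
]
for line in indent_poem(poem):
    print(line)
```

The second line starts with 2 spaces and gains 4 more, so it ends up indented 6 spaces altogether, as in the example in the list.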

If you want to make your book available for smoothreading, now's the time.

Mac and *nix users need to change line endings to CR/LF.

Smoothreading

An extra pair of eyes is always helpful in finding things you might have overlooked in the text. Smoothreading is an option available to all post-processors and is generally done on a text version.

Save a new version of your book (such as <funnyname-smooth.txt>), then place this file into a zip folder.

Make sure that the file name contains only some combination of a-z, 0-9, -, _ and one . separating the filename from the extension (no capital letters, no spaces, no special characters other than those mentioned above). For example:
Correct: funnyname-smooth.txt
Incorrect: Smooth.txt
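
If you name a lot of files, the rule above is easy to check mechanically. The sketch below is one way to do it; is_acceptable_name is a made-up name, and the four-character minimum is my reading of the earlier advice about short names (applied here to the part before the dot).

```python
import re

# Lowercase letters, digits, hyphen and underscore, then exactly one dot and an extension.
NAME_RE = re.compile(r"^[a-z0-9_-]+\.[a-z0-9]+$")

def is_acceptable_name(filename, min_stem=4):
    """True if the name uses only the allowed characters and is long enough."""
    if not NAME_RE.match(filename):
        return False
    stem = filename.rsplit(".", 1)[0]
    return len(stem) >= min_stem

for name in ["funnyname-smooth.txt", "Smooth.txt", "my book.txt", "abc.txt"]:
    print(name, is_acceptable_name(name))
```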

Go to the project page for your book. At the bottom of the page, you have three options: make the project available for smoothreading for one week, two weeks, or four weeks. Select the desired duration and upload the zip folder with the text file for smoothreading. You can provide comments about what to look for during proofreading, or to ask for attention in a particular section (this is very helpful in long texts). You might also like to advertise the availability of your book in the project thread, or in relevant team threads (see the Teams List for ideas).

Smoothreaders will mark possible errors in the text with [**description of query]. This is a standard format and should not be altered in your comments. When they finish, they will upload the smoothread project back to the project. At the end of the smoothreading period, you can download the smoothread versions from the bottom of the project page and search the text for [**. Not all [**comments] will be valid, just correct those that are. Make your corrections in the master file which still contains markup (or else make each change in every version of the text that you have, e.g. plaintext and HTML).
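
Searching for [** by hand works fine; if you would rather see every query in one list first, something like the following sketch would do it. It assumes the queries follow the standard [**description] form and do not contain a ] inside the description; list_queries is a hypothetical name.

```python
import re

# A smoothreader query: [** followed by anything up to the closing bracket.
QUERY = re.compile(r"\[\*\*([^\]]*)\]")

def list_queries(text):
    """Return (line_number, query_text) for every [**...] note."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in QUERY.finditer(line):
            found.append((number, match.group(1).strip()))
    return found

returned = "He said goodby[**typo?] and left.\nNothing here.\nShe was hapy[** happy?]."
for number, query in list_queries(returned):
    print(number, query)
```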

While your book is being smoothread, why not start work on any other formats that are required, or begin fixing up any illustrations?

Creating an HTML version

Go back to your marked-up copy. Use a copy of the marked-up file, named something like: <funnyname-htm.html>. Make sure you keep a version of the marked-up file for backup and reference.

See HTML in the Help section of this document and Creating HTML versions.

Also see PG's guidelines.

Main issues to check

  1. Ensure the HTML header title contains the line <title>The Project Gutenberg eBook of Name of Book, by Name of Author</title>
  2. Page numbers are correct and appear on the first line of the page of the original publication. Sometimes page numbers are not used by post-processors, but many feel they add value and do retain them.
  3. Images must be in a folder called <images>, file names must all be in lower-case characters, and the path of the links must go to the images folder within the same folder that contains the HTML file.
  4. Validate the CSS—the validator allows the full HTML file to be uploaded from your own computer. If you have web space, upload the file and check from there instead if you prefer, or, if you have direct upload access to PG you can validate when uploading.
  5. Validate the HTML markup.
  6. Check HTML with Tidy—note, don't let Tidy change your file, instead view the flags and make the appropriate changes yourself, otherwise the code will be difficult for anyone else to read or troubleshoot. Various tools including Guiguts have an inbuilt Tidy check.
  7. Check links—if you have web space you can use the online link checker; if not, various tools including Guiguts have inbuilt link checkers.

Note that external links are generally not permitted, except for links to other ebooks within the PG site. If used, there must be a disclaimer at the beginning of the file to explain that links going outside of the document may not work for various reasons, for various people, at various times.
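
A few of the checks above (title wording, image paths, external links) can be roughed out with Python's standard html.parser. This is a sketch under my own assumptions, not a replacement for the validators: it only looks at the start of the title, treats any path not beginning with lowercase images/ as suspect, and flags every http(s) link for you to review by hand.

```python
from html.parser import HTMLParser

class PGChecker(HTMLParser):
    """Collect the title text, image paths and link targets."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def check_html(html):
    """Return a list of human-readable problems found in the HTML."""
    parser = PGChecker()
    parser.feed(html)
    problems = []
    if not parser.title.strip().startswith("The Project Gutenberg eBook of"):
        problems.append("title does not start with the PG wording")
    for src in parser.images:
        if not src.startswith("images/") or src != src.lower():
            problems.append("image path: " + src)
    for href in parser.links:
        if href.startswith(("http://", "https://")):
            problems.append("external link to review: " + href)
    return problems

page = (
    "<html><head><title>The Project Gutenberg eBook of Name of Book, "
    "by Name of Author</title></head><body>"
    '<img src="images/i001.png" alt=""><img src="Images/Cover.JPG" alt="">'
    '<a href="#page_7">p. 7</a><a href="http://example.com/">out</a>'
    "</body></html>"
)
for problem in check_html(page):
    print(problem)
```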

This walkthrough assumes that you have already downloaded and installed: Guiguts, Aspell, Jeebies, Tidy, Irfanview and Xnview and you have Windows. Please make special note of the title. This is M's post-processing process. It will absolutely not be anyone else's. Please feel free, and please do, edit grammar errors. Please do not change the instructions to what you are sure is correct—I'd encourage you to document your own method too. Thank you so much.

Getting Started

Making a place to keep notes

Open a text file with Notepad and save it as "notes" in whichever file your project will be in. For example, for this walkthrough, I am using Sure Pop and the Safety Scouts. I will save this file in my DP folder in a folder of its own. I have a folder on my computer dedicated to JUST DP stuff. I've named this new project folder "surepop". Always use lowercase letters when naming your files as it will save you heartache later when you're renaming your illustrations. Trust me.

Go to your project's home page.

Download your project files

Scroll down until you see "Download Zipped Images." Click on this link. It will immediately start downloading. The more images your project has the longer this will take, as your Project Manager will have included larger, high-resolution versions of your illustrations and possibly cover, end papers, etc. Once your files are downloaded, choose "Extract" or "Unzip" and choose C:\DP\surepop\pngs as the destination. (You'll have to type in your project's name and the pngs bit.) This will create a folder within your folder that holds all of your page images. Now delete the zip archive. You don't need it and it will just take up space on your computer. (Worst case, you can always download it again from DP.)

Now go back to the project's home page and this time choose "Download Zipped Text". This download will take about 3 seconds. In your new zip you will find
1) The Good words list
2) The Bad words list
3) The proofed and formatted project text named something like: projectID48d4451eaf6c4.txt and
4) a link to the project discussion named projectID48d4451eaf6c4_comments.html

Now. The awful truth. I never use anything except the proofed and formatted text. So, I right click on the projectID48d4451eaf6c4.txt and choose extract. This goes into C:\DP\surepop. This will also take about three seconds and it may seem like nothing happened. You can check by going to My Computer->C drive->DP->surepop. In surepop will be a folder titled "pngs," a text file named "notes," and a text file named projectID48d4451eaf6c4 (or whatever your numbers are).

You are now done downloading and your project is all tidy in one place.

Read the Discussion

While your images are downloading, scroll up to the link "Discuss this project." Read through any comments on that thread so that you're aware of any anomalies facing you. Put any notes from the thread that you might need on your saved "notes" page to remember to address the issues when you find them, i.e. Eileen writes on the thread, "Page 7 has a description of the diagram on page 37. PPer may want to link that in the html." Copy that onto your notepad file to decide later after looking at it.


First things first. Let's get the illustrations out of the way and done. This will be very basic. Hopefully your projects do not need a lot of fancying up. There are people who are willing to help out if they do so don't despair. We're just going to assume that they are straightforward though.

Open Irfanview. Click on the open file button at the top left. Browse until you find your pngs folder. Double-click on that and scroll to the end (In Irfanview this will be sideways) until you find the first illustration file. It will be after the numbered pngs and be named something like i001 or illus-001 or cover. Double-click on the first one and it will open for you.

Often images need to be cropped because there is a lot of useless white-space around the actual illustration. To do this: click in the upper left corner, hold the mouse down and drag to the lower right corner. You've now highlighted the part you want to save. Click Edit->Crop selection.

Next it is time to resize. A good rule of thumb is that the longest side of your image should be no more than 400-600px. Click Image->Resize/Resample. I am using 450 px for my longest side as in the original the illustrations do not take up the whole width of the page. Your illustrations should be relatively sized to the original. For example, an illustrated drop cap usually only takes up about a fraction of a full page and so would usually only be about 100-125 high.

Make sure that the checkbox "Preserve aspect ratio" is checked. Then I put 450 in the width box and the height box adjusts itself automatically.
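
The arithmetic Irfanview does for you with "Preserve aspect ratio" checked is simple enough to sketch. The function below is just an illustration of that calculation (the name and the sample sizes are invented); it scales so that the longest side lands on the target.

```python
def resized_dimensions(width, height, longest_side=450):
    """Scale (width, height) so the longer side equals longest_side."""
    scale = longest_side / max(width, height)
    return round(width * scale), round(height * scale)

print(resized_dimensions(1800, 1200))  # landscape scan
print(resized_dimensions(900, 1500))   # portrait scan
```

Note that this would also enlarge an image that is already smaller than the target, which you probably don't want; check the original size first.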

If your picture is black and white, such as a line drawing, you'll need to decrease the color depth to make your illustration's file size smaller. Click on Image->Decrease color depth. Check the little box in the middle for 16 colors. There is a little floppy disk in the tool bar. Click on this "To save as". A menu pops up; use the up arrow on that little tool bar on the little window to get to "surepop." Use the folder with the little asterisk icon to create a new folder (or right click in the big white space). Name the new folder "images". All lower case. Now doubleclick on your newly created folder. At the bottom Irfanview will already have taken the original name of the illustration image. As long as it is lowercase with no spaces, you can just leave it as that. You'll need to decide to use jpg or png. If your illustration is simply black and white, then png is what you want. For illustrations with more color use jpg. Use the menu beside "Save as file type" to find either png or jpg. Click save and you are done. Now Irfanview will save your last file type so if all of your illustrations are black and white you can just leave it on png. If you have both though, be careful to change it back and forth appropriately.

For Surepop, the illustrations are all color so I can just crop, clean up a bit if necessary and save.

So to sum up: Crop->Clean up if necessary->Resize->Reduce color if black and white->Save.

Starting the Text

Time to work on the new project. Open GG. It will say No File Loaded. So let's load one. Go to File->Open->surepop->projectID48d4451eaf6c4.

A new bug was introduced to GG in the latest version: it hates proofers' names containing . or @ or _. (Click anywhere in the file and then, at the bottom of GG a little past halfway over, click "See Proofers." Hopefully none of yours have a . or @ or _ in their names, as that creates a smash in the bin later.) Before you save your file the first time, click See Proofers. If any of the names contain one of those characters, search and replace it out. For example: Elston.The_Elephant

Search: Elston.The_Elephant
Replace: ElstonTheElephant
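
If there are many such names, the same search and replace could be done in one sweep outside GG. This sketch assumes the page separator lines look like the usual -----File: 001.png---\name\name\------ form, with names delimited by backslashes; clean_separator is a made-up helper.

```python
def clean_separator(line):
    r"""Strip '.', '@' and '_' from proofer names in a page-separator line."""
    if not line.startswith("-----File:"):
        return line  # ordinary text is left alone
    parts = line.split("\\")
    # parts[0] is the "-----File: 001.png---" prefix, parts[-1] the trailing dashes;
    # everything in between is a proofer name.
    for i in range(1, len(parts) - 1):
        for ch in "._@":
            parts[i] = parts[i].replace(ch, "")
    return "\\".join(parts)

line = "-----File: 001.png---\\RJMAustin\\Elston.The_Elephant\\tenaj\\--------------------"
print(clean_separator(line))
```

The png file name keeps its dot because only the name segments between the backslashes are touched.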

After you've done this, save your file immediately as surepop-firstpass.txt or whatever name best fits. Never, ever, ever keep that long projectID48d445 thing in your project's name. It isn't descriptive and sends PPVs and Wwers into fits. Small, descriptive, lowercase, unspaced names.

First Pass

We're going to start the first pass (looking for oddities) and adjusting page numbers at the same time. Click anywhere past the first

-----File: 001.png---\RJMAustin\SMH\Janet\JHowse\tenaj\--------------------


Adjusting Page Numbers

You'll notice that the second line from the bottom of GG has a lot of little boxes. Nine to be specific. Right click on the box named "Lbl: None." This will open your page numbering tool. Go back to the main GG window and click on the "See Image" box. This will open your first png with Xnview. Don't worry if you see png.002, pushing "home" on your keypad will always take you to the first png as "end" will take you to the very last illustration.

Looking for Oddities

Look at the first page. Often it is blank. For me it is the title of the book. It already has the four blank lines before it as I like, so I can click "page down" on my keyboard and go to the next page. The second page has a quotation. My formatters saw this as a poetry quote but I think it is a block quotation, so I change the /* */ to /# and #/ and put the signature in its own /* */. This is the point of a first pass: making sure things are the way you'd like them to be. So far none of my pages have visible page numbers. That's okay. Just keep going through the first few pages making sure that your Title page, Copyright page, Table of Contents, etc. have the markup and line spacing that you want. Especially watch for missing italics or smallcaps. These pages have a lot going on and stuff is easy to miss.

When I get to the Table of Contents, I see that it says that the book's introduction is on Arabic page number 1. When I turn to the next page, which is a pledge the bottom of that page says it is Roman numeral page vi. This is my png.006. That matches exactly nicely. I go to the Configure Page Numbers tool and Change the first box to Roman. I leave the Start @ 001 in place as that's just what I want this time. The book's Arabic page 1 is my png.007. So I change the box next to Image# 007 on the CPN tool from " to Arabic. Then change the +1 to Start @ and type a 1 in the box. At the very top, press "Recalculate" and you'll see those first numbers change to Roman and the rest start at 1 at Image# 007.

Okay, back to "First Pass"ing. Go back to Xnview and use the page down key on your keypad to flip quickly through the pages of your book. Stop if you see anything that you want to check or change. For example, on png.017 of my book, there is an illustration at the bottom of the page. I want to be sure it landed between paragraphs. I go back to GG and, in the same row we've been clicking boxes on, click on the third box, "Img: 007." A little Goto Page Number box pops up and I type in 017 and press Enter or click Okay. That takes me right to the page of the text I wanted to see. Looking at the image and text, I notice that that particular image has to be right aligned to make sense, so I make a note on my "notes" notepad page. As these notes will only ever be seen by you, you only have to make them make sense for you. In this case, I typed: "right align illo on png.017."

Words split across pages: As you scroll through, you'll probably find some words split across pages. You can:
1) rejoin those words on your first pass (Don't worry, if you miss any GG will find them for you later)
2) wait until you remove page separators and rejoin them then. or
3) the more controversial, let GG rejoin them knowing that
a)if the line in the html ends oddly your word could be split visually and
b) some PPVers hate this.
"3" maintains the original integrity better in that it shows that the word was split in the original but, usually, I don't think this is a semantic worth saving as we rejoin all other words split over the ends of lines. I tend to do "1" as I like how it makes "Removing Page Separators" that much faster. I inevitably miss one or two, but as I said, GG catches them for me and lets me fix them.

Watch for correct chapter spacing as well. I just found a chapter header with only 3 blank lines before it. GG will NOT find those for you and so will not format them properly in HTML. Also watch for wrong spacing or missing blank lines after #/ and */ markup. Without the blank line after, as I just found on one of my pages,

--<sc>Sure Pop</sc>

would have been wrapped by a very confused GG into

/*--<sc>Sure Pop</sc>*/[Illustration]

Other notes I leave for myself:

move ! into </i> [This is AFTER the text is done and I am working on the html so that the italic words will not bump into ! ? ; :]
fix note on 063 [This note is more unusual than the other notes in the book as it has more than one line of centered text and some blockquoted text]

You'll find as you go things that you'll want to be reminded of and things you'll just do automatically. Notes are good for me if RL takes over and I have to come back to a project after a few weeks and have no idea what I've done. I always put at the top of my "notes" what I did and am going to do next, i.e. "Finished proofer's notes; time to -*"

Illustrations and Blank Pages in the Page Count

Frequently on a first pass, you will find full page illustrations followed by blank pages. These are most usually not numbered. Sure Pop has none of these, but this is where adjusting your page numbers comes in as well. First, say the illustration and the blank page following are on png.023 and 024. Go to "Configure Page Labels" and for those two Image#s, click the button labeled "+1" next to them until it says "No Count". Then push Recalculate and you'll see that it now skips those Image#s in its count. Check to make sure that the next page after the illustration and blank page matches what your Configure Page Labels now says. Delete all [Blank Page] notation. This is only to hold the place while proofing and should not appear in your finished project.

Watch for: [Illustration: ] This is not right and needs to be fixed to [Illustration].

Final page number

Now look at your book's last page with text on it. Mine is 130. I check to be sure that matches the last Label on Configure Page Labels. If it doesn't, it means that I made a mistake somewhere, usually by forgetting to not count a blank page and/or illustration. Use the "page up" key on your keyboard to quickly flip back to the last illustration and check it. Once the numbers are correct, click "Use These Values" at the top of the CPL. The box will close and you will see that the box at the bottom that used to say Lbl: None now says something like Lbl: Pg 130. That will be very helpful for any transcriber's notes.
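
If your final number doesn't match, it can help to work out what the labels ought to be. Here is a small sketch of the same counting the Configure Page Labels tool does, under my simplified assumptions: Roman numerals from image 1, Arabic restarting at 1 on a given image, and a set of "No Count" images. The 26-image book and the uncounted images 23 and 24 are invented for the example.

```python
def to_roman(n):
    """Lowercase Roman numeral for a positive integer."""
    table = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
             (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
             (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, numeral in table:
        while n >= value:
            out += numeral
            n -= value
    return out

def page_labels(image_count, arabic_start_image, no_count=()):
    """Map image number -> label; uncounted images map to None."""
    labels = {}
    counter = 0
    for image in range(1, image_count + 1):
        if image == arabic_start_image:
            counter = 0  # Arabic numbering restarts at 1 here
        if image in no_count:
            labels[image] = None  # plate or blank page, skipped in the count
            continue
        counter += 1
        labels[image] = to_roman(counter) if image < arabic_start_image else str(counter)
    return labels

labels = page_labels(26, arabic_start_image=7, no_count={23, 24})
print(labels[6], labels[7], labels[22], labels[23], labels[25])
```

With these settings image 6 is page vi and image 7 is page 1, as in Sure Pop, and the count picks up again at 17 after the two skipped images.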

Page Separators

Now that we've been through our first pass, time to remove the page separators. GG knows where it is and will pause at ANYthing it finds odd, so on the task bar of GG go to Fixup->Fix Page Separators. A little box will pop up. I immediately grab the title bar and move it to the bottom of the window so that I can see the main GG window. Check the Full Auto box. Yes, it will be okay, but if you're unsure feel free to save a copy of your file, just in case. Now push the Refresh button and it will highlight the very first page separator. As the right number of blank lines are already there, I can just push "Delete". You'll see there are many options to choose from. "Delete" is the one you will use most often, but if you find a blank line is missing, just push that button and it will add it and move on. It will stop on any line with blank lines after it, before any line that starts with a capital letter, or any punctuation or <x> tag, or after any that end with * or – or >, etc. After you fix or adjust anything, just push Refresh and it will take you on. When you reach the end of your file, close that window and save again.

Proofer's Notes

Proofer's Notes: The proofers and formatters will have left you notes with questions that they had. On the task bar of GG, go to Search->Search & Replace. A long window opens. Grab that and move it to the bottom (You'll do this pretty much every time so that it doesn't obstruct your view or disappear behind the main GG window.) Uncheck Whole Word and search for [*

This covers most malformed notes. My first note says


I've already seen "goodby" in the text so I know that is at least one of the ways this book spells it. Just to be sure though, I search for "goodbye". It only shows up in proofer's notes, so those are safe to delete and ignore. I also search for "good-bye" and find that it isn't there either. I put [* back into the search window and go on.

Goodby[** typo?] can again just be deleted.
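
The "does it only show up in proofer's notes?" check can be sketched in a few lines. This is just an illustration of the idea: strip the [**...] notes first, then count each spelling as a whole word. The function name and sample text are invented.

```python
import re

# Proofers' notes, which should not count as occurrences in the book text.
NOTE = re.compile(r"\[\*\*[^\]]*\]")

def count_variants(text, variants):
    """Count whole-word occurrences of each variant, ignoring [**...] notes."""
    body = NOTE.sub("", text).lower()
    return {v: len(re.findall(r"\b" + re.escape(v) + r"\b", body)) for v in variants}

sample = 'He waved goodby.[**goodbye?] "Goodby!" she called.'
print(count_variants(sample, ["goodby", "goodbye", "good-bye"]))
```

A spelling with a count of zero only ever appeared inside a note, so the note suggesting it can be deleted.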

Missing or wrong punctuation will need to be fixed or noted by you. On my notepad "Notes" I scroll down a bit and start a section entitled "Transcriber's Notes:" Here is where I'll put things that I changed or anomalies that I don't want Project Gutenberg to be annoyed by getting errata reports for. For punctuation, I almost always just fix all obvious errors and then include a blanket note. "Obvious punctuation errors repaired." On fiction, especially, wrong or missing punctuation is rampant and my list of Transcriber's Notes would be enormous if I noted every one. Do be absolutely sure that you are really only fixing broken things. For example, semicolons were used a lot more than they are now in dialogue. "Look out," she whispered; "I don't want you to fall!" is not uncommon. Both exclamation points and question marks can occur in the middle of a sentence. "Don't go! can you not just wait a moment?" is something that you might see and is not, for this time period, incorrect. A lot of our books used single quotes where we would now use double and also had double quotes within double-quotes. If you have any questions, ask in the forums. The depth of grammar knowledge on DP is amazing.

Words that are missing or misspelled, you have a choice: 1) Leave it and put a note at the end that you did so; 2) Fix it and note it at the end. I tend to fix and note as I think that is what the author would have preferred and it will make it easier on the reader. Notes look something like:

Transcriber's Notes:

Obvious punctuation errors repaired.

Page 91, word "to" added to text (minute or two to)

Page 103, word "as" added to the text (just as she had)

Page 104, "hedge-hog" changed to "hedgehog" (send the hedgehog to)

I include the actual book's page number to give the reader an idea of where the text is and also a small bit of the text so that they may search for it. This way if I've made an error, they can find and repair it and they also will know what the original said. Be wary of fixing spelling that was accurate then. For example, it would have been an error to change all of the "goodby"s to "goodbye" or "good-bye." That was correct for the time and this text. Once again, if you are unsure, ask in the forums and often someone will be able to find it in an old dictionary.

Questionably broken words or -*

Once you are through all of the [**notes], you can search for -* to make them as consistent as you reasonably can.

The first one that GG stops on for me is "house-*keeper". First I search for it as "usekeeper" and find it doesn't appear. Then I try "use-keeper" and find that isn't in the text either. I leave off the first letter or so in case it appears capitalized. This lets me see every time it shows up. Except it doesn't in this case. Hmm. This is now my decision. It is usually safe to retain the hyphen, as hyphens were much more prevalent. In this case however, I think "housekeeper" is more usual, so I remove the -* and go on. For "street-car" I search for "streetcar" and don't find it, but "street-car" shows up mid-line on another page, so I know that the hyphen should be retained. Sometimes you will find words with and without hyphens mid-line. "streetcar" and "street-car" could both appear due to printer's whims. In that case, for the end-of-line one I usually go with the form used most often, and put a note at the end.

Both "streetcar" and "street-car" were used in this text. This was retained.

This should, again, help PG not to get errata reports on this issue.
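
For a book with a lot of -* words, the two searches per word could be batched. This is a sketch only: it lowercases everything rather than using my drop-the-first-letter trick, and check_broken_words is a made-up name. The decision is still yours.

```python
import re

def check_broken_words(text):
    """For each word marked with -*, count its joined and hyphenated forms."""
    lowered = text.lower()
    report = {}
    for first, second in re.findall(r"(\w+)-\*(\w+)", text):
        joined = (first + second).lower()
        hyphenated = (first + "-" + second).lower()
        report[first + "-*" + second] = (
            len(re.findall(r"\b" + re.escape(joined) + r"\b", lowered)),
            len(re.findall(r"\b" + re.escape(hyphenated) + r"\b", lowered)),
        )
    return report

sample = "The street-*car stopped. A street-car bell rang. The house-*keeper waved."
for word, (joined, hyphenated) in check_broken_words(sample).items():
    print(word, "joined:", joined, "hyphenated:", hyphenated)
```

Here "street-*car" has one hyphenated match elsewhere, while "house-*keeper" has none either way and is left to your judgment.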

Remember to save your file often and save a copy with a different name at all major edits.


Convert all of your DP-style thoughtbreaks <tb> into Project Gutenberg-style thought breaks.

        *       *       *       *       *

We used to have to search and replace for those but now GG does it with one click. Go to Text Processing->Convert <tb> to asterisk break.
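
For anyone not using GG, the conversion is easy to sketch by hand. The snippet below builds the line as described in the checklist (7 spaces, then 5 stars each spaced by 7) and swaps it in for any line consisting only of <tb>; the function name is my own.

```python
# 7 leading spaces, then 5 asterisks separated by 7 spaces each.
ASTERISK_BREAK = " " * 7 + (" " * 7).join("*" * 5)

def convert_thought_breaks(text):
    """Replace every line that is just <tb> with the asterisk break."""
    return "\n".join(
        ASTERISK_BREAK if line.strip() == "<tb>" else line
        for line in text.split("\n")
    )

sample = "End of one scene.\n\n<tb>\n\nStart of the next."
print(convert_thought_breaks(sample))
```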

Orphaned markup

Once you've handled all of the -*, it's time to look for lonely markup. On the GG task bar go to Search->Find Orphaned Brackets and Markup. Again, move it to the bottom as this one will dive behind your main window almost every single time. This will search for /* that end with #/ or with nothing at all. Or ( that open but don't close. Or <i/ instead of <i>. Just click through each little radio button. It doesn't find things often but you'll be very very glad that it did if it does. One more orphan check to run through and this bit is done. Go to Fixup->HTML Fixup. A BIG menu with lots and lots of options opens. Ignore all of them at this point except: on the middle right "Find orphaned markup". This will search for <i>, <b>, <sc> that open but do not close. Or close but never opened.

My search finds:

<i>He looks where he goes and keeps to the right.</i>
He crosses at regular crossings, not in the middle of the block.</i>

which is not only a missing <i> but a missing paragraph break on this page. If nothing appears, then your file is good!
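
To show what "orphaned" means here, this is a rough sketch of such a check for <i>, <b> and <sc>. It is my own simplified version, not how GG does it: it walks the tags in order and reports any that close without opening or open without closing.

```python
import re

TAG = re.compile(r"<(/?)(i|b|sc)>")

def orphaned_tags(paragraph):
    """Return tags that close without opening, or open without closing."""
    open_stack = []
    orphans = []
    for match in TAG.finditer(paragraph):
        closing, name = match.group(1) == "/", match.group(2)
        if not closing:
            open_stack.append(name)
        elif open_stack and open_stack[-1] == name:
            open_stack.pop()
        else:
            orphans.append("</" + name + ">")
    orphans.extend("<" + name + ">" for name in open_stack)
    return orphans

sample = ("<i>He looks where he goes and keeps to the right.</i>\n"
          "He crosses at regular crossings, not in the middle of the block.</i>")
print(orphaned_tags(sample))
```

On my example above it reports the second </i>, the one with no opening <i>.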

You can quickly run through your italic endings for missing punctuation. Search for </i>. Remember if the full sentence or phrase is italic the .!? goes inside the markup.


Time for jeebies! Go to Fixup->Run jeebies. This will give you a list, hopefully not too long, of every "he" that maybe should be "be" and vice versa. Almost all of these will be false positives but it's better to check than to have your PPVer or WWer point it out to you later. Double-click on the first question and read the text that shows up in the window, if it's right, right-click and it will be removed from the list of questions and move on to the next one. If it's wrong, check the original and fix it and note it if the original was wrong too. Then right-click and on to the next one.

Word Frequency

This is one of my favorite steps. Go to ->Fixup->Run Word Frequency Routine. It automatically runs on Frq which I find not as helpful as Alphabetically, so I check "Alph" immediately and run it again. Then I work my way through most of the buttons.

Emdashes: I run quickly down the list of em-dashes to check for any "suit--cases" that should be "suit-cases" and the like. Double-clicking on any of the words in the list will take me to the one that appears in the text. If I want to compare it to the original, I click on the "See Image" button on the bottom of the main GG window.

Hyphens: Here the first line will tell me any "suspects". These occur when a word shows up as both hyphenated and not in the text. Mine says 32 words with hyphens, 1 suspect (marked ****). While I am most interested in that suspect, I also check as I go down the list for words that shouldn't be hyphenated at all. Such as "in-the." (That wasn't in my text, it was just an example.) My one suspect was "tiptoed" which also shows up as "tip-toed." Once each. After searching for "tiptoe" and "tip-toe" to see if it shows up in another form, I find that those are the only two instances, so I can't get a majority ruling. I decide to leave a note in my Transcriber's Notes: Both "tiptoe" and "tip-toe" were used in this text. So far, for this text, this is my first actual transcriber's note. If it turns out to be the only one, I may not bother with the note at all as it does match the original text. If there were many inconsistencies like this, I'd leave the note.
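
A "suspect" in this sense is just a hyphenated word whose unhyphenated form also appears. A minimal sketch of that idea follows; it is not GG's actual routine, and it only handles plain a-z words.

```python
import re
from collections import Counter

def hyphen_suspects(text):
    """Return hyphenated words that also occur without the hyphen, with counts."""
    words = Counter(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    suspects = {}
    for word, count in words.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in words:
                suspects[word] = (count, words[joined])
    return suspects

sample = "She tiptoed in. He tip-toed out. The street-car passed."
print(hyphen_suspects(sample))
```

Each entry gives the hyphenated count and the joined count, which is the "majority ruling" information I look for.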

Alpha/num: This is a very quick check to find any lone1y numbers floating amongst words as that 1 in lonely. Your list will consist mainly of 1st, 12mo, etc. which are correct.
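
The same kind of list can be produced with one regular expression: any word that contains both a letter and a digit. A small sketch, with a made-up function name:

```python
import re

# A word containing at least one letter and at least one digit.
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")

def mixed_tokens(text):
    """Return words that mix letters and digits."""
    return MIXED.findall(text)

print(mixed_tokens("The 1st lone1y 12mo volume of 1912."))
```

As in GG, most hits will be things like 1st and 12mo; pure numbers such as 1912 are not listed.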

All Words and Check Spelling are buttons that I skip. We'll be running the more thorough Spellcheck later.

Ital/Bold: This button will list all italic or bold words just as it says. Check those for punctuation that is in that should be out. We'll run a check for out that should be in, later.

ALL CAPS: Another button that I skip.

Mixed Case: Here is where we check that all MacArthurs ended up as MacArthurs and not Macarthurs.

Initial Caps: I don't use this button now, but I may use it later if I find some anomalies in proper names. For example, one of my books used Molly most of the time but Mollie a few times. Those few Mollies I changed and left Transcriber's Notes about.

Character Counts: This will show you which characters are there. There should be the same number of [ as ], but we checked that already with Orphaned Brackets. At the very bottom of the list are the very odd characters. The British pound sign sometimes shows up here and a double-click will often show it was in place of an "f." "o£" This time though, Surepop is clean.
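
A character count is a one-liner with Python's Counter. The sketch below checks the bracket balance and lists anything outside plain ASCII as "odd"; that cutoff is my own choice for the example, and legitimate accented letters will show up there too.

```python
from collections import Counter

def character_report(text):
    """Return (brackets_balanced, {odd_character: count})."""
    counts = Counter(text)
    unusual = {ch: n for ch, n in counts.items() if ord(ch) > 127}
    balanced = counts["["] == counts["]"]
    return balanced, unusual

balanced, unusual = character_report("[Illustration] The price o£ tea [Footnote 1: note")
print(balanced, unusual)
```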

Check , Upper: This checks for every comma that is followed by an upper case letter. Now most of them will be correct due to proper names. Keep an eye out for , The; , What; , There; etc. This check will show up again in Stealtho checking. So you can do it thoroughly now, or thoroughly then. On this one, I paused over ", Your" as that seemed odd. Double clicking on it took me to the place in the text and it showed: "'Oh, Your Majesty, let" which is of course, fine.

Check . Lower: This will show you every time a period/full-stop is followed by a lowercase letter. This will show up most often for things like etc., A.M., gym., i.e. and so on. Every now and again though, it will show up in place of a comma, which will need fixing. If it is that way in the original, fix it with a TN; if the original is correct, just fix it and move on merrily. This book hasn't got a single one.
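
Both of these checks boil down to simple patterns. Here is a hedged sketch of the two (my own regexes, not GG's), run on an invented sentence:

```python
import re

COMMA_UPPER = re.compile(r",\s+[A-Z]\w*")       # comma, then a capitalized word
PERIOD_LOWER = re.compile(r"\w+\.\s+[a-z]\w*")  # word and period, then a lowercase word

def punctuation_suspects(text):
    """Return (comma-upper hits, period-lower hits)."""
    return COMMA_UPPER.findall(text), PERIOD_LOWER.findall(text)

sample = "'Oh, Your Majesty, let us go. we met at 9 A.M. sharp, The end."
commas, periods = punctuation_suspects(sample)
print(commas)
print(periods)
```

Just as with the GG buttons, expect false positives: ", Your" is fine here, and "M. sharp" is only the tail of A.M.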

Check accents: This is a great check. It takes every accent and then lists any words that show up similarly. For example: resumé and resume and even resumè; coördinate and coordinate. On words like coördinate, it will not find co-ordinate. You'll want to check for that yourself. Again, being an easy kiddie-lit book, no accents to be found.
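
The grouping this check does can be imitated by stripping the accents and seeing which words collide. A minimal Python sketch, assuming that is roughly how the comparison works (the function names are hypothetical, and like the button it will not catch co-ordinate):

```python
import re
import unicodedata
from collections import defaultdict

def strip_accents(word):
    """Remove combining accent marks, e.g. 'resumé' becomes 'resume'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def accent_variants(text):
    """Group words that differ only by accents; keep groups with an accented form."""
    text = unicodedata.normalize("NFC", text)
    groups = defaultdict(set)
    for word in re.findall(r"[^\W\d_]+", text):
        groups[strip_accents(word).lower()].add(word)
    return {
        key: sorted(words)
        for key, words in groups.items()
        if len(words) > 1 and any(strip_accents(w) != w for w in words)
    }

sample = "Her resumé was long; we resume at noon. They coördinate and coordinate."
print(accent_variants(sample))
```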

Unicode > FF: This is a button I've never used. No real reason except that it doesn't ever find anything for me.

Stealtho Check: Skip this button as you're going to do a more thorough check now.

Close the Word Frequency window.

Stealth Scanno

In the upper tool bar, go to Search->Stealth Scannos. A window will pop up with three files.


For no really good reason, I tend to go with Regex first. Either double-click regex or click and choose Open. The search and replace window will open. Grab it by the title bar and move it down so you can still see your GG window above. You'll see that the regex box is already checked for you.

At the bottom of the Search & Replace, check the box that says "Auto Advance." This will make it skip checks that are not present in your text. It's very clever. It is now going to run through a list of regexes (computer terminology for code that looks for certain things, like two spaces after a . instead of one). If you click Search and nothing happens, click it again to be sure and then you can rest assured that that item is not in your text. The more complicated your text is, the more checks the regex will pause on. Click "Next Stealtho." For me it stops on "R. W." because it used to be a DP rule that all initials had the spaces removed. Now the rule is match the scan so we can leave it and move on. Sometimes this check is very handy for checking for consistency between A. M. and A.M., etc. In this book, that's not a problem.

This is where your more thorough check for , Upper can take place.

And so the checks go on. For me, it paused on "heartstrings" because there were five consonants in a row without a vowel and it thought that might be a problem; it wasn't, so I clicked "Search" again and it found nothing else. On to "Next Stealtho."

Next it highlighted each repeated word (dittograph, if it's an error):
He had had cherry pie for dinner.
He went to the the store.
I see that they are all correct and move on.
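
A repeated-word check like this one is a classic backreference regex. Here is a small Python sketch of the technique; the exact expression GG's scanno file uses may well differ, so treat the pattern and the function name as my assumptions.

```python
import re

# \1 refers back to whatever word the first group matched.
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def doubled_words(text):
    """Return each run of the same word repeated twice in a row."""
    return [m.group(0) for m in DOUBLED.finditer(text)]

sample = "He had had cherry pie for dinner.\nHe went to the the store."
print(doubled_words(sample))
```

It cannot tell a legitimate "had had" from a dittograph; it only points, and you decide.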

Save. Often. With different names. This phrase will show up a lot. Once you're completely comfortable with PPing, you'll save different versions less often.

It paused on a line that was over 75 characters long. We can ignore that check right now. It will be fixed up later.

Then it will search for combinations of letters that it thinks might be questionable. "tli" that possibly could have been "th" in the original. Since it is in the middle of "whistling," I'm safe to ignore. I skip most of the letter combination checks because I'm going to do a thorough spellcheck later.

It will also point anything in brackets out for you. This will help spot any malformed proofer's notes, the [oe] and [OE] markup that you'll handle later, (add this to your list of notes to yourself if it finds one), and any footnote markers.

At the end of the regex stealtho check, the Search and Replace pop-up will have 36/36 at the bottom, meaning that all 36 checks have been done.


Go back to Search->Stealth Scannos and choose "en-comm". I skip the "misspelled" as, again, I'll be doing a thorough spellcheck soon. En-comm checking will list stealth words that could be other words. It will usually list the uncommon word first. Click the "Whole Word" box so that you can skip checking every single date in your book for the 1 check. You are looking for 1s that should be Is. All of my ones were correct. They were parts of lists and prices and the first page in the table of contents.

Next it stops on "bad" which could be "had." Click through them, checking each one. If you are unsure, you can always click "See Image" at the bottom of the GG window. A new button named "Swap Terms" has appeared on the Search and Replace window. This button will do just what it says: stop looking for the "bad" and look for all of the "had." I am not going to check the 191 instances of "had." If a proofer had noted many had/bad errors or I'd seen any, then I would. Sometimes you have to make the call about what is the best use of your time and what will give you the best end result.

I do check all of the instances of "ball" and "hall". Sometimes these checks show up not only stealthos but inconsistencies in the text. It has on occasion found instances where every other hall-way is hyphenated, but there is a hall way lurking.

Keep clicking through the stealthos. We've already checked the "be/he" with jeebies so that is one giant check we can skip. Again, I skip checking the 1636 instances of "the."


Always make sure that you've run Word Frequency before running spellcheck. Doing this will make your spellchecking much faster. As long as you've run it since opening GG, you're good to go. You can rerun it at any time. Go to Search->Spellcheck.

I drag the window to the far right of my screen and resize it by clicking and dragging on the left side of the Spellcheck window so that the "resume at bookmark" button is completely invisible. I won't be using it and it gives me more of the GG window to see. This book says I have only 66 words to check. You will probably not be so fortunate. The first word it questions for me is "Pigmy." It only occurs one time in the text (with mixed case, anyway) so I click on "See Image" to check it. It is correct, so I click the bottom right button "Add to project dictionary." Then it will hop to the next word automatically. The image number at the bottom of the GG window will help me with this checking. I just go to the already open xnview image. I click "home" on my keyboard if I'm not already at the start of the book. Then I use "page down" on my keyboard to get to the right page. This helps so that I don't have 13 xnview windows open at one time. At the top of your Spellcheck window it will tell you how many times a word appears in the text. If it says it appears "0" times, it means that it is part of a hyphenated word. If a word appears 13 times, you are usually safe to add it to the dictionary without checking each one. With some experience, you'll find your own threshold of how many occurrences are enough to justify adding without checking each one. You may not find many misspelled words, but you will probably discover some inconsistencies. I just found that "Pellmell" and "pell-mell" show up once each. Since neither was over the end of a line, I have no way of knowing which the author preferred. I add a note to the end after the "tip-toe" note: This text also uses Pellmell and pell-mell. Some PPers do not bother with this type of note. I do because I know that the errata team at PG gets a LOT of false positive notes questioning their texts. This may stop at least one of those notes appearing.


Spellchecking done, now for the first Gutcheck. Save your file. Go to Fixup->Run Gutcheck. A new window will pop-up. Don't even worry about what it says yet. First thing, click the GC View Options button. Another new window will pop open. Now you check what you don't want to see right now:

  • Asterisk (As that will show you every /**/);
  • Forward slash; (that will show you all html tags and all of the wrap/don't wrap markers)
  • HTML symbol (we know those are there);
  • HTML tag (again);
  • Long line (will be handled later, I promise)
  • Short line (same as above)

Close that window and go back to the Gutcheck window. Pull it down so that you have the GG window on top and Gutcheck on the bottom. You may need to resize. Scroll down past the line that will say something like -->181 queries. The thing to remember is that just because Gutcheck questions it, doesn't mean it is wrong. It is just checking. You have the final say. My first stop is for <sc>. As Gutcheck doesn't recognize this markup, it will question every use of it. It will usually say: Paragraph starts with lowercase; Query word sc; Query word sc; Feel free to right-click these and move on. Double-clicking on the item in the list will take you to the place it occurs. Right-clicking will remove it from the list. The second thing my Gutcheck asks about is "No punctuation at paragraph end." This is because my book's title is split into two lines. As it isn't really a paragraph, I can just right-click and move on. This will take me to the next item automatically. Check any questions against the image and then right click them.

If any of your quoted parts go over one paragraph, Gutcheck will question with "Mismatched quotes." Just check to be sure they are correct and delete it. On this project, it did find one set that the proofers had missed the opening quote on. After checking the original, I replaced it. Had it not been in the original text, I'd have added a note to my TNs concerning it. If there are a lot of printing punctuation errors, I will just include a blanket note: All punctuation errors repaired. If there are only one or two changes to make, I'll just list each one separately:

Page 33, opening quotation mark added. ("For today it is)

The words in the parentheses allow the reader to search for that exact text if they wish. In our text versions we don't retain page numbers, so the page number alone will only give them a general idea of where it is.

Save. Often. With different names. This phrase will show up a lot. Once you're completely comfortable with PPing, you'll save different versions less often.

Split the Files

Split the files. Now I go to File->Save as. Save as surepop.html. Then do it again. File->Save as. Save as surepop.txt. Now I have a separate file with all of the fixes set for htmling. Right this minute though, we're going to tidy up the text file which is almost done!

Text Only File

Then I go to the very bottom of the file, add a couple of blank lines and then go to: Text processing->Add a thoughtbreak. I add a couple of blank lines after that and then copy and paste my list of TN there.

Converting Tags

Go to Text processing->Convert italics. All of your <i> and </i> have now been changed to _. Do the same for Convert bold, just in case you have any. These will be changed to =.
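
If it helps to see what those two menu items amount to, here is a minimal Python sketch of the same substitution. It is only an illustration of the result described above, not GG's code, and the function name is made up.

```python
import re

def convert_inline_markup(text):
    """Turn <i>...</i> into _..._ and <b>...</b> into =...= for the text version."""
    text = re.sub(r"</?i>", "_", text)
    text = re.sub(r"</?b>", "=", text)
    return text

sample = "She read <i>Sure Pop</i> and the <b>Safety Scouts</b>."
print(convert_inline_markup(sample))
```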

For <sc>, you'll need a regex. Go to ->Search->Search and Replace. Check the regex box.

Put this in the search: <sc>((.|\n)+?)</sc>
In the replace: \U$1\E

Click replace all.

This magical code will capitalize all of those and remove the tags.
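
For anyone wondering what the magic is doing: the search grabs everything between the tags (even across line breaks) and the replacement upper-cases it. Here is a rough Python equivalent. Python's re module has no \U...\E in replacements, so this sketch uses a function for the upper-casing instead; the names are mine.

```python
import re

# DOTALL lets "." match newlines, standing in for the (.|\n) in the GG search.
SMALL_CAPS = re.compile(r"<sc>(.+?)</sc>", re.DOTALL)

def upper_small_caps(text):
    """Replace each <sc>...</sc> span with its contents in capitals."""
    return SMALL_CAPS.sub(lambda m: m.group(1).upper(), text)

sample = "<sc>Colonel Sure Pop</sc> spoke.\n<sc>Two\nlines</sc>"
print(upper_small_caps(sample))
```

The +? keeps the match lazy, so each pair of tags is handled on its own instead of swallowing everything from the first <sc> to the last </sc>.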

For [oe], do a search and replace to change it to just oe; for [OE], change them all to Oe. Make sure regex is NOT checked on your search and replace. If your project has a LOT of these, you may want to consider making a UTF-8 version in addition to your plain text version that actually uses the real ligatures, œ and Œ. An example is if your main character is named Phœbe.

Table of Contents

I tidy up the table of contents before rewrapping. Align the numbers and chapter headers so that it looks nice and clean. Rewrapping will add about 8 spaces before each line. If that will make your lines too long (i.e. longer than, say, 72), you may want to do this by hand. Line everything up the way you want it to be, then highlight the table of contents by clicking and dragging over the whole thing. Then go to Selection->Indent selection 1. Twice. That will indent the whole selection 2 spaces. PG asks that things that are indented be so at least two spaces. You can do it more if you like, just watch that right margin. If you place your cursor after a number in the table of contents, the first box on the bottom of the GG window will tell you how far to the right you are. Mine ends at Col. 65, so I am set. Now, if you have indented your TOC yourself, immediately change the /* and */ surrounding it to /$ $/. This tells GG to just ignore that bit when rewrapping.


Hit ctrl & a. This will highlight your entire file. Then go to Selection->Rewrap selection.

Checking /* indentation after Rewrap

Now I quickly run through all /*. If you've chosen a poetry book for your first book, this will take a long time. Otherwise, I just check them to be sure everything looks as I'd like it to. Surepop has little quotations from "Colonel Sure Pop" with his "signature" following each one. I'd like those to be sort of right-aligned as they are in the original. So I just space them over. This check also allows me to check for any poetry that has gone over too far to the right. Gutcheck will find this for us soon when we run our last Gutcheck so it's not important to catch all of those right now, but if I see them, I fix them.

Removing the Markers

Go to Fixup->Clean up rewrap markers. This will delete all /# /* and /$ markers and their mates.

Remove End-of-Line Spaces

Now go to Fixup->Remove End-of-line spaces. Then save. If you do not do this now, Gutcheck will complain about every one.
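
What this menu item does is simple enough to sketch. The Python below is my own illustration (invented function names), handy if you want to confirm from outside GG that a saved file really is clean.

```python
def lines_with_trailing_spaces(text):
    """Return the 1-based numbers of lines that end in a space or tab."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if line != line.rstrip(" \t")
    ]

def strip_trailing_spaces(text):
    """Remove spaces and tabs from the end of every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))

sample = "one  \ntwo\nthree \t\n"
print(lines_with_trailing_spaces(sample))
print(repr(strip_trailing_spaces(sample)))
```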

Final Gutcheck

Final Gutcheck. Go to Fixup->Run Gutcheck. Remember those boxes we checked? Uncheck them now as we need to know if there are any lingering long or short lines, etc. Just like last time, make your way through the questions, right-clicking correct things, fixing anything that is broken.

Remove End-of-Line Spaces Again

Again go to Fixup->Remove End-of-line spaces just in case one got added during fixing things. Your text version is now DONE.


Deep breath. On to HTML.

Go to File->the second thing on your list should be surepop.html. Click on that.

Setting up the Table of Contents

Go to your table of contents. If any line begins with a space, remove it. You'll be happier later. This will often occur if your formatters have aligned your chapter numbers for the text for you. Very helpful for the text. Not as helpful for the HTML. Sometimes there will be a "PAGE" designation on the right above the number. Remove the spaces before it to make it land on the left margin. Now here is where I do something wacky that helps me later. I type "Blah" right before "PAGE" and then add some spaces after it. So:


now looks like

Blah            Page

This will help me in making the table shortly. "Blah" is my space holder and will be replaced with &nbsp; when I actually make the table. Putting &nbsp; in right now would be a Bad Idea as GG will convert the & into code. Before finishing the file completely, I can run a check for "Blah" and make sure none got left behind. Make sure that each page number is spaced at least 2 spaces from the title. DP requires six but GG only needs two.

Auto-generating the HTML

Here we go. Go to Fixup->HTML Fixup. That big pop-up with all of the cool buttons returns. In the upper left click the button that says Autogenerate HTML. It will immediately save a copy just in case for you. Then it will go to work. And work. And give you a file loaded with coded html. Save. (something like surepop-html-1.html) If you want you can immediately go to External->Pass open file to default handler. It will open your browser with your new file and show you how it is starting to look like a real e-book!

Fixing the Title

First thing, the ninth line of your html file will say something like:

The Project Gutenberg eBook of Sure Pop And The Safety Scouts, by Roy Rutherford Bailey.

Check this to be sure it is accurate. GG will take the first line of whatever is written on your file and the first thing that says "By" followed by something and place it in the author spot. If it cannot find a "By" it will just say "By AUTHOR." The allcaps are so that you see it and remember to fix it. If the first thing in your file is an illustration, that will be what it chose as the title. Fix it up to match the real title and author if it is not correct. As you can see, it did pretty well with Sure Pop but I need to change the case of the "And The" to be correct.

The Project Gutenberg eBook of Sure Pop and the Safety Scouts, by Roy Rutherford Bailey.

Cutting Out Unused CSS

Now there is a bunch of code following. This is your auto-CSS. (CSS just means Cascading Style Sheets but you really never need to know that.)

You should remove anything that you are not using. For example if you have no footnotes, you can remove all of that code:

   .footnotes        {border: dashed 1px;}
   .footnote         {margin-left: 10%; margin-right: 10%; font-size: 0.9em;}
   .footnote .label  {position: absolute; right: 84%; text-align: right;}
   .fnanchor         {vertical-align: super; font-size: .8em; text-decoration: none;}

You probably do not have line numbers or sidenotes either. If you are unsure if something is in your book, then leave that bit there. A little bit of unused CSS is okay. The more experience you have, the more you will change your CSS to fit what you like and what works best for you. For example, I have indented paragraphs, a bit for right-aligned text, a bit for signatures, my own poetry mark-up, etc.[2]

If you have an older version of GG, then
/*<![CDATA[ XML blockout */
may appear at the top of your CSS and
/* // --> */
/* XML end ]]> */
at the bottom of your text. Delete those lines as they just bug the whitewashers no end. It is wise to open your GG file, find the text file that is named "Header" and delete those lines from it now. Then, from now on, those lines will never appear again.

Page Numbers

I remove all of the Pg from the page numbers. I think the numbers speak for themselves and sometimes they end up split across two lines and that just looks odd. So Search and Replace for: Pg . (That is "Pg" with a space after it) Leave the Replace box empty. Save. At any time that you've made a change to your HTML and saved it, you can refresh that browser that showed you what it looked like. Then you can check that what you did turned out the way you wanted it to.

From here on, you can do each of these steps in almost any order. Sometimes putting illustrations in first will be the wisest course, sometimes deleting the auto-TOC will make sense to do first. Save versions as you go. Undo what you don't like.

Unused Chapter Links

Getting rid of chapter links that you won't use. GG automatically inserts links for its automatic Table of Contents. I make my own more tidy Table of Contents, so we do not need those links. Search and Replace them away. Check the regex box.
Search: <h2><a name="([\w\s\p{IsPunct}\n]+?)" id="([\w\s\p{IsPunct}\n]+?)"></a>
Replace: <h2>

Replace all.
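
Here is the same clean-up as a Python sketch, for anyone who wants to see it in action on a sample line. Python's re module does not understand \p{IsPunct}, so I have swapped in [^"]+ (anything up to the closing quote), which is a different but simpler way to match the attribute values. The function name is made up.

```python
import re

# Matches the opening <h2> plus the empty named anchor GG places inside it.
HEADING_ANCHOR = re.compile(r'<h2><a name="[^"]+" id="[^"]+"></a>')

def drop_heading_anchors(html):
    """Strip the auto-inserted anchors, leaving a bare <h2>."""
    return HEADING_ANCHOR.sub("<h2>", html)

sample = '<h2><a name="CHAPTER_I" id="CHAPTER_I"></a>CHAPTER I</h2>'
print(drop_heading_anchors(sample))
```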

Delete the Auto-generated TOC

Now scroll down in your file until you find GG's autogenerated Table of Contents. It will start something like:

<!-- Autogenerated TOC. Modify or delete as required. -->
<a href="#Sure_Pop_and_the_Safety_Scouts"><b>Sure Pop and the Safety Scouts</b></a><br />
<a href="#SURE_POP_AND"><b>SURE POP AND</b></a><br />
<a href="#CONTENTS"><b>CONTENTS</b></a><br />
<a href="#INTRODUCTION"><b>INTRODUCTION</b></a><br />
and end with something like:
<a href="#THE_BEST_OF_GIFTS_A_BOOK"><b>THE BEST OF GIFTS—A BOOK</b></a><br />
<!-- End Autogenerated TOC. -->

and DELETE THE WHOLE THING. You won't use it. You'll make a much nicer one later on.

Title Page

Make sure that the title is in <h1></h1> tags and the author in <h2></h2> tags. Anything that you want centered can go in <div class='center'>Whatever it is</div>. If you need line breaks or blank lines, add <br /> at the end of a line. Just make sure that every <br /> is contained in a <div> of some kind or a <p> or a heading of some kind.

It is probably wise to center the copyright as well. It just looks nice.

Table of Contents

Make sure that each line is left justified and that there are at least two spaces between each title and page number. I usually just put in 5 or 6 to be sure. If your TOC has Roman chapter numbers, and most do, put some spaces between the . after the number and the chapter title. I do a search for . with a space after it and replace with . with a few spaces after it. If you have a title with Mr. Whosit, don't do this search and replace unless you want to just go back and fix that one by hand. If you have a "Blah" place holder, replace that now with &nbsp; Okay, highlight the whole Table of Contents. On the HTML-fixup menu, click "Auto Table".

The second line will say: <table border="0" cellpadding="4" cellspacing="0" summary="">. You'll want to make that cellpadding number smaller. I change mine to 0 and add some spaces by hand, but that is too tight for some people. In the summary="", put Contents or Table of Contents.

If your TOC has a "CHAPTER" heading above the chapter numbers, change
<td align='left'>CHAPTER</td>
to
<td align='left' colspan='2'>CHAPTER</td>

This will make your chapter heading overlap the chapter title column a bit, making things a little tidier.

If your TOC has a "PAGE" heading, you'll want to change
<td align='left'>PAGE</td>
to
<td align='right'>PAGE</td>

You'll want those Roman Numerals to be right-aligned if they are in the original. UNCHECK regex. Highlight the whole table.

Search for: <tr><td align='left'>
Replace with: <tr><td align='right'>

To put a space after each Roman Numeral, highlight the whole table, then

Search for: .</td><td align='left'>
Replace with: .&nbsp;</td><td align='left'>

Now we want the TOC to be linked to the actual page numbers. Check the regex box on your Search and Replace. Highlight the whole table.

Search for: 'left'>(\d+)
Replace with: 'right'><a href="#Page_$1">$1</a>

This will both right-align the numbers and link them.
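
To see what that search and replace does to a row, here is a hedged Python sketch of the same substitution. The sample row and the function name are invented for illustration; in Python the captured number is written \1 where GG uses $1.

```python
import re

def link_page_numbers(table_html):
    """Right-align every left-aligned cell that starts with digits and link it."""
    return re.sub(
        r"'left'>(\d+)",
        r"""'right'><a href="#Page_\1">\1</a>""",
        table_html,
    )

sample = (
    "<tr><td align='right'>I.</td>"
    "<td align='left'>The Scouts</td>"
    "<td align='left'>1</td></tr>"
)
print(link_page_numbers(sample))
```

One thing to watch: a chapter title that begins with a digit sits in a left-aligned cell too, so it would get linked as well. Check any such titles by hand.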

Keep refreshing the browser window that has your completed project showing. Make sure that you are liking what you see. Make sure you save with different names (i.e. surepop-contents2.html) between refreshes to see the difference. If you don't like the change, go back to an earlier save. Eventually, you will save fewer versions, but until you are comfortable making changes, it's best to be cautious.

Chapter Centering

GG has already centered your first line of Chapter Heading. Usually something like


It puts them in <h2> tags for you. You need to center the title yourself. Check the regex box.

Search for: <\/h2>\n\n<p>((.|\n)+?)<\/p>
Replace with: </h2>\n\n<h3>$1</h3>

Replace all.
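
A quick Python sketch of this replacement, so you can see which paragraph it touches. The sample chapter title is made up, and the function name is mine; the pattern is the one above with DOTALL standing in for (.|\n).

```python
import re

def promote_chapter_titles(html):
    """Turn the paragraph right after each </h2> into an <h3>."""
    return re.sub(
        r"</h2>\n\n<p>(.+?)</p>",
        r"</h2>\n\n<h3>\1</h3>",
        html,
        flags=re.DOTALL,
    )

sample = "<h2>CHAPTER I</h2>\n\n<p>THE TWINS MEET</p>\n\n<p>Body text.</p>"
print(promote_chapter_titles(sample))
```

Note that it promotes whatever paragraph follows a heading. If a chapter has no title line, its first paragraph of story becomes an <h3>, so look over the results before moving on.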

Placing Illustrations

If your book had a cover image included, you'll need to create an image tag. Your proofers and formatters didn't know it was included, so this is your job. At the end of your CSS, after the


Type in <p>[Illustration: Cover]</p>

On your HTML fixup menu, there is a button on the upper right for Auto Illus Search. GG will search for your first <p>[Illustration]</p> tag. It will then pop up your project list and then choose your "images" folder. Find the corresponding image number. If your illustration had no caption, then in the first box on the Image window, under Alt text, type in a short description. (This will help people with readers know what they are missing.) The jury is out on whether you should also type the same thing in the "Title text" window. Accessibility people say, "No!" it just repeats information. The strictest HTMLers say: "Yes! You should never have empty ""!" Your call. Decide whether you want your illustration to be on the left, in the center or on the right and check the appropriate box. For the cover, I choose center. Click okay. Now, since my cover really doesn't have a caption in the original, I erase the line that says: <span class="caption">Cover</span>. Do this for all of your illustrations that do not have captions. You type it into the box so that the (alt="Cover") bit of the illustration code is already filled in. Repeat this process until GG cannot find another Illustration tag.

Transcriber's Notes

Transcriber's Notes for the HTML. Wherever possible I use hover tags. These are tags that pop up when you hover your cursor over them. They underline the corrected word with a dotted line so you know where they are.

In your CSS put

   ins {text-decoration:none; border-bottom: thin dotted gray;}
   .tnote {border: dashed 1px; margin-left: 10%; margin-right: 10%; padding-bottom: .5em; padding-top: .5em; padding-left: .5em; padding-right: .5em;}

Find your first correction. This is where those words you put in parentheses for the text version in your notes comes in handy. Replace the corrected word with

<ins title="Transcriber's Note: original reads 'Molly'">Mollie</ins>

When you've replaced them all, go to the very end of your file. Put in a break like the chapter ones:

<hr style="width: 65%;" />

<div class='tnote'><h3>Transcriber's Notes:</h3> <p>Obvious punctuation errors repaired.</p>

<p>The remaining corrections made are indicated by dotted lines under the corrections. Scroll the mouse over the word and the original text will <ins title="Transcriber's Note: original reads 'apprear'">appear</ins>.</p></div>

You can change the wording as much as you like. Since Sure Pop is so short and there were no real corrections, my TN for this project looks like:

<hr style="width: 65%;" />

<div class='tnote'><h3>Transcriber's Notes:</h3> <p>Both "tiptoe" and "tip-toe" were used in this text. This text also uses Pellmell and pell-mell.</p> </div>

Your notes: Now go back and go over the notes that you left for yourself. For example, I like my ! ? ; : to slant with the rest of the italic text so I'd left a note "Put ! into </i>"

Search for: </i>! Replace: !</i>

Replace all. Repeat for ? : ; Make sure that "Whole Word" is not checked or it won't find any. As there are no spaces, it assumes that is part of a word.
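
If you would rather handle all four marks in one go, a regex with a character class will do it. This is a Python sketch of that idea under my own naming, not a GG feature.

```python
import re

def pull_punctuation_into_italics(html):
    """Move a ! ? ; or : that directly follows </i> to just inside the tag."""
    return re.sub(r"</i>([!?;:])", r"\1</i>", html)

sample = "<i>Stop</i>! <i>Why</i>? <i>ibid.</i>,"
print(pull_punctuation_into_italics(sample))
```

Commas and periods are deliberately left out of the class, so they stay outside the italics.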

Replace any [oe] with &oelig; and [OE] with &OElig;. Make sure the regex box is NOT checked for these search and replaces.
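
As a plain (non-regex) replacement, the ligature step looks like this in Python. The entity names &oelig; and &OElig; are the standard HTML ones for œ and Œ; the dictionary and function names are just mine for the sketch.

```python
# Bracketed proofing markup mapped to the matching HTML entity.
LIGATURES_HTML = {"[oe]": "&oelig;", "[OE]": "&OElig;"}

def replace_ligatures(text, mapping=LIGATURES_HTML):
    """Literal, case-sensitive replacement; the brackets are not a regex class."""
    for markup, entity in mapping.items():
        text = text.replace(markup, entity)
    return text

sample = "Ph[oe]be and [OE]dipus"
print(replace_ligatures(sample))
```

This also shows why the regex box must stay unchecked: in a regex, [oe] means "an o or an e," and you would mangle half your book.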

Here is my list of notes for this Project.

  move ! into </i>
  right-align illo on png.017
  fix note on 063
  hanging indents on 084
  hanging indents on 092
  center single line on 099
  hanging indents on 135; 136

Last Pass

After taking care of your notes, it's time to go through and look your project over carefully. Refresh the one in the browser and start at the top, fixing anything that you find. If you don't know how to fix it, ask in the Help! HTML thread or The Official "No Dumb Questions" thread for PPers.

You must know by now that DPers love to answer questions. You will probably get more than one answer or solution. Keep at them until you understand and get an answer that works best for your text.

One of the things that I had to fix was the title of the book right before the first chapter. The first line of the title was in the <h2> tag as it should be but the second line was off on its own. I inserted a break <br /> after the first line instead of the closing </h2> and put the closing </h2> after the end of the second line. Save your file.

The Final Checks

Find Orphaned Markup

After fixing everything that you wanted fixed, it's time to run some checks. First: Find Orphaned Markup. This is the second time we've used this button on the middle right of the HTML Markup menu. Chances are, this time it will find more than it did last time. For me, it found an opening <p> on a page number that I'd moved without a closing </p> That was easy to fix. Keep pushing it and fixing until it takes you to the end of your file because that means it's done. Save your file.

Link Checker

This checks to be sure that everything that links in your project, like the TOC goes somewhere, and isn't broken. It also checks to be sure that you've used all of the illustrations in your "images" file. At the bottom of your HTML Markup menu on the left is a button "Link Checker." Click it and a new window opens. Anything with (CRITICAL) after its title, needs to be fixed. (INFORMATIONAL) will be full of page numbers that you didn't use, which is fine. If there are no CRITICAL issues, close that window. Save your file.
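
The heart of a link check is comparing every internal href against the ids and names that actually exist. Here is a bare-bones Python sketch of that comparison. It is far cruder than GG's Link Checker (no image checks, regex instead of a real HTML parser), and the function name is my own.

```python
import re

def broken_internal_links(html):
    """Return the #targets that are linked to but never defined by id= or name=."""
    targets = set(re.findall(r'\b(?:id|name)="([^"]+)"', html))
    links = re.findall(r'href="#([^"]+)"', html)
    return sorted({link for link in links if link not in targets})

sample = (
    '<a href="#Page_1">1</a> <a href="#Page_9">9</a> '
    '<span id="Page_1"></span>'
)
print(broken_internal_links(sample))
```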

Tidy Check

Go back to HTML Markup and next to "Link Checker" push "HTML Tidy". If you are lucky that first line says:

INFO: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"

This means that your file is perfect as far as Tidy is concerned. If not, fix up the errors or warnings that it finds.

Save your file.


Validator: this is a web validation service. You must run it and it works really, really well.


Browse to find the file. For me: C:\DP\surepop\surepop.html The icon for the HTML file has the blue E next to it as it thinks IE is my usual browser. I've just never changed it to Firefox which is what I use for most things. (You should check how your HTML looks in at least those two browsers.) Push Enter or Check.

Chances are the bright red:

Errors found while checking this document as XHTML 1.0 Strict!

will show up. With more practice, it will turn green most of the time. I have an error on this one. Do NOT panic with the number it gives you. For me it says: 850 Errors, 1 warning(s) I just roll my eyes. I know that I do NOT have 850 errors and neither do you. Scroll down the error page until you see

Validation Output: 850 errors.

It will then tell you what line it thinks your heinous error is in.

Line 248, Column 4: document type does not allow element "h2" here; missing one of "object", "ins", "del", "map", "button" start-tag <h2>SURE POP AND THE<br />

Now I know that is wrong. So I go to the GG window. On the bottom left, it says: Ln: 4317/4321 (meaning my cursor is currently on line 4317 of the total 4321 lines in the project). I click on that box and a little pop-up appears: Goto Line Number. I type in 248. I know that line is fine so I start to look UP the file for something odd. Omigosh. I realise what I've done. It is almost always a very simple error and this is no exception. Remember how I told you to save before going to Validator? I didn't. So of course I have errors. They're already fixed, but Validator didn't see it as it saw the LAST save I did before fixing errors. I save the file. I refresh the Validator window, click okay when it tells me about the PostData, and get the happy green: This document was successfully checked as XHTML 1.0 Strict!

What this taught us, besides take your own advice, was that one error, that lost <h2> tag that was fixed by fixing the title before the first chapter, caused Validator to panic and tell me that everything past that point was Wrong, Wrong, Wrong. Validator does that. It errs on the side of Panic. Don't Panic with it.

Do not include the link that the Validator offers you to prove that your file is valid. The whitewashers and your PPV will check it themselves.

Zip it Up

Though it is hard to believe, you are pretty much done. It's time to zip it all up for PPV.

I use zipcentral, but its website has gone the way of the dodo. I asked in this thread about ways other people zip. There are a lot of options. The Winzip one is probably the easiest for now.

In your zip you should have, named for your project, of course:

   * Your text file: surepop.txt
   * Your html file: surepop.html
   * Your txt bin file: surepop.txt.bin
   * Your html bin file: surepop.html.bin
   * A folder with your images: images

Once your project is zipped up, check it for a file called thumbs.db. Delete that file immediately. It is something helpful that IE adds that is huge and useless for our purposes. It is a FAIL for whitewashing (the final step in Post-processing where our projects have to pass someone at PG). I have turned off the option in my IE and have never missed it. If you want to as well, go to your IE Browser. Go to Tools->Folder Options. Go to the View tab. Scroll down and check the box titled: Do not cache thumbnails. Click Apply and close the window. Now it will never include them again.
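
If you like, you can let a few lines of Python look inside the zip for you. This is only a sketch with a made-up function name; the demo builds a tiny zip in memory so nothing on your disk is touched, but you can pass the path to your real zip file instead.

```python
import io
import zipfile

def unwanted_members(zip_file):
    """List every entry in the zip whose file name is thumbs.db (any case)."""
    with zipfile.ZipFile(zip_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.rsplit("/", 1)[-1].lower() == "thumbs.db"
        ]

# Demo archive built in memory; the file names are examples only.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("surepop.txt", "text")
    archive.writestr("images/Thumbs.db", "cache")
demo.seek(0)
print(unwanted_members(demo))
```

An empty list means the zip is clean on that score.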


Go to your project's Project Page. Scroll to the very bottom and you'll find a button: Upload for Verification. Browse for your zipped file. Type any explanations or comments to your prospective PPVer in that box. Things you might include are page numbering inconsistencies you noticed, or that the author changed your character's name halfway through and how you handled it. Anything that a quick word from you would explain, explain here. Click upload. If this is your first project, go post in the "New PP-ers waiting for their first project to PPV" thread.

You are done! For now.

Your PPV wants you to succeed. They want you to get to see your project on Gutenberg. They will:
1) Download your project.
2) Check it over with a fine tooth comb.
3) Spellcheck it.
4) Run all of the checks that you did (or were supposed to)
5) Take notes and either
   a) Send it back to you to fix some things, or
   b) Write to you and ask what you'd like them to do about some things and then post to Gutenberg, or
   c) Post it to Gutenberg and send you feedback. This one is much more likely after you've done a few.

You will receive feedback no matter which of the three above they choose. It will be encouraging and, even so, you'll probably feel frustrated and embarrassed if there are changes to be made. This is normal, but not what they are aiming for. Be patient with the time it takes and with their suggestions and corrections. If you don't like what they've changed, tell them. Remember that they, like you, are volunteers. They've volunteered to take on this job and want you to feel good and do well!

Further Notes

[1]<sc> If your book uses a.d., b.c., a.m., p.m. and so on, you'll need to fix that for the HTML.

Other searches: If your book uses periods/full stops after Mr., Dr., Mrs., Mme., Mlle., and so on, it's sometimes wise to search for each abbreviation followed directly by a space, to catch missing periods.
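The same search can be run for all the abbreviations at once. The following is a rough Python sketch, not a GG feature; the list of abbreviations is just the one from the note above and the function name is invented for the example.

```python
import re

# Abbreviations taken from the note above; extend to suit your book.
ABBREVIATIONS = ["Mr", "Mrs", "Dr", "Mme", "Mlle"]


def find_missing_periods(text):
    """Return (line_number, abbreviation) pairs where a space, not a period, follows."""
    pattern = re.compile(r"\b(" + "|".join(ABBREVIATIONS) + r")(?= )")
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((number, match.group(1)))
    return hits
```

Each hit still needs a human look against the page image, since some books deliberately print these abbreviations without a period.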

[2] If you find yourself adding the same thing over and over to your CSS, such as right-aligned text, you can add it to the GG header file so that it always appears. To edit the file, go to your GG folder. Open the text file that is named, obviously enough, "Header." I immediately save it as "GG original header." Then go back and reopen "Header." Somewhere in the list of ".somethings" add:

   .right    {text-align: right;}

For example, mine is right above

   .u        {text-decoration: underline;} 

GG doesn't care what order they are in. Then save your file and thenceforth, all of your autogenerated files will have the right-aligned CSS code in place.

Extra Information that you probably won't see on an Easy Fiction


I do footnotes twice. Once for the text version and once for the HTML version. This is because I put them in different places. For this walkthrough I'll be using "Roses and Rose Growing" as "Surepop" had no footnotes.

Text Version Footnotes

After the text has been saved as roses.txt and before rewrapping:

Go to Fixup->Footnote Fixup

The buttons on the menu:

  • See Anchor: That takes me to the place in the text with the [1] for whatever number is in that drop down menu in the second line.
  • See Footnote: Takes me to the footnote.
  • Last Footnote: Takes me to the footnote right before the one I am looking at presently. If I am looking at #1 it won't show me a thing, obviously.
  • Drop Down: Lets me choose any of my footnotes to see.
  • Next Footnote: Shows you the next footnote in order. Use this to step through your footnotes and make sure they are all the way you'd like them to be.
  • Three check boxes: This lets you choose to have only one type of footnote marker. You will use this most often. Sometimes however, your text will have both numbered and lettered or numbered and Roman. I've already found on working on "Roses" that it has both numbered and lettered so I'll be leaving all of the boxes unchecked.
  • Number, Letter, Roman: This button changes whatever footnote you are looking at to a number, a letter or a Roman numeral. We will not be using these buttons.
  • Sadly, I cannot tell you what Join with Previous, Adjust Bounds, or Set Anchor do, as I've never used them.
  • First Pass: This button runs through your document and counts up and checks all of your footnotes. Although it says "First", you'll use it more often than just once usually.
  • Inline/Out-of-line: Only on the rarest of occasions would you use Inline. It gets in the way of reading and is generally frowned upon. In all of my books, I've used it once and am not sure even then if it was the best choice. It was an ancient history book with numerous year footnotes. Out-of-line is the default and should remain checked.
  • Re Index: This will renumber all of your footnotes in order: 1, 2, 3. A, B, C. I, II, III. If you didn't check "All to" above, then it will simply renumber them in whatever form each one already has. Mine will be both 1, 2, 3. and A.
  • Autoset buttons: This will tell GG where you want your Footnotes to land. For the text version, the usual is Chapter End. For the HTML version, the usual is End. It will add a heading of FOOTNOTES: to the places you've chosen to have footnotes.
  • Next and Last Landing Zone: Does what you'd think. Lets you scroll through your landing zones.
  • Unlimited Anchor Search: This should be checked at all times. It lets GG search for anchors that are farther away from the actual footnote than you might expect.
  • Move Footnotes to Landing Zones: This does what it says. It collects all of your footnotes and puts them where you've told it to. The text on this button will be gray until you choose a landing zone.
  • Tidy Up Footnotes: This is the final step for the Text version. You will NEVER use this button for the HTML version. It changes

[Footnote 1: See pruning, p. 17.] to

[1] See pruning, p. 17.

It tidies them up.

  • Check Footnotes: This is another button that you may use a lot. After First Pass, click this button and a window will open that lists all of your footnotes and any issues that might happen with them. They will be color-coded. White is what you want to see.
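To make the Tidy Up Footnotes change concrete, here is roughly what it does to a simple footnote, written as a Python sketch. This is my own illustration, not GG's code, and it assumes the footnote body contains no closing bracket of its own.

```python
import re


def tidy_footnotes(text):
    """Rewrite '[Footnote N: body]' as '[N] body'."""
    # \w+ covers numbered, lettered, and Roman markers.
    # Assumption: the footnote body has no ']' inside it.
    return re.sub(r"\[Footnote (\w+): ([^\]]*)\]", r"[\1] \2", text)
```

For example, tidy_footnotes("[Footnote A: Perpetual flowering.]") gives "[A] Perpetual flowering."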

The actual process:

Click: First Pass. Watch GG zip through your file counting up and checking footnotes. At the top of the window I now have #1/16, which tells me I have 16 footnotes and am looking at the first one.

Click: Check footnotes: All white. Sometimes, when you made your first pass through the text and moved the footnotes out of the middle of paragraphs, you will have accidentally placed a footnote BEFORE its anchor instead of after it. Simply go to the footnote in question and move it. Click First Pass again. Check Footnotes. It should now be resolved. Sometimes a page will stop mid-paragraph with a [1] in it and the paragraph will continue on the next page with another [1]. Now you have two [1]s in the same paragraph. Change the second anchor to a 2 and the corresponding footnote to a 2 as well. (If you have a lot of footnotes you may have a lot of reordering to do.) Once all of your footnotes are white (or brown if they are just large), on to:

Click: Re Index. Before pushing this button, decide if they can all be numbers, letters or Roman. If so, check the appropriate box above. Mine cannot. I notice that my one letter footnote has multiple anchors and one note.

 [A]Blanc double de Coubert.
 [A]Conrad F. Meyer.
 [A]Madame Georges Bruant.

All for

 [Footnote A: Perpetual flowering.]

For my text version, this is fine. For my HTML, it will take special handling.

Okay, it has now been re-indexed. I'm choosing end of chapter as that way it isn't a long way for the text reader to have to go to find the reference. Autoset Chap. LZ. Now the text on the Move Footnotes to Landing Zones is no longer grayed out. Click that next.

Finally, click Tidy Up Footnotes. That looks nicer. The final step I take here is purely cosmetic. I search for FOOTNOTES: and if a section has only one, I change it to FOOTNOTE: It just seems more correct. Three of my eight chapters with footnotes had only one.

Note on Tables: Often you will want to leave footnotes that reference table data with the table instead of moving them to a chapter or book end. You can follow all of the steps and just cut and paste them back. My one letter footnote conveniently landed at the end of the chapter in any case so I didn't have to move it at all.

Now it is safe to rewrap.

Footnotes for HTML

The steps are identical except: Choose End as your landing zone and do NOT Tidy Footnotes. Make sure before moving your notes to the end that there is at least one blank line at the end of your text. Otherwise it will come out something like


I try to make sure there are two blank lines above the FOOTNOTES: tag.

Other things to be aware of for html. Sometimes, GG puts the closing </div> one footnote too early. Check your second to the last footnote. If it has </div></div>, move one of them so that your LAST footnote ends </div></div>.

Now this moved my [A] footnote to the end as well. After I generate the HTML, I'll be moving it back to the end of the list it references.

If you have a footnote with multiple anchors, as my [A] is, GG will only link the last anchor. You'll need to do the rest by hand.

Simply copy: <a href="#Footnote_A_3" class="fnanchor">[A]</a>

and paste it for each [A].

Do not include the: <a name="FNanchor_A_3" id="FNanchor_A_3"></a>

on every one. Each named anchor must be unique, therefore, each anchor may only be "named" one time in a file. You may have multiple anchors to the same footnote, but only the first one has the "name" and "id".
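Because it is easy to paste the named anchor one time too many, a quick duplicate check can help. This is a small Python sketch of my own (not part of GG) that simply counts id="..." attributes; it assumes ids are written with double quotes, as in the samples above.

```python
import re
from collections import Counter


def duplicate_ids(html):
    """Return a sorted list of id values that appear more than once."""
    ids = re.findall(r'\bid="([^"]+)"', html)
    return sorted(name for name, count in Counter(ids).items() if count > 1)
```

An empty list means every id is unique. The Validator will complain about duplicates as well, so this is only a faster way to find them.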


On your first pass through the index, make sure that any continued entries are rejoined. For example:

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1,
72, 82, 110-12, 123.

Arsenate of lead, 146-9.

should become:

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72, 82, 110-12, 123.

Arsenate of lead, 146-9.

Or if you want to tidy it up now:

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72,
          82, 110-12, 123.

Arsenate of lead, 146-9.

Make sure when you are removing page separators that anything that needs to be indented stays indented. Often you will find a lot of printer's errors in the index. Often the person who makes the index is not the author of the book. A lot of transcriber's notes come out of indexes.


For the text version, you simply need to be sure that the entire thing is indented at least two spaces from the left. After the opening /* put [2]


Abol syringe, 138, 148.

Abol, White's Superior, 141, 148.

  _See_ Green Fly.

Aphis Lion, 140.

Arsenate of lead, 146-9.

GG reads that [2] and will indent that section 2 spaces. If you have some very long lists of numbers

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72, 82, 110-12, 123.

You'll need to break those within your width limit (I usually choose 70) and indent the continuation line at least 6 spaces. Remember that your index will be indented two, so break the line before column 68. The first gray box in the second to the bottom line of GG will tell you which Line and Column your cursor is resting on. Column means how far over.

Ln: 5869/6294 - Col: 74

Amateur's Rose Guide, 2, 7, 21-3, 30, 35-6, 60-1, 72,
          82, 110-12, 123.
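If an index has many long entries, the breaking can be tried out in a script first. Below is a minimal Python sketch using the standard textwrap module; the function name and the defaults (break before column 68, continuation lines indented 6 spaces) are my assumptions drawn from the numbers above, not GG settings.

```python
import textwrap


def wrap_index_entry(entry, width=68, continuation_indent=6):
    """Break one long index entry, indenting every line after the first."""
    return textwrap.fill(
        entry,
        width=width,
        subsequent_indent=" " * continuation_indent,
        # Keep page ranges such as 110-12 in one piece.
        break_on_hyphens=False,
    )
```

Paste the wrapped result back over the original entry and check it by eye. The two-space indent for the whole index is still added by GG from the [2], so it is not included here.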

Your final Gutcheck will alert you to any long lines left over. Adjust and click Fixup->Remove End of Line Spaces when you are done. Other than that, for the text, you're finished.


Indexes for HTML are a bit more complicated which is why they are not recommended for new post-processors. You can leave the [2] after the opening /* if you like. It will just be ignored for the html version.

After auto-generating your HTML your index will look something like:

A.<br />
<br />
Abol syringe, 138, 148.<br />
<br />
Abol, White's Superior, 141, 148.<br />
<br />
Aphis.<br />
<span style="margin-left: 1em;"><i>See</i> Green Fly.</span><br />
<br />
Aphis Lion, 140.<br />
<br />
Arsenate of lead, 146-9.<br />

Put a bookmark at the top of the index so that you can find it over and over again. To place a bookmark, hold down ctrl and shift and then choose any number from 1-5. I choose 5 as it's the end of the book. To go to a bookmark hold down ctrl and click the number.

Change the opening <p> to a <div> and the closing </p> to a </div>.

Linking pages: Highlight the entire index. If there are a lot of ads after the index, I add a lot of blank lines before them so that I can more easily see where to stop my click-and-drag highlighting. Extra blank lines will not show up in the final product so it's safe to add them now. After highlighting, we'll do a regex search and replace:

Search: , (\d+)
Replace: , <a href="#Page_$1">$1</a>

The comma is very important. Without it you will link every number in your index and that will have some bad unintended consequences. As with all of the steps: Save multiple versions as you go along so you can always go back a step: roses-index1.html to start, roses-index2.html after your first pass replacing, and so on. Check your file often in a browser such as IE or Firefox to be sure it's looking as you hope.

This replace may have to be run more than once. Sometimes GG doesn't see every number in a line. Keep highlighting and running it until it doesn't find any more. It will change them to:

Abol syringe, <a href="#Page_138">138</a>, <a href="#Page_148">148</a>.
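For anyone who wants to try the replacement outside GG, the same search and replace can be written in Python. This is only a sketch of the regex above, not GG's own code; note that Python writes the captured group as \1 where GG uses $1, and the function name is made up for the example.

```python
import re


def link_page_numbers(index_html):
    """Link every number that follows a comma and a space to its page anchor."""
    # The leading ", " is kept on purpose so that stray numbers are not linked.
    return re.sub(r", (\d+)", r', <a href="#Page_\1">\1</a>', index_html)
```

Python's re.sub replaces every match in a single call, so this version does not need repeated passes.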

Roses took two passes through to catch all the basic links. Now, if you are lucky, your index with groups of successive pages will look like 236-247. The first will have been linked already to

<a href="#Page_236">236</a>-247

For those we simply change our regex a bit:

Search: -(\d+)
Replace: -<a href="#Page_$1">$1</a>

Repeat as above.

For Roses, however, our printer was very concerned about over-using ink. He shrank most of the continued references to:

Arsenate of lead, 146-9

I obviously do not want my readers to go to page nine to see the end of the Arsenate of lead reference so every one of those will be coded by hand.

I will use the same search and replace as above but instead of choosing: Replace all, I will look at each one, click "Replace" and edit the:

-<a href="#Page_9">9</a>

that it gives me to:

-<a href="#Page_149">9</a>

so the reader still sees -9 but goes to page 149.
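I do these by hand, but the arithmetic is regular enough to sketch in Python. Treat this as an illustration under assumptions, not a tested tool: it expects the first page of each range to be linked already in the form shown above, and it assumes a shortened ending always borrows its leading digits from the starting page. The function names are invented.

```python
import re

# A linked first page followed by a hyphen and a bare second number.
LINKED_RANGE = re.compile(r'(<a href="#Page_(\d+)">\d+</a>)-(\d+)')


def expand_range_end(start, end):
    """Return the full closing page for a shortened range such as 146-9."""
    if len(end) >= len(start):
        return end  # already written out in full, e.g. 236-247
    full = start[: len(start) - len(end)] + end
    if int(full) < int(start):
        # Assumed convention: 198-03 would mean 203, not 103.
        full = str(int(full) + 10 ** len(end))
    return full


def link_range_ends(index_html):
    """Link the closing page of each range while leaving the visible text alone."""
    def replace(match):
        first_link, start, end = match.groups()
        target = expand_range_end(start, end)
        return f'{first_link}-<a href="#Page_{target}">{end}</a>'

    return LINKED_RANGE.sub(replace, index_html)
```

The reader still sees -9 but the link goes to page 149. Spot-check the results in a browser, because a printer's error in the index will be expanded just as faithfully as a correct entry.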

Whew. So that is the number references done. Now for the "See", "See also", etc. This will work for those references within your text as well. Some do not link these and it is not required. I just think it looks nicer. (For the ones in the text that say "See page 212" simply

search for: page (\d+) replace: page <a href="#Page_$1">$1</a>

Make absolutely sure as you replace those that it is not referencing a different book entirely. See Elston's Music Dictionary, page 212 is not a link we want hyperlinked.)

Within the index, you'll notice that my first "see" entry is:

Aphis.<br />
<span style="margin-left: 1em;"><i>See</i> Green Fly.</span><br />

I put a bookmark right here. Then I go find the entry for Green Fly. I highlight "Green fly" under G and using my HTML fixup menu, I click "Named Anchor." It is right in the middle of that pop-up menu. It will add:

<a name="Green_fly" id="Green_fly"></a>Green fly,

Now I use my bookmark to go back to the "See" entry. I highlight "Green Fly" there. Using the same HTML fixup menu, I choose "Internal link". It is two to the left of "Named Anchor". A new window pops up and the first link it offers me is "#Green_fly". I double-click on that and my link is created.

<span style="margin-left: 1em;"><i>See</i> <a href="#Green_fly">Green Fly</a>.</span>

Repeat until all the "See"s are done. Look over your file in a browser. Make sure all numbers are linked. Make sure no years got linked.
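One way to catch a linked year, or a "See" link that points nowhere, is to list every internal link whose target does not exist in the file. The following Python sketch is my own helper, not part of GG, and it assumes targets are marked with id="..." in double quotes as in the samples above. It will only catch a year if the book has no page with that number.

```python
import re


def unresolved_links(html):
    """Return a sorted list of internal link targets that have no matching id."""
    targets = set(re.findall(r'\bid="([^"]+)"', html))
    links = re.findall(r'href="#([^"]+)"', html)
    return sorted({link for link in links if link not in targets})
```

Anything it reports, such as Page_1895 in a 300-page book, is worth looking at in the browser before you finish.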

Finish your HTML file as noted above.