Post-Processing Verification Guidelines
Last Edit: 30 August 2015
General Procedures
- Make all the checks you would make on a project you are post-processing.
- Keep track of all errors and inconsistencies remaining in the project as well as any corrections made.
- Send the PPer a feedback note describing the required changes, suggesting amendments, and providing helpful hints for future work.
- When the agreed-upon changes have been made and checked, upload the project to PG.
- Submit a PPV Summary. Generally, PPers should be copied on the PPV Summary.
Requests to Revise and Resubmit
Please strongly encourage PPers to make corrections and amendments themselves. (Of course, if PPers specifically request that you make the changes for them, you may abide by their wishes.)
- Use the option on the project page to return the project to the PPer's queue.
- Send them an email or private message letting them know that the project will show up in their queue again. The message should note the items to be corrected, and/or make suggestions that may improve the project (specifying that these are not errors if that is the case).
- When the PPer has completed revisions, the button to upload to PPV changes to "Return to your current PPVer for further checking". When he or she uploads the revisions, the project will automatically go back into your PPV queue.
Note: Sometimes a project that has been returned to a PPer by a PPVer becomes a reclaimed project. In this case, the squirrel doing the reclaims may contact the PPVer and ask if they would like to complete the project based on their original review. This is entirely at the PPVer's discretion and if they don't wish to complete the project it will be checked and returned to the PP pool in the usual way.
Specific Checks to Make
Please check for the following types of errors, using tools you usually use for PPing:
Errors
Errors such as failure to grasp the italics guidelines are counted as one error, not one error each time italics are wrongly handled. Errors such as he/be errors are each counted as individual errors (i.e., 3 "he" instead of "be" count as 3 errors).
If the PPer is asked to resubmit a corrected file, then any errors not corrected or new errors introduced are added to the total number of errors for rating purposes.
Level 1 (minor errors) - All Versions
- Spellcheck/scanno errors
- Gutcheck-type errors, e.g., punctuation, hyphen/emdash, missing/extra space, line length, illegal characters, etc.
- Jeebies errors (English only)
- Paragraph breaks missing or incorrectly added
- A few occurrences of hyphenated/non-hyphenated, spelling and punctuation variants and other inconsistencies not addressed (may be addressed by note in the TN)
- Chapter and other headings inconsistently spaced, aligned, capitalized or punctuated
- Formatting inconsistencies (e.g., in margins, blank lines, etc.)
- Other minor errors (such as a minor rewrap error, misplaced entry in the TN, or minor inconsistency between the text and HTML versions)
Level 1 Errors - HTML Version Only
Images
- Unused files in images folder (other than Thumbs.db)
- Appropriate image size not used for inline and linked-to images. Image sizes should not normally exceed the limits described here, but exceptions may be made if warranted by the type of image or book (provided the PPer explains the exception).
- Images with major blemishes, uncorrected rotation/distortion or without appropriate cropping.
- Failure to enter image size appropriately via HTML attribute or CSS such that the image is distorted in HTML, epub or mobi.
- Failure to use appropriate "alt" attributes for images that have no caption, or failure to use empty "alt" attributes for purely decorative images and for images whose captions would make an "alt" redundant.
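The unused-files check in the list above can be partly automated. The following is a minimal sketch, not an official DP tool: the function name `unused_images` and the approach of scanning the HTML for `images/<name>` references are assumptions made for illustration.

```python
import re

def unused_images(html_text, image_files, ignore=("Thumbs.db",)):
    """Return names in image_files that the HTML never references.

    Assumes every image is referenced as images/<name>, whether in an
    src/href attribute or in a CSS url(...). Hypothetical helper.
    """
    referenced = set(re.findall(r'images/([^"\'\s)>]+)', html_text))
    return sorted(
        name for name in image_files
        if name not in referenced and name not in ignore
    )

html = '<img src="images/cover.jpg" alt=""/> <a href="images/map_large.png">map</a>'
files = ["cover.jpg", "map_large.png", "old_scan.png", "Thumbs.db"]
print(unused_images(html, files))
```

Any file the function reports should still be confirmed by hand before it is counted as an error.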
HTML Code
- Use of px sizing units for items other than images and borders
- <title> missing or incorrectly worded (Should be <title>Alice's Adventures in Wonderland | Project Gutenberg</title>).
- Use of <pre> tags instead of their CSS equivalents
- Failure to place <html>, <head>, </head>, <body>, </body>, and </html> tags each on its own line and to use them correctly. (This is required by the WWers.)
- Use of tables for things that are not tables
- Used CSS other than CSS 2.1 or below (except for the dropcap "transparent" element and CSS 3 code that has been marked as "completed work" and has "REC" (for Recommendation) status according to the W3 specifications).
- Used HTML version other than XHTML 1.0 Strict or 1.1 or HTML5 according to DP guidelines.
- Failure to use <div class="chapter"> or <div class="section"> at chapter breaks to enable proper page breaks for e-readers (Please see here for more details). It is also acceptable to use <div class="chapter"> </div> or <div class="section"> </div>.
- Minor HTML errors in code that do not generate an HTML validation alert (such as misspelling a language code)
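The <title> wording rule in the list above lends itself to a quick scripted check. This is a rough sketch under the assumption that a correct title has the form "Book Title | Project Gutenberg", as in the Alice example; `title_ok` is a hypothetical helper name.

```python
import re

def title_ok(html_text):
    """Return True if <title> reads '<book title> | Project Gutenberg'."""
    match = re.search(r"<title>(.*?)</title>", html_text,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        return False  # <title> is missing altogether
    # Collapse line breaks and repeated spaces inside the title.
    title = " ".join(match.group(1).split())
    book, _sep, suffix = title.rpartition(" | ")
    return bool(book) and suffix == "Project Gutenberg"

good = "<head><title>Alice's Adventures in Wonderland | Project Gutenberg</title></head>"
bad = "<head><title>The Project Gutenberg eBook of Alice</title></head>"
print(title_ok(good), title_ok(bad))
```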
Level 2 (major errors) - All Versions
- Markup not handled (e.g. blockquotes, poetry indentation, or widespread failure to mark italics)
- Poetry indentation does not match original
- Footnotes/footnote markers missing or incorrectly placed
- Printers' errors not addressed
- Missing page(s) or substantial sections of missing text
- Substantial rewrapping errors, e.g., poetry has been rewrapped, or the text version has generally been wrapped so that lines exceed 75 characters or fall below 55 characters (the aim should be 72 characters), except where unavoidable, e.g., some tables
- Widespread/general occurrences of hyphenated/non-hyphenated, spelling and punctuation variants and other inconsistencies not addressed (may be addressed by note in the TN)
- Other major errors that could seriously impact the readability of the book or that represent major inconsistencies between the text and the HTML versions
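For the rewrapping check in the list above, over-long lines in the text version can be listed with a few lines of code. This is only an illustrative sketch (the helper name `long_lines` is an assumption); short lines are not flagged here, because the last line of any paragraph is legitimately short.

```python
def long_lines(text, limit=75):
    """Return (line_number, length) for each line longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]

sample = "short line\n" + "x" * 76 + "\n" + "y" * 75
print(long_lines(sample))
```

Lines reported inside tables may be unavoidable and still need a human decision.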
Level 2 Errors - HTML version only
- The W3C Markup Validation Service generates errors or warning messages (Please enter number of errors)
- The W3C CSS Validation Service generates errors or warning messages other than for the dropcap "transparent" element (Please enter number of errors). Certain errors can generate other errors that will be automatically corrected when the original errors are fixed. Therefore, to count the number of real errors, simply run the Validator and count the errors that follow the message that includes "start tag was here". That will give you the real errors to enter into the PPV Form.
- Non-working links within HTML or to images. (Either broken or link to wrong place/file)
- File and folder names not in lowercase or contain spaces, images not in "images" folder, etc.
- Cover image has not been included and/or has not been coded for e-reader use. (The cover should meet current DP guidelines.)
- Project not presentable/useable when put through eBookmaker (Please see the section below on Checking E-reader Versions)
- Heading elements used for things that are not headings and failure to use hierarchical headings for book, chapter and section headings (single h1, appropriate h2s and h3s etc.)
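The file and folder naming rule in the list above is easy to test mechanically. A minimal sketch, assuming the project's relative paths have already been collected into a list; `bad_names` is a hypothetical helper, not part of any DP tool.

```python
def bad_names(paths):
    """Return the paths that contain uppercase letters or spaces."""
    return [p for p in paths if p != p.lower() or " " in p]

project_files = ["book.html", "images/cover.jpg", "images/Plate 1.jpg", "Book.txt"]
print(bad_names(project_files))
```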
Strongly Recommended (not counted toward rating but please mentor the PPers to comply with these recommendations)
- Enclose entire multi-part headings within the related heading tag
- Avoid using empty tags (with &nbsp; entities) or <br/> elements for vertical spacing, e.g. <p><br/><br/></p> (or with nbsps). <td>&nbsp;</td> is still acceptable, though.
- List tags should be used for lists (e.g. a normal Index). For further information please read W3's List Use section
- Include all text as text, not just as images
- Keep your code line lengths reasonable
- Tables display left, right, and center justification and top and bottom align appropriately
- Tables contain <th> elements for headings
- Remove unwanted system/hidden OS files such as .DS_Store and Thumbs.db from the images folder
- E-reader version, although without major flaws, should also look as good as possible
Mildly Recommended
- Distinguish between purely decorative italics/bold/gesperrt and semantic uses of them
- Ensure that there are no unused elements in the css (other than the base HTML headings)
Printers' Errors and Transcriber's Note
Obvious printers' errors should be addressed in one, or a combination, of the following ways:
- Correct silently (in other words, change things without flagging each change within the text) and state in the Transcriber's Note that all such errors have been corrected silently. If there are changes made this way, there should be at least one Transcriber's Note to say that this has been done in the text (Level 1 Error).
- Correct all such errors and note them in Transcriber's Note
- Leave uncorrected and state in the Transcriber's Note that all such errors were left uncorrected.
"Not addressing printers' errors" means that all, or a large percentage, of printers' errors have been left uncorrected and not noted. If just one or two have been missed, and the rest addressed, then those missed would instead be counted as the relevant type of error (spellcheck, gutcheck, etc.).
Anything that could make a reader think an error has been made in the transcription should be mentioned in the Transcriber's Note.
Checking E-reader versions
With the clear expectation from PG and the recommendation of the DPF Board that our projects should look good in e-reader versions, the PPV community have agreed that PPVers should support PPers (and check their work) in producing HTML versions that convert successfully to epub and mobi (Kindle) formats.
The Easy Epub wiki pages are intended to help with that process of checking e-reader versions and making some simple changes to improve them if necessary.
The wiki pages include Viewing - How to use ebookmaker to view your project in e-reader format, even if you don't have an e-reader.
Also, understanding the Best Practices helps us understand the coding practices that can ensure the project converts seamlessly to e-reader format.
Please use a suggested viewer to test the epub and mobi versions of the book.
It doesn't take long to look through the pages of the epub and mobi versions. Here are some problem areas to look for:
Front and End of Book
- TOC
- Title page layout
Body of Book
- Horizontal rules
- Obscured sections within the book such that text covers other text or blank areas occur where text should be
- Poetry
- Dropcaps
- If hovers were used in the HTML (which isn't recommended), all important "hovered" information should be present and readable in a non-hovered way within the e-reader version. Also Transcriber's Notes referring to hovers should be hidden in the e-reader version.
- Headings
- Blockquotes
- Page numbers (if present)
- Sidenotes
- Margins
- Tables
- Illustrations
Submitting a PPV Summary
Before sending a PPV Summary, PPVers will usually wait until the posted notification is received from PG in case the whitewasher flags something else.
Using the link on the project page, submit a PPV Summary for the project, using the following criteria for determining project difficulty and rating of PPer's work:
Determining whether a project is Easy, Average, or Difficult
The PPV Summary Form will calculate whether a project is Easy, Average, or Difficult based on the information you provide.
Some feature types have checkboxes to indicate whether that text feature (such as poetry, footnotes, etc.) appears in a Basic, Average, or Complex form. The PPVer will decide which checkbox to select, if any, for each feature, using the following guide to inform their judgement.
Basic: A small number (3 or fewer) of short occurrences of the feature, or less than a page in total for index, drama, etc. All have simple layout.
Average: Either a small number of more complex occurrences, or a greater quantity of simple occurrences than Basic (e.g. up to 20 occurrences, or up to 3 pages of index, drama, etc.)
Complex: Greater quantities than Average, or similar quantities but with widespread complexity.
Different feature types have different ways of showing complexity. Examples below indicate what may constitute Complex; it is not an exhaustive list:
- a footnote may contain blockquotes, tables, or links to other footnotes
- drama/poetry may have complex layout that is likely to require manual checking/adjustment beyond what an automated tool would produce
- advertisements may contain images and different-sized or differently-justified text
- tables may have different alignment of cell contents, or need special treatment such as splitting or rotating due to width
- illustrations may need advanced preparation, or non-standard (left/right/center) placement
- indexes may have multiple levels of entries, or links to other index entries
There are also some features which are always considered Complex, such as multiple languages, or music.
The PPV Summary Form will then count how many checkboxes of each type have been selected to determine the PPing difficulty:
Difficult Project: Includes 3 or more Complex level features
Average Project: Includes 3 or more Average level features -OR- 1 or 2 Complex level features
Easy Project: Includes fewer than 3 Average level features and no Complex level features
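The counting rule above can be written out as a short function. This is a sketch of the stated rule only, not the PPV Summary Form's actual code; the function and parameter names are assumptions.

```python
def project_difficulty(average_features, complex_features):
    """Classify a project from the number of Average and Complex checkboxes."""
    if complex_features >= 3:
        return "Difficult"
    if average_features >= 3 or complex_features >= 1:
        return "Average"
    return "Easy"

# Two Average features and no Complex ones: still an Easy project.
print(project_difficulty(average_features=2, complex_features=0))
# A single Complex feature (e.g. music) is enough to make the project Average.
print(project_difficulty(average_features=0, complex_features=1))
```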
How to define multiple languages:
- If the book is English on one page and Latin on the facing page, it counts as multiple languages.
- If the author is traveling and repeatedly reports conversations in the foreign language of the country, it counts as multiple languages.
- If extensive (several long paragraphs or more) quotations in a language other than the base language are present, it counts as multiple languages.
- If the Frenchman in the novel says "Zut!" a lot, it does NOT count as multiple languages.
Determining Allowable Errors for Various Ratings
The PPV Summary Form will calculate the rating based on the difficulty of the project, file size, and the number and type of errors you record. If you like, you can also print a copy of the form and jot down notes on it before entering the information into the online form.
File size referred to below is based on the plain text version. It is easy to check the size of the text file in kilobytes by looking at the files using your file manager. We use only the text — not the HTML — file for this purpose.
Errors such as failure to grasp the italics guidelines are counted as one error, not one error each time italics are wrongly handled. Errors such as he/be errors are each counted as individual errors (e.g., 3 "he" for "be" count as three errors).
If the PPer is asked to resubmit a corrected file, then any errors not corrected or new errors introduced are added to the total number of errors for rating purposes.
Excellent
There should be no Level 2 errors
| If project is: | Errors | 
|---|---|
| Easy | Maximum of one Level 1 error per 300kb, (or no more than 1 error if file size < 300kb) | 
| Average | Maximum of one Level 1 error per 200kb, (or no more than 1 error if file size < 200kb) | 
| Difficult | Maximum of one Level 1 error per 100kb, (or no more than 1 error if file size < 100kb) | 
Very Good
There should be no Level 2 errors
| If project is: | Errors | 
|---|---|
| Easy | Maximum of one Level 1 error per 120kb, (or no more than 1 error if file size < 120kb) | 
| Average | Maximum of one Level 1 error per 80kb, (or no more than 1 error if file size < 80kb) | 
| Difficult | Maximum of one Level 1 error per 40kb, (or no more than 1 error if file size < 40kb) | 
Good
May contain no more than 5 Level 2 errors
| If project is: | Errors | 
|---|---|
| Easy | Maximum of 6 Level 1 errors per 120kb, (or no more than 6 Level 1 errors if file size < 120kb) | 
| Average | Maximum of 6 Level 1 errors per 80kb, (or no more than 6 Level 1 errors if file size < 80kb) | 
| Difficult | Maximum of 6 Level 1 errors per 40kb, (or no more than 6 Level 1 errors if file size < 40kb) | 
Fair
A Fair rating is assigned if the project contains too many errors to be assigned a rating of Good. Essentially, if a project does not qualify as Excellent, Very Good, or Good, it is Fair.
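As a worked illustration of how the tables above combine, here is a minimal sketch of the rating logic. It is not the PPV Summary Form's implementation. The thresholds are copied from the tables (80kb for an Average project at the Good level), and it assumes the Level 1 allowance grows by whole multiples of the per-kb figure, which the tables do not spell out. All names are hypothetical.

```python
# Size in kb that "buys" one allowance of Level 1 errors.
PER_KB = {
    "Excellent": {"Easy": 300, "Average": 200, "Difficult": 100},
    "Very Good": {"Easy": 120, "Average": 80, "Difficult": 40},
    "Good": {"Easy": 120, "Average": 80, "Difficult": 40},
}
BASE_LEVEL1 = {"Excellent": 1, "Very Good": 1, "Good": 6}
MAX_LEVEL2 = {"Excellent": 0, "Very Good": 0, "Good": 5}

def allowed_level1(rating, difficulty, size_kb):
    """Level 1 errors allowed (assumption: whole multiples, minimum one)."""
    blocks = max(1, int(size_kb // PER_KB[rating][difficulty]))
    return BASE_LEVEL1[rating] * blocks

def ppv_rating(difficulty, size_kb, level1, level2):
    """Return the best rating whose Level 1 and Level 2 limits are both met."""
    for rating in ("Excellent", "Very Good", "Good"):
        if (level2 <= MAX_LEVEL2[rating]
                and level1 <= allowed_level1(rating, difficulty, size_kb)):
            return rating
    return "Fair"

# A 250kb Easy project with two Level 1 errors and no Level 2 errors.
print(ppv_rating("Easy", 250, level1=2, level2=0))
```

Under the whole-multiple assumption, the 250kb Easy project is allowed only one Level 1 error for Excellent but two for Very Good, so it rates Very Good.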
Late PPV Summaries
Occasionally a project is marked as posted and the link to the PPV form disappears from the Project Page before the PPVer can complete the form. In this situation, a PPVer can also access the form by entering the Project ID into a blank PPV Report. There is also a link to this blank form on the PPV page.