Getting your PP Project Ready for PPV
- 1 Introduction
- 2 Check your project carefully for errors
- 3 Convert and check your quotes and apostrophes
- 4 Check your Images
- 5 Check your HTML
- 6 Printers' Errors and Transcriber's Note
- 7 Checking E-reader versions
- 8 Sample Text Version of the PPV Evaluation Form
- 9 Important Resources for PPers
Introduction
With the introduction of HTML and e-reader formats, PPing has become a much more complex task. Consequently it's important that new PPers gain the necessary experience before beginning to work on their own.
The Post-Processing Verification (PPV) period is an important learning period and provides a valuable opportunity for new PPers to work closely with and learn from experienced PPVers.
In preparing your projects for review by the PPVers, there are several important checks you should make. You may find the Post-Processing Workbench Tools helpful when you do this checking.
Check your project carefully for errors
Being in the PP pool does not guarantee that a project is ready for PPing. Before you start, please check the project carefully for missing illustrations or proofing images, unreadable proofing images that should be replaced and re-proofed, and similar problems. Any such problems should be fixed, preferably before you start PPing. If there is a problem, please contact the Project Manager before starting to PP. If you don't hear back from the PM within a couple of weeks, then please contact db-req (db-req at pgdp.net).
Before sending your project for PPVing, you should make the following checks:
- Are there any missing page(s) or substantial sections of missing text?
- Check for hyphenated/non-hyphenated words, spelling and punctuation variants, and other inconsistencies, and address them appropriately in your Transcriber's Note.
- Ensure there are no errors in punctuation, hyphen/emdash, missing/extra space, line length, illegal characters, etc.
- Check for Spellcheck/scanno errors.
- Make sure that all markup items such as italics, blockquotes, poetry indentations, are correctly handled
- Ensure that poetry indentation matches the original
- Check your footnotes and footnote markers. Are any missing or incorrectly placed?
- Check your rewrapping to ensure that poetry is correctly wrapped and that your text version is rewrapped to the required line length (not exceeding 75 characters or falling below 55 characters) except where unavoidable, e.g., some tables. Generally the aim for text line-length should be 72 characters
- Are there any missing or incorrectly-added paragraph breaks?
- Make sure that your chapter and other headings are consistently spaced, aligned, capitalized and punctuated.
- Are there any formatting inconsistencies (e.g., margins, blank lines, etc.)?
- Have you addressed all printers' errors?
- Have you sent the project to Smoothreading? (SR can be very useful in identifying problems)
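The line-length check in the list above can be partly automated. The following is a minimal sketch, not an official DP tool: the function name and its defaults are assumptions chosen to mirror the 55 to 75 character range mentioned above. It will also flag poetry, headings and tables, which are legitimately short or long, so every reported line still needs a human look.

```python
def find_line_length_problems(text, min_len=55, max_len=75):
    """Return (too_long, too_short) lists of (line_number, length) pairs.

    Hypothetical helper: the limits are parameters, defaulting to the
    range quoted in the checklist.
    """
    lines = text.splitlines()
    too_long, too_short = [], []
    for i, line in enumerate(lines):
        length = len(line.rstrip())
        if length > max_len:
            too_long.append((i + 1, length))
        elif 0 < length < min_len:
            # A short line is only suspicious when the paragraph continues,
            # i.e. the next line is non-blank. The last line of a paragraph
            # is naturally short.
            next_is_text = i + 1 < len(lines) and lines[i + 1].strip() != ""
            if next_is_text:
                too_short.append((i + 1, length))
    return too_long, too_short
```

To use it, read your text version with `open(path, encoding="utf-8").read()` and pass the result in. Lengths are counted in characters, not bytes.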
Convert and check your quotes and apostrophes
Project Gutenberg and Distributed Proofreaders prefer the use of curly (smart) quotes and apostrophes over straight quotes for both text and HTML versions of our submitted projects that use this type of punctuation, even if the original used the "straight" varieties. So please convert straight quotes and apostrophes to the curly version and then check as usual that the quotes are correctly balanced throughout the project.
Curly quotes and apostrophes should be UTF-8 characters in the text version but, in the HTML version, may be UTF-8 characters, named entities, or numeric entities. If the document has an XML header, then named entities cannot be used. Please do not use <q> and </q> tags. For texts in other languages than English, please use language-appropriate quotation marks as usual. For more information, please read the Post-Processing FAQ section about quotes and apostrophes.
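As a rough aid for the text version, the sketch below lists lines that still contain straight quotes or apostrophes and paragraphs whose curly double quotes do not pair up. It is an illustration only: the function name is hypothetical, and an unbalanced count is not always an error, since a quotation running over several paragraphs conventionally omits the closing quote until the last one.

```python
import re

def check_quotes(text):
    """Report leftover straight quotes and unbalanced curly double quotes.

    Returns (straight, unbalanced):
      straight   -- line numbers that still contain " or '
      unbalanced -- (paragraph_number, opening_count, closing_count) tuples
    """
    straight = []
    for number, line in enumerate(text.splitlines(), start=1):
        if '"' in line or "'" in line:
            straight.append(number)

    unbalanced = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text)
    for index, paragraph in enumerate(paragraphs, start=1):
        opens = paragraph.count("\u201c")   # left double quotation mark
        closes = paragraph.count("\u201d")  # right double quotation mark
        if opens != closes:
            unbalanced.append((index, opens, closes))
    return straight, unbalanced
```

This is meant for the text file. In an HTML file, straight quotes inside tags (attribute values) are expected and would be reported as well.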
Check your Images
- Don't leave unused files in your project's images folder.
- Size your thumbnail, inline and linked-to images appropriately. Image sizes should not normally exceed the limits described here, but exceptions may be made if warranted by the type of image or book (provided you explain the exception).
- Correct major blemishes and rotation/distortion image problems and crop the images appropriately.
- Make sure all your images are in the images folder
- Set image size appropriately via HTML attribute or CSS so that the image is not distorted in HTML, epub or mobi.
- Use appropriate "alt" attributes for images that have no caption, and include empty "alt" attributes if captions exist.
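The folder checks above (unused files, images that are not in the images folder, and problem file names) can be sketched as a comparison between the image references in the HTML and the files actually present. This is a minimal illustration under stated assumptions: the function names are hypothetical, images are assumed to be referenced as `images/...` in `src` or `href` attributes, and references made only from CSS (for example `url(...)`) are not detected, so such files would show up as unused.

```python
import re
from pathlib import Path

def check_images(html_text, image_files):
    """Compare image references in the HTML with the files in images/.

    html_text   -- contents of the HTML file
    image_files -- file names found in the images folder (no directory part)
    """
    # Assumption: references look like src="images/name" or href="images/name".
    referenced = set(
        re.findall(r'''(?:src|href)=["']images/([^"']+)["']''', html_text)
    )
    present = set(image_files)
    return {
        "unused": sorted(present - referenced),
        "missing": sorted(referenced - present),
        "bad_names": sorted(n for n in present if n != n.lower() or " " in n),
    }

def list_image_files(project_dir):
    """List file names in <project_dir>/images (assumed project layout)."""
    return [p.name for p in (Path(project_dir) / "images").iterdir() if p.is_file()]
```

A typical call would be `check_images(html_text, list_image_files("myproject"))`, where `myproject` stands for your own project folder. A stray thumbs.db file would be listed under "unused".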
Check your HTML
- Make sure your CSS validates as CSS 2.1 or below (CSS checks should generate no error or warning messages other than for use of the "transparent" value for dropcaps). Note: Since August 2017, Project Gutenberg has accepted CSS 3, provided that the code has been marked as "completed work" and has "REC" (for Recommendation) status according to the W3 specifications. As with dropcaps, it is necessary to add a note to the PPVer stating which CSS3 features have been included. (This is a very new policy and may be adjusted based on any issues experienced with submissions, so Project Gutenberg has asked DP volunteers to watch http://upload.pglaf.org/ for changes.)
- Your HTML should validate as XHTML 1.0 Strict or 1.1 (HTML checks should generate no error or warning messages). Issues (other than table "summary") reported by Tidy should have been addressed
- Make sure all your links work
- Your file and folder names should all be in lowercase and shouldn't contain spaces. Please make sure your images are all in the images folder.
- Include a cover image as cover.jpg. If your project has no cover, you should create or arrange the creation of one according to DP policy.
- Run your project through eBookmaker and make sure the results are presentable/usable (Please see the section below, Checking E-reader Versions).
- Don't use heading elements for things that are not headings. Please use hierarchical headings for book, chapter and section headings (single h1, appropriate h2s and h3s etc.). Also, please enclose entire multi-part headings within the related heading tag. For more information about headings, please read the DP Best Practices Heading section.
- Add <div class="chapter"> at chapter breaks to enable proper page breaks in e-readers. For more information on what the coding should be, please read Easy Epub -- Formatting Chapter Headings to Avoid unwanted paged breaks mid-chapter.
- Don't use px sizing units for items other than images or borders. According to W3Schools, "Absolute length units are not recommended for use on screen, because screen sizes vary so much."
- Make sure your <title> is present and that it's correctly worded (for example, <title>The Project Gutenberg eBook of Alice's Adventures in Wonderland, by Lewis Carroll</title> or <title>Alice's Adventures in Wonderland, by Lewis Carroll—A Project Gutenberg eBook</title>). For the dash separating "A Project Gutenberg eBook" from the book title, you may use two hyphens (--), a UTF-8 em dash (—), or an HTML em dash entity (&mdash;).
- Place <html>, <body>, <head>, </head>, </body>, and </html> tags each on its own line (and make sure you're using these tags correctly). This is important to the volunteers who process our projects at Project Gutenberg.
- Ensure that you haven't used empty tags (with entities) or <br /> elements for vertical spacing, e.g. <p><br /><br /></p> (or with nbsps). <td>&nbsp;</td> is still acceptable, though.
- Check your tables. Do they display left, right, and center justification and top and bottom alignment appropriately? Make sure you don't use tables for things that aren't actually tables. Do your tables use the <th> element for table headings (for more information on table headings, please see http://www.w3schools.com/tags/tag_th.asp)?
- Do not use <pre> tags. If you need monospace text, please set this via the CSS.
- Make sure you have used list tags for lists (for example, for the items in a regular index)
- Ensure that you've included all text as text, not just as images
- Make sure you've kept your code line lengths reasonable – someone has to be able to read them.
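A few of the structural items in this list (a single h1, no skipped heading levels, no <pre> elements, a <title> that mentions Project Gutenberg) can be spot-checked with Python's standard html.parser module. The sketch below is an illustration only: the class and function names are hypothetical, the title test is a simple substring check, and none of this replaces the W3C validators or Tidy.

```python
from html.parser import HTMLParser

class HtmlChecker(HTMLParser):
    """Collect a few structural facts about an HTML file."""

    def __init__(self):
        super().__init__()
        self.headings = []      # heading levels in document order, e.g. [1, 2, 3]
        self.pre_count = 0      # number of <pre> start tags seen
        self.title = ""         # text found inside <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.headings.append(int(tag[1]))
        elif tag == "pre":
            self.pre_count += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def check_html(html_text):
    """Return a list of human-readable warnings (empty if none were found)."""
    checker = HtmlChecker()
    checker.feed(html_text)
    warnings = []
    h1_count = checker.headings.count(1)
    if h1_count != 1:
        warnings.append("expected exactly one h1, found %d" % h1_count)
    # A heading may go one level deeper than the previous one, not more.
    for previous, current in zip(checker.headings, checker.headings[1:]):
        if current > previous + 1:
            warnings.append("heading level jumps from h%d to h%d" % (previous, current))
    if checker.pre_count:
        warnings.append("found %d <pre> element(s)" % checker.pre_count)
    if "Project Gutenberg" not in checker.title:
        warnings.append("<title> missing or does not mention Project Gutenberg")
    return warnings
```

Calling `check_html(open(path, encoding="utf-8").read())` on your HTML file returns the warnings as plain strings, which you can then review by hand.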
Printers' Errors and Transcriber's Note
Obvious printers' errors should be addressed in one, or a combination, of the following ways:
- Correct silently and state in the Transcriber's Note that all such errors have been corrected silently.
- Correct all such errors and note them in the Transcriber's Note, linking (in the HTML version) each noted change to the change made in the text.
- Leave uncorrected and state in the Transcriber's Note that all such errors were left uncorrected.
"Not addressing printers' errors" means that all, or a large percentage, of printers' errors have been left uncorrected and not noted. If just one or two have been missed, and the rest addressed, then those missed would instead be counted as the relevant type of error (spellcheck, gutcheck, etc.).
Anything that could make a reader think an error has been made in the transcription should be mentioned in the Transcriber's Note.
Please ensure your grammar and spelling are correct in the Transcriber's Note.
Checking E-reader versions
With the clear expectation from PG and the recommendation of the DPF Board that our projects should look good in e-reader versions, the PPV community have agreed that PPVers should support PPers (and check their work) in producing HTML versions that convert successfully to epub and mobi (Kindle) formats.
The Easy Epub wiki pages are intended to help with that process of checking e-reader versions and making some simple changes to improve them if necessary.
The wiki pages include information on how to view your project in e-reader format, even if you don't have an e-reader.
Also, understanding the HTML Best Practices helps us understand the coding practices that can ensure the project converts seamlessly to e-reader format.
Please use a suggested viewer to test the epub and mobi versions of the book.
Here are some problem areas to look for:
Front and End of Book
- Title page layout
Body of Book
- Horizontal rules
- Obscured sections within the book such that text covers other text or blank areas occur where text should be
- If hovers were used in the HTML, all important “hovered” information should be present and readable in a non-hovered way within the e-reader version. Also Transcriber's Notes referring to hovers should be hidden in the e-reader version.
- Page numbers (if present)
Sample Text Version of the PPV Evaluation Form
Here is a text version of the form the PPVers fill out when they evaluate a PP project. The form automatically calculates the PPing difficulty of the project and the PP rating (based on the number and type of errors made plus the difficulty and size of the project).
Difficulty Details
==================
Text File Size in kb:

Present in the text: Some / Significant Amount
Poetry (other than straight poetry) Some/Sig
Blockquotes Some/Sig
Footnotes Some/Sig
Sidenotes Some/Sig
Advertisements Some/Sig
Tables Some/Sig
Drama Some/Sig
Index Small/Sig

Illustrations: (Number of) :
Illustrations (advanced) Y/N
Multiple Languages Y/N
Extensive Spellcheck/Gutcheck Y/N
Engliſh Y/N
Musical Notation and Files Y/N
Extensive mathematical/chemical notation Y/N

LEVEL 1 (Minor Errors) Approximate number of errors

All Versions
------------
Spellcheck/Scanno errors :
Gutcheck-type errors :
Jeebies errors (English only) :
Paragraph breaks missing or incorrectly added :
A few inconsistencies not addressed, e.g hyphens, spelling, punctuation :
Chapter/headings inconsistent :
Formatting inconsistencies :
Other minor errors (Please explain) :

HTML Version Only
-----------------
Images

Unused files in images folder :
Appropriate image size not used :
Images with major faults :
Failure to enter image size appropriately such that the image is distorted :
Failure to use appropriate "alt" :

Use of px for items other than images and borders :
<title> missing or incorrectly worded :
Use of <pre> tags :
Failure to place <html>, <body>, <head>, </head></body>, and </html> tags each on their own line and correctly use them :
Use of tables for things that are not tables :
CSS above CSS 2.1 (except dropcap) :
HTML version not XHTML 1.0 Strict or 1.1 :
Failure to add <div class="chapter"> or <div class="section"> :
Minor HTML errors that still validate :

LEVEL 2 (Major Errors) Approximate number of errors

All Versions
------------
Markup not handled :
Poetry indentation does not match original :
Footnotes/footnote markers wrong :
Printers' errors not addressed :
Missing page(s) or substantial text :
Substantial rewrapping errors :
Widespread inconsistencies not addressed, e.g hyphens, spelling, punctuation :
Other major errors impacting readability or major text/HTML inconsistencies :

HTML Version Only
-----------------
W3C Markup Validation errors/warnings :
W3C CSS Validation errors/warnings :
Non-working links within HTML or to images :
Bad file/folder names, e.g not lowercase, contain spaces, images in wrong folder, etc. :
Cover image not included / bad size :
Not presentable/useable after epub conversion :
Headings elements used wrongly or failure to use hierarchical headings :

Strongly Recommended
====================
Enclose multi-part headings within tag Y/N
Avoid using empty tags (or with nbsp) Y/N
List Tags should be used for lists Y/N
Include all text as text, not just as images Y/N
Keep your code line lengths reasonable Y/N
Tables - l/r/center/top/bottom align correct Y/N
Tables contain <th> elements for headings Y/N
Remove thumbs.db file from the images folder Y/N
E-reader version as good as possible Y/N

Mildly Recommended
==================
Distinguish between decorative/semantic use of italics/bold/gesperrt Y/N
Include space before the slash in self-closing tags (e.g. <br />) Y/N
No unused elements in the CSS Y/N
Strongly and Mildly Recommended errors are not counted as formal errors but please try to adhere to the Strongly Recommended items. Jeebies and Gutcheck errors refer to the type of error normally caught by those programs rather than implying a requirement for you to have used them specifically.
Important Resources for PPers
- Google Books Ngram Viewer - This site lets you check the frequency of word variants. It is particularly useful for comparing hyphenated vs non-hyphenated versions of words.
- Post-Processing FAQ
- Post-Processing Advice
- The Official “No Dumb Questions” thread for PPers
- There are several main tools used for PP. Try searching for ppgen, PPQT 2, Guiguts, Bookloupe, pptxt, pphtml and the various online tools, such as the pptools site (https://pptools.tangledhelix.com/), which performs many tests and checks on both text and HTML files in a single upload.
HTML and Ereader
- Easy Epub This document provides valuable pointers on how to make your HTML work well when converted to e-reader formats.
- DP HTML Best Practices - Case Study: Headings
- DP HTML Best Practices - Case Study: Title Pages
- DP HTML Best Practices - Case Study: Poetry
- DP HTML Best Practices - Case Study: Tables