User:SF2001/Guiguts PP Process/HTML

From DPWiki


This checklist is made for use with Guiguts version 1.1.1 with standard menus.

Arrive here after doing the common processing for a book.

It contains steps for all types of books, including those with:

  • Illustrations
  • Poetry
  • Footnotes
  • Sidenotes
  • Indexes

Not all projects need all steps.

As you gain experience you may decide to do things in a different sequence.

Prepare HTML Edition (4-? hr.)

Automated HTML

  • Open bookname.html that was saved at the end of the common process.
  • If you will insert visible page numbers or anchors at page boundaries, configure the page labels before proceeding.
    • Do this for all books. Page anchors are easy to make invisible or delete later with a regex, but incredibly painful to add afterwards.
  • Check the footnotes and move them to the Landing Zones you have decided on, as described in PPTools/Guiguts/Footnotes.
  • It is preferable for the source line breaks to match the book.
    • However, HTML poetry markup won't work unless /P..P/ sections have been rewrapped. If the book has a lot of poetry, rewrap it all; otherwise select and rewrap the poetry sections individually.
  • Info on wrap flags
  • Keep the rewrap markers. These are needed for generation of proper HTML.
  • Open the HTML Generator (HTML > HTML Generator...) and set optional switches as desired.
    • Check Insert anchors at Pg#s because they are easy to delete and very difficult to add back in.
  • Click Automatic HTML conversion and wait while it completes.
    • Save the file.
    • Open it in a browser; there is a button on the menu for this.
    • Scroll through looking for systematic errors. (Title pages, tables, etc. will look terrible; no matter.)
    • If automatic conversion messed up, delete the file and start this step over with the backup file.
    • If automatic conversion worked well, save the file as a new version (22).
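
For orientation, a page anchor with a visible page number looks roughly like the fragment below. This is only a sketch: the class name `pagenum`, the id pattern `Page_12`, and the label text `[Pg 12]` are assumptions, and the actual output depends on the Guiguts version and the page-label settings. The style rule (which would normally live in the stylesheet in the document head) shows one way to hide the visible numbers while leaving the anchors in place as link targets.

```html
<!-- Sketch only: class name, id pattern, and label text are assumed -->
<style>
  /* Hide the visible page numbers; the anchors stay usable as link targets */
  .pagenum { visibility: hidden; }
</style>

<p>... last words of one page
<span class="pagenum"><a id="Page_12"></a>[Pg 12]</span>
first words of the next page ...</p>
```

Because each anchor follows a fixed pattern, a single regex replacement can remove or restyle all of them later, which is much harder to do in the other direction.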

Manual HTML 1

  • PP_examples_on_PG
  • Page through the book looking for text that was not handled well by automatic HTML generation, in particular:
    • Title pages.
    • Tables.
    • Tables of Contents and Indexes, which are best formatted using tables, rather than the markup Guiguts generates for /$..$/.
      • However, the linkable IDs that the generated markup adds automatically are quite worthwhile.
      • Some procedures for making an HTML Index with page links are described here.
    • Illustrations.
  • Use the HTML > HTML Markup... element-markup buttons in the HTML Palette to mark up these areas. Use regex replacements to make systematic changes.
  • Open the file in one or more web browsers (Internet Explorer and at least one other such as Chrome) using the "View in Browser" button. Page through the entire book.
    • Where you see a problem, make a correction in Guiguts, save the file, and click the "reload" button in each browser.
  • Hyperlink page references in text, TOC, and index (discussed here and here).
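
As an illustration of the two points above (a table-based Table of Contents and hyperlinked page references), a contents entry might be marked up along these lines. The class names, the chapter id, and the `Page_N` anchor ids are hypothetical and must match whatever ids actually exist in your file.

```html
<!-- Sketch only: class names and ids are illustrative assumptions -->
<style>
  .toc       { margin: 1em auto; }
  .toc .page { text-align: right; padding-left: 2em; }
</style>

<table class="toc">
  <tr>
    <td><a href="#CHAPTER_I">Chapter I. (chapter title)</a></td>
    <td class="page"><a href="#Page_1">1</a></td>
  </tr>
  <tr>
    <td><a href="#CHAPTER_II">Chapter II. (chapter title)</a></td>
    <td class="page"><a href="#Page_15">15</a></td>
  </tr>
</table>
```

The same `<a href="#Page_N">` pattern applies to page references in running text and in index entries.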

Right Align

Sometimes at the top of a letter or section there is a right-aligned bit of text giving the location and/or date. A signature block can be set the same way.


Poetry attribution is frequently right-aligned and should be kept inside the poetry block.
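
A minimal sketch of right-aligned markup, assuming a class named `right` (a hypothetical name; use whatever your stylesheet already defines) and placeholder text:

```html
<!-- Sketch only: the class name and the sample text are placeholders -->
<style>
  .right {
    text-align: right;
    text-indent: 0;
    margin-right: 2em;  /* keep the text slightly off the right edge */
  }
</style>

<p class="right">(place and date)</p>
<p>Dear Sir, ...</p>
<p class="right">(signature)</p>
```

For a poetry attribution, the same class could be applied to the attribution line inside the poetry block rather than to a paragraph placed after it.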

Hanging Indent

CSS for hanging indent

.hanging2 {
  padding-left: 2em;
  text-indent: -2em;
}

Add class="hanging2" to the HTML paragraph markup.
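
For example, a paragraph using this class could look like the following sketch (the paragraph text is a placeholder). The rule is repeated here with comments so the fragment stands on its own.

```html
<style>
  .hanging2 {
    padding-left: 2em;   /* shifts the whole paragraph right by 2em */
    text-indent: -2em;   /* pulls only the first line back to the margin */
  }
</style>

<p class="hanging2">The first line starts at the left margin; any lines
that wrap are indented by 2em beneath it.</p>
```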


You'll want to at least crop the pictures before using the auto-illustration button in HTML generation, or they won't scale correctly.

Some very simple processing:

  • IrfanView: Increase contrast and saturation to make red pop. Sharpen to make gold pop.
  • GIMP: Clone tool to repair and harmonize cover.


  • IrfanView:
    • Increase contrast and saturation for better depth perspective.
    • Sharpen to make highlights pop.
    • Copy, paste, flip, save for repairs.
    • Replace image border with a cleaner profile.
  • GIMP: Clone tool to repair and harmonize. Lots of "By Gosh!" and "By Golly!" and "What is that?"

My preferred CSS for captions is:

.caption p {
  text-align: center;
  text-indent: 0;
  margin: 0.25em 0;
  font-size: smaller;
}

Replace the default with it.
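
The `.caption p` selector applies to paragraphs nested inside an element with class `caption`. A sketch of illustration markup it would match is shown below; the wrapper class `figcenter`, the dimensions, and the file name are assumptions, so compare against what the generator actually produced in your file.

```html
<!-- Sketch only: wrapper class, sizes, and file name are assumed -->
<div class="figcenter" style="width: 400px;">
  <img src="images/illus01.jpg" width="400" height="300"
       alt="Short description of the illustration">
  <div class="caption">
    <p>Caption text goes here.</p>
  </div>
</div>
```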


Manual HTML 2

Decide what (if anything) you plan to do about non-Latin-1 characters in the document.

You can use HTML escapes to insert any Unicode character. The most common cases:

  • The oe-ligature (œ), which can be written as &oelig; or &#339;.
  • Greek works too, but be aware that not every reader will have an extensive set of fonts installed.
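
As a short illustration, the fragment below writes the same characters using named, decimal, and hexadecimal escapes. The sample words are arbitrary.

```html
<!-- Named, decimal, and hexadecimal escapes for the same character (œ, U+0153) -->
<p>man&oelig;uvre, man&#339;uvre, man&#x153;uvre</p>

<!-- Capital ligature (Œ, U+0152) -->
<p>&OElig;dipus</p>

<!-- Greek: named entities where they exist, numeric escapes otherwise -->
<p>&lambda;&#972;&gamma;&omicron;&sigmaf;</p>
```

All three forms in the first paragraph render identically; which one to use is mostly a matter of readability in the source.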

If there is a cover image for e-readers supplied with the project, or you are creating one yourself, you can find information on what is needed in your HTML in the Proofreaders' Guide to EPUB or the PP guide to cover pages.

  • Use HTML Markup > Find Some Orphaned markup ??? to find any mismatched HTML markup in the file.
Note: The search will stop on any nested spans, even though these are valid HTML. If this happens, you may want to make an extra copy of your file, remove the nested spans from the copy, and then check the copy for orphaned markup in order to do a complete check. If you do find orphaned markup, be sure to go back and apply the corrections to your original file!
  • Run HTML > HTML Link Checker and correct all issues found.
  • Sometimes there are decorations between lines.
    • You'll need to get a copy of each of the decorations, name them, and put them in the images subdirectory.
    • Format them something like this:
<p>&nbsp;</p>
<img src="images/deco.png" width="322" height="30" alt="deco" title="deco">
<p>&nbsp;</p>
  • Apply Tidy. HTML > HTML Tidy
  • Use the W3C Validate and W3C CSS Validate buttons to check the correctness of the HTML and CSS. There are reports that the buttons do not pick up all the errors found by the web site, so the final checks should be done on the web site.
    • The final test for HTML is the W3C Validator: upload the file and correct the nits it picks.
    • The final test for CSS is the CSS Validator: upload the file and correct the nits it picks.
  • Remove unused CSS using the PPHTML check.
  • Confirm that the title field reads <title>The Project Gutenberg Book of TITLE, by AUTHOR.</title>.
  • Use HTML > PPVIMAGE with verbose to check the correctness of links to images.
  • Use the Check All button to run all of the above checks, especially if you make any further changes. Always run these checks one more time before submitting.

Process High-resolution Images (? hr.)

If the project manager provided high-resolution scans of the images in the text, use an image-processing program such as GIMP or Adobe Photoshop Elements to optimize them (see the wiki topic). You can do this before, during, or after HTML conversion. For each image:

  • Load the image from the originals folder (see step 1).
  • Straighten it (almost all scanned images are off-perpendicular; some are trapezoidal owing to the page not being flat on the scan window).
  • Crop it to remove all redundant white space and borders (provide margins and borders with CSS styling of the <img> markup).
  • Correct the contrast (you must have calibrated your monitor, see this page).
  • Sharpen.
  • Correct any major scratches, freckles, dirt, etc.
  • Save it in the images subfolder using the appropriate type:
    • Line drawings in .png at 8 bits per pixel (not the default 24-bit RGB format).
    • Photographs as .jpg with an appropriate compression level such as (Photoshop) level 6.
  • Page through the entire HTML book, making sure that each image loads correctly. Test each thumbnail if used.
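
Since the cropped images no longer carry their own margins or borders, these can be supplied in CSS instead. The following is a sketch under assumptions: the class name `framed`, the file name, and the dimensions are all hypothetical, and the exact values are a matter of taste.

```html
<!-- Sketch only: class name, file name, and sizes are placeholders -->
<style>
  img.framed {
    display: block;
    margin: 1em auto;        /* vertical spacing, centered horizontally */
    padding: 0.5em;          /* white space between image and border */
    border: 1px solid #666;  /* thin frame replacing the cropped-off border */
  }
</style>

<img class="framed" src="images/plate01.jpg" width="400" height="600"
     alt="Short description of the plate">
```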

Ebook reader

Part of checking your HTML is making sure that the formatting on e-book readers, such as Kindle, looks reasonably good. This step generates the files for checking.

Don't rely on what Guiguts has, because it will typically be out of date.

Project Gutenberg apparently uses this:


Run PPComp to ensure no text was added or deleted.

Related Pages