From DPWiki


Multiple Dictionary Search. Excellent!

Guiguts HTML Page Number Bug

I fiddled with this bug and the work-around.

To recap what's going on: I was moving full-page [Illustration] tags and removing [Blank Page] tags.

After messing with the text, you end up with something like this before page separator removal: [*]

----File: 001.png
----File: 002.png
----File: 003.png

Run Fixup->Fix Page Separators (FPS) and then do the HTML auto-generate. The resulting HTML page numbers are out of order, e.g., Pg. 3, Pg. 2, Pg. 1.

The original work-around was to leave things in place before running Fix Page Separators, then move/delete them after the HTML auto-generate.

The main thing is that something has to be between the page separators, both before and after FPS.

Move your illustrations (or whatever) and remove the [Blank Page] markup, but replace each one with some kind of marker:

----File: 001.png
----File: 002.png
----File: 003.png

Now the FPS thingy can run, and the HTML auto-generate will produce spiffy page number code in the right order. Once you have the HTML, go through, delete the markers, and clean up whatever needs it.

[*] Examples truncated to save space.
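If you'd rather not hunt for these spots by eye, here's a minimal Python sketch that lists every page separator followed directly by another separator (only blank lines in between), i.e. the places that need a marker before you run FPS. It assumes separator lines start with a run of hyphens followed by `File:`, as in the truncated examples above; the name `empty_pages` is made up for illustration and isn't part of Guiguts.

```python
import re

# Assumption: a separator is any line starting with hyphens then "File:",
# e.g. "-----File: 001.png---..." (examples above are truncated).
SEPARATOR = re.compile(r"^-+File: (\S+)")

def empty_pages(text):
    """Return image names of separators followed directly by another
    separator, with only blank lines in between."""
    flagged = []
    previous = None      # image name of the last separator seen
    has_content = False  # whether any non-blank line followed it
    for line in text.splitlines():
        match = SEPARATOR.match(line)
        if match:
            if previous is not None and not has_content:
                flagged.append(previous)
            previous = match.group(1)
            has_content = False
        elif line.strip():
            has_content = True
    return flagged

sample = "----File: 001.png\n----File: 002.png\n----File: 003.png\nSome text.\n"
print(empty_pages(sample))  # ['001.png', '002.png']
```

Each name printed is a page that needs a marker placed after its separator.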

PP FinalChecks™

A list of checks for that last 'once over' before uploading to PPV, or DU to PG.

Assumes the use of Guiguts.


Go through the text file page by page and check the items below. Take notes of any issues you find at this stage, since the same corrections will probably need to be made in the HTML file.

  • new paragraphs at the top of the page. Make sure these are also new paragraphs in your text file.
  • new sentences at the top of the page that are not new paragraphs. Make sure these are _not_ new paragraphs in your text file.
  • chapter headings
  • section headings
  • missing pages (make sure each page follows the one before it)
  • thought breaks (formatters sometimes miss these)
  • blockquotes
  • poems
  • tables

If your project is very long, it's OK to just spot-check these: do 20 pages or so, skip ahead, do another 20 pages or so, and so on. If the project is under 300 pages, though, I'd check every single page.

Next, search the text file for each of the following:

  • --- (three hyphens in a row)
  • "-- " (two hyphens followed by a space; don't include the quotes)
  • --$ (regex)
  • " --" (a space followed by two hyphens; don't include the quotes)
  • ^-- (regex)
  • \n\n\n (regex; checks your blank-line consistency)
  • ... (not a regex; checks for ellipses)
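The same searches can be batched outside Guiguts. A rough Python sketch, assuming the text file fits in memory; it reports the line number of each hit so you can jump to it in your editor. The `CHECKS` list and `run_checks` name are illustrative only.

```python
import re

# Each check pairs a description with a regex; re.MULTILINE lets ^ and $
# match at every line boundary, as in the Guiguts searches above.
CHECKS = [
    ("three hyphens", r"---"),
    ("two hyphens then a space", r"-- "),
    ("two hyphens at end of line", r"--$"),
    ("space then two hyphens", r" --"),
    ("two hyphens at start of line", r"^--"),
    ("two or more blank lines in a row", r"\n\n\n"),
    ("ellipsis", r"\.\.\."),
]

def run_checks(text):
    """Return {description: [line numbers]} for every check that matched."""
    results = {}
    for name, pattern in CHECKS:
        lines = [text.count("\n", 0, m.start()) + 1
                 for m in re.finditer(pattern, text, re.MULTILINE)]
        if lines:
            results[name] = lines
    return results

print(run_checks("He said--\nwell... no\n\n\nEnd"))
# {'two hyphens at end of line': [1], 'two or more blank lines in a row': [2], 'ellipsis': [2]}
```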

Italics and sundries

  • _([!?.]) (regex, checks italics consistency); replace with $1_ if necessary
  • ([!?.])_ (regex); replace with _$1 if necessary
  • _([,:;]) (regex); replace with $1_ if necessary
  • ([,:;])_ (regex); replace with _$1 if necessary
  • [oe] (check case-insensitive; plain string search, not regex)
  • check the newer <g> and <f> markup
  • check for <tb> markup
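For a quick count of which italics style dominates (and a bulk fix for the minority), here's a Python sketch. Note that Python's `re.sub` writes the backreference as `\1` where Guiguts uses `$1`. It merges the `[!?.]` and `[,:;]` searches into one character class and assumes `_` is only ever italics markup; the function names are hypothetical. Review the matches before replacing everything, since odd cases exist.

```python
import re

# Guiguts replacements use $1; Python's re.sub uses \1 for the same group.
OUTSIDE = r"_([!?.,:;])"  # e.g. "_word_."  (punctuation after the closing _)
INSIDE = r"([!?.,:;])_"   # e.g. "_word._"  (punctuation before the closing _)

def count_styles(text):
    """Count each placement so the minority style stands out."""
    return len(re.findall(OUTSIDE, text)), len(re.findall(INSIDE, text))

def move_inside(text):
    """Rewrite "_word_." as "_word._" (like replacing _([!?.]) with $1_)."""
    return re.sub(OUTSIDE, r"\1_", text)

def move_outside(text):
    """Rewrite "_word._" as "_word_." (like replacing ([!?.])_ with _$1)."""
    return re.sub(INSIDE, r"_\1", text)

sample = "He read _Hamlet_. Then _Macbeth,_ too."
print(count_styles(sample))  # (1, 1)
print(move_outside(sample))  # He read _Hamlet_. Then _Macbeth_, too.
```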

Now run Gutcheck. Spot-check the endquotes without punctuation. Look at all the quote warnings, double punctuation, no punctuation at end of paragraph, paragraph begins with lower case, extra period, and punctuation after warnings. Also look at all the long lines and short lines; both can usually be fixed just by rewrapping the paragraph.
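Gutcheck does far more than this, but a few of its paragraph-level warnings are easy to approximate. A minimal Python sketch, assuming paragraphs are separated by blank lines; the 75-character `LONG_LINE` limit is a hypothetical threshold, not necessarily Gutcheck's, and the function name is made up.

```python
import re

LONG_LINE = 75  # hypothetical limit; Gutcheck's own thresholds may differ

def paragraph_checks(text):
    """Return (warning, opening text) pairs for a few Gutcheck-style checks."""
    issues = []
    for para in re.split(r"\n[ \t]*\n", text.strip()):
        para = para.strip()
        if not para:
            continue
        opening = para[:20]
        if para[0].islower():
            issues.append(("begins with lower case", opening))
        # Accept . ! ? : ; or a dash, optionally followed by closing
        # quotes, brackets or an italics underscore.
        if not re.search(r"[.!?:;-][\"')\]_]*$", para):
            issues.append(("no end punctuation", opening))
        issues += [("long line", line[:20])
                   for line in para.splitlines() if len(line) > LONG_LINE]
    return issues

sample = "the start of a paragraph\nwith no ending\n\nA proper one."
print([warning for warning, _ in paragraph_checks(sample)])
# ['begins with lower case', 'no end punctuation']
```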

Now do a Jeebies check. Read through all the flagged instances; if any of them don't make immediate sense, click on it to see what's going on.

Now do a Word Frequency check. Leave the button on Frequency, and check Spell Check. (Note: open the Search & Replace dialog and check 'Whole Word' first; this setting affects the WF spell check.)

Just skim all the words that occur more than once for anything that catches your eye. Look more closely at all the ones that occur only once. Look especially closely at anything that's all lower case, since those are more likely to be errors.
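The idea behind this pass can be sketched in a few lines of Python. This is not how Guiguts builds its list; it simply counts words case-sensitively and pulls out the ones that occur once and are all lower case, combining the two tips above. `word_frequencies` and `suspicious_singletons` are hypothetical names.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Case-sensitive word counts; internal hyphens/apostrophes stay in the word."""
    return Counter(re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text))

def suspicious_singletons(counts):
    """Alphabetical list of words that occur once and are all lower case."""
    return sorted(w for w, n in counts.items() if n == 1 and w.islower())

counts = word_frequencies("the cat sat and the cat sat arid Smith ran")
print(suspicious_singletons(counts))  # ['and', 'arid', 'ran']
```

Here `arid` (a classic OCR misreading of "and") lands right beside `and`, while the capitalized singleton `Smith` is left out.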

When you're done with this, click to sort alphabetically and go to Em-dash. Look through these for anything suspicious. Do the same with Hyphens: look for anything that doesn't make sense hyphenated, since the OCR may have inserted the hyphen by mistake.
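A related hyphen check is to look for hyphenated words whose joined-up form also appears somewhere in the text. A Python sketch, assuming case-insensitive comparison is what you want; the function name is made up. A hit isn't automatically an error, since older books are often genuinely inconsistent, so check the scan.

```python
import re

def hyphen_candidates(text):
    """Hyphenated words whose joined-up form also appears in the text."""
    words = {w.lower() for w in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)}
    return [(w, w.replace("-", "")) for w in sorted(words)
            if "-" in w and w.replace("-", "") in words]

print(hyphen_candidates("He went to-day. Today he stayed. A well-known man."))
# [('to-day', 'today')]
```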

Go through all the rest of the Word Frequency checks, and pay special attention to Capitalized Words. This is a good place to look for printer errors in names, since variant spellings of a name will probably be right next to each other in the list.
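That neighbour effect can be exploited directly: sort the capitalized words and compare each one with the next. A Python sketch using the standard library's `difflib.SequenceMatcher`; the 0.8 similarity threshold and the function name are assumptions to tune, not Guiguts settings.

```python
import re
from difflib import SequenceMatcher

def similar_neighbours(text, threshold=0.8):
    """Sort capitalized words and flag alphabetical neighbours that look alike."""
    words = sorted(set(re.findall(r"\b[A-Z][a-z]+\b", text)))
    return [(a, b) for a, b in zip(words, words[1:])
            if SequenceMatcher(None, a, b).ratio() >= threshold]

text = "Mr. Thompson arrived. Later Mr. Thomson left. Then Alice came."
print(similar_neighbours(text))  # [('Thompson', 'Thomson')]
```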

Now check for stealth scannos. Go through them, looking especially carefully at the commas and periods checks. Then do the en-comm list of stealth scannos.
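Here's a rough Python approximation of the kinds of patterns these checks look for. The three-word `SCANNOS` table is just a handful of well-known examples, not the Guiguts list, and the period/comma regexes are assumed approximations of the idea (a period before a lower-case word may be a misread comma, and vice versa). Expect plenty of false positives; the function name is hypothetical.

```python
import re

# Illustrative only: a few well-known stealth scannos (real words that OCR
# produces in place of the intended word). Not the Guiguts scanno lists.
SCANNOS = {"arid": "and", "modem": "modern", "bad": "had"}

def stealth_candidates(text):
    """Return (label, matched text) pairs worth checking against the scan."""
    hits = []
    for m in re.finditer(r"\b[a-z]+\b", text):
        if m.group() in SCANNOS:
            hits.append((m.group(), "maybe " + SCANNOS[m.group()]))
    # A period before a lower-case word may be a misread comma...
    hits += [("period before lower case", m.group())
             for m in re.finditer(r"\w\. [a-z]", text)]
    # ...and a comma before a capital may be a misread period.
    hits += [("comma before capital", m.group())
             for m in re.finditer(r"\w, [A-Z]", text)]
    return hits

print(stealth_candidates("Bread arid butter. he ate, Then slept."))
# [('arid', 'maybe and'), ('period before lower case', 'r. h'), ('comma before capital', 'e, T')]
```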


Open the HTML file in Guiguts or your favorite editor and fix any errors found during the text file checks above (spelling, misplaced italics, etc.).

Open the HTML file in a browser and look at the title text (shown in the browser's tab or title bar). It should say:

The Project Gutenberg eBook of Title, by Author

If it says anything else, change it. If there's something before or after the author's name (like Rev. or M.D. or something), remove it.
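To check the title without eyeballing it, here's a small Python sketch using the standard library's `html.parser`. It assumes the expected wording is exactly the pattern above; the class and function names are hypothetical. It won't catch a stray "Rev." or "M.D.", so still read the author part yourself.

```python
import re
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def check_title(html):
    """Return the normalized title and whether it fits the expected pattern."""
    grabber = TitleGrabber()
    grabber.feed(html)
    grabber.close()
    title = " ".join(grabber.title.split())  # collapse line breaks and spaces
    ok = re.fullmatch(r"The Project Gutenberg eBook of .+, by .+", title) is not None
    return title, ok

print(check_title("<title>\n  The Project Gutenberg eBook of Emma, by Jane Austen\n</title>"))
```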

Now look at the page itself. Make sure all the page numbers (if present) are in the right order, all at the same distance from the left or right margin (whichever one you've used), and that none of them are marked up in italics or bold.

Look at the images. If they have captions, are the captions still inside the image? They should be cropped out and rendered in HTML underneath the image instead. Are the images big enough to see without having to click for larger versions? Are they so big that they take up too much space on the page? If so, go back and fix them. If there are links to larger versions, are the larger versions really necessary to see more detail? If not, remove the link and the larger version. If the larger version isn't much bigger than the one included, it's probably not necessary either. But the larger version shouldn't be massive either (unless it's a special case, like a very detailed map).
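Since the page number bug described at the top of this page shows up as numbers running backwards, a quick automated check can back up the visual one. A Python sketch, assuming the page anchors carry ids like `id="Page_12"` (typical of Guiguts output, but verify against your own file); Roman-numeral pages are skipped, and the function name is made up.

```python
import re

# Assumption: anchors look like id="Page_12"; Roman-numeral ids
# such as id="Page_xii" don't match and are skipped.
PAGE_ID = re.compile(r'id="Page_(\d+)"')

def out_of_order_pages(html):
    """Return (previous, current) pairs where the page number fails to increase."""
    numbers = [int(n) for n in PAGE_ID.findall(html)]
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b <= a]

# The symptom from the FPS bug: Pg 3, Pg 2, Pg 1.
html = ('<span class="pagenum" id="Page_3">[Pg 3]</span>'
        '<span class="pagenum" id="Page_2">[Pg 2]</span>'
        '<span class="pagenum" id="Page_1">[Pg 1]</span>')
print(out_of_order_pages(html))  # [(3, 2), (2, 1)]
```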

Just space down through the page, making sure nothing sticks out. If there are poems or blockquotes, make sure everything is lined up properly. Check some of the links in the table of contents or the index (if present), and follow the footnote links back and forth. Look at the title page and compare it to the scan to make sure it looks more or less the same. Make sure all your chapter and section headings are marked up consistently. Spot-check a few page numbers to make sure they are in the right place.
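Clicking through catches links that land in the wrong place; a script can catch links whose target doesn't exist at all. A Python sketch with `html.parser`, assuming internal links use `href="#..."` and targets use `id` or `name`; the class, function, and example ids are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Record every id/name target and every internal href="#..." link."""

    def __init__(self):
        super().__init__()
        self.targets = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("id", "name") and value:
                self.targets.add(value)
            elif name == "href" and value and value.startswith("#"):
                self.links.append(value[1:])

def broken_links(html):
    """Internal links whose target id/name doesn't exist in the file."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [link for link in collector.links if link not in collector.targets]

page = ('<a id="FNanchor_1" href="#Footnote_1">[1]</a>'
        '<a id="Footnote_1" href="#FNanchor_1">[1]</a>'
        '<a href="#CHAPTER_II">II</a>')
print(broken_links(page))  # ['CHAPTER_II']
```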

Now, if you haven't needed to earlier, open the HTML in Guiguts. Use the HTML button and run both Tidy and the Links check, and make sure both come back clean; fix any errors or warnings. If you prefer to do the rest of your HTML editing in another program, open it there now. Look at the CSS: if any parts aren't used in this project (like footnotes, sidenotes, poems, etc.), remove them. Make sure you haven't specified a font size in your body CSS. Finally, upload your HTML to the validator and check for any warnings or errors there.
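To find leftover CSS, here's a rough Python sketch that lists class selectors defined in `<style>` blocks but never used in a `class="..."` attribute, plus a check for a `font-size` in the `body` rule. It's regex-based, so it assumes double-quoted class attributes and fairly simple CSS; treat its output as a list of candidates. The function names are hypothetical.

```python
import re

def css_text(html):
    """Concatenate all <style> blocks, minus comments and url(...) values."""
    css = " ".join(re.findall(r"<style[^>]*>(.*?)</style>", html, re.S | re.I))
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)
    return re.sub(r"url\([^)]*\)", "", css)  # avoid matching ".png" etc.

def unused_classes(html):
    """Class selectors defined in the CSS but never used in class="..."."""
    defined = set(re.findall(r"\.([A-Za-z_][\w-]*)", css_text(html)))
    used = set()
    for value in re.findall(r'class="([^"]*)"', html):
        used.update(value.split())
    return sorted(defined - used)

def body_sets_font_size(html):
    """True if a 'body { ... }' rule contains a font-size declaration."""
    return re.search(r"\bbody\s*\{[^}]*font-size", css_text(html)) is not None

sample = ('<style>body {margin: 5%;} .poem {margin-left: 10%;} '
          '.sidenote {width: 5em;} p.center {text-align: center;}</style>'
          '<p class="center">x</p><div class="poem">y</div>')
print(unused_classes(sample))       # ['sidenote']
print(body_sets_font_size(sample))  # False
```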

Check that every -- has been converted to &mdash; (numeric entity: &#8212;).

Check that any [OE]/[oe] has been converted to &OElig;/&oelig; (numeric entities: uppercase &#338;, lowercase &#339;).
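A Python sketch for these last two checks. It strips HTML comments first, because comments legitimately contain `--`, then counts what's left; it converts `[OE]`/`[oe]` but deliberately leaves `--` for you to review by hand. The function names are hypothetical.

```python
import re

def strip_comments(html):
    """Remove <!-- ... --> comments, which legitimately contain '--'."""
    return re.sub(r"<!--.*?-->", "", html, flags=re.S)

def leftover_markup(html):
    """Count unconverted '--' and [oe]/[OE] outside HTML comments."""
    body = strip_comments(html)
    return {
        "--": body.count("--"),
        "[oe]": len(re.findall(r"\[oe\]", body, re.I)),
    }

def convert_oe(html):
    """Replace the [OE]/[oe] markup with named entities."""
    return html.replace("[OE]", "&OElig;").replace("[oe]", "&oelig;")

page = "<!-- notes --><p>a man[oe]uvre--then</p>"
print(leftover_markup(page))  # {'--': 1, '[oe]': 1}
print(convert_oe(page))       # <!-- notes --><p>a man&oelig;uvre--then</p>
```

In Python, `chr(338)` and `chr(339)` give Œ and œ, a quick way to confirm the numeric entities.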