User:Sigal/my PP checklist



1. Initial Setup

  • Read the project forum page and note any issues the proofers raised.
  • Make a project folder, e.g. (Win) d:\dp\pp\bookname
  • Download the text and images files and unpack:
    • Text to projectId.txt.
    • page images (nnn.png) in subfolder pngs
    • empty subfolder images.
    • hi-res illustration scans (imagenn.png) in subfolder originals or directly into images subfolder

2. Sequential Inspection of Text

Check for:

  • Page labels table, mark empty pages, non counter pages, skipped page numbers, etc.
  • Proper markup of <i>italic</i> and <b>bold</b>.
Watch for punctuation wrongly contained in markups, such as <i>(ibid.</i> or <b>Subtopic.</b>.
  • Proper markup of Greek and other transliterations (content check later)
  • Block material all marked in some fashion:
    • block quotes in /# #/
    • tabular in /* */
    • convert poetry from /*..*/ to /P..P/
    • check for a blank line before and after markup
    • make sure correct type of markup used
    • close-up where broken at page boundaries
    • apply specific indent value if desired
    • make sure poetry line numbers are at least two spaces to the right of the line.
    • Fix block markups that cross page boundaries
    • check consistent indentation of block text
    • apply specific margin values if desired
  • Figures properly in [Illustration: caption]
    • check: caption text agrees with List of Illustrations (if any)
    • consistent spelling, abbreviation, capitalization in captions
  • Fix Footnotes, Illustrations still inside a paragraph.
    • move outside paragraph to next or prior page as appropriate
    • don't worry about duplicate footnote number/symbol now
    • sidenotes handled later
  • Make notes of things that will need attention in the HTML:
    • Author cross-references like "(p. 150)" and "see page 222" that should become links.
    • How the editor laid out special sections such as tables and sidebars.
  • Change spaces to non-breaking spaces where applicable (abbr., initials, etc.). For example, regex-replace ([^ ]\.) ([^ ]\.) with $1 and $2 joined by a non-breaking space instead of a plain one.
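
As an illustration outside Guiguts, the non-breaking-space replacement can be sketched in Python. The function name `join_abbreviations` is hypothetical; the pattern is the one from the checklist. One caveat the sketch handles: a single pass consumes "J. R." and so skips the overlapping "R. R.", which is why it repeats until nothing changes. The pattern also matches a sentence end followed by an initial, so review the results.

```python
import re

NBSP = "\u00a0"
# Checklist pattern: non-space char + period, one space, non-space char + period.
ABBREV = re.compile(r"([^ ]\.) ([^ ]\.)")

def join_abbreviations(text):
    """Replace the space inside runs like 'i. e.' with a non-breaking space."""
    previous = None
    # Repeat until stable, because matches cannot overlap within one pass.
    while previous != text:
        previous = text
        text = ABBREV.sub(lambda m: m.group(1) + NBSP + m.group(2), text)
    return text
```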

3. Fix Orphan Markups and Proofer Notes

  • Use Orphaned Markup dialog to check and correct orphans of each type in turn. Do not omit the lowly parenthesis, often mis-scanned as curly-brace.
  • Search&Replace: text: (?<!/)\*(?!/) (a literal asterisk, but one neither preceded nor followed by a slash), regex; keep clicking "Search" to check all asterisks in document.
    • look for malformed thought-breaks (5 stars)
    • resolve proofer's notes, which are indicated by asterisk
  • Search for the new <tb> thought break mark.
    • Replace with Fixup/Add a Thought Break.
    • Or replace string in Search/Replace popup:
             *       *       *       *       *
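
To see what the asterisk search in this step matches, here is a minimal Python sketch using the same regex. The helper name `lone_asterisks` is an assumption for illustration; Guiguts itself uses Perl regexes, and Python's `re` happens to accept this pattern unchanged.

```python
import re

# An asterisk neither preceded nor followed by a slash, so /* and */ are skipped.
LONE_ASTERISK = re.compile(r"(?<!/)\*(?!/)")

def lone_asterisks(text):
    """Return (line number, line) for every line that holds a lone asterisk."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if LONE_ASTERISK.search(line):
            hits.append((number, line))
    return hits
```

Proofer notes such as `[**check spelling]` and five-star thought breaks are both reported, while table markers are not.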

4. Basic Fixup

5. Format Front Matter

  • Format the title page, preserving as much of the original material as possible. Protect in /F...F/ (no rewrap, no indent, centered in the html version).
  • Edit the TOC. Find each matching chapter head; make sure heads are 1:1 with TOC. Protect TOC with /X...X/ (no rewrap, no indent, uses pre in html).
  • If book has illustrations, edit or create List of Illustrations (Note: this is not a requirement). Make sure it is 1:1 with [Illustration] captions. Protect with /X...X/.

6. Edit Transliterations

  • Search&Replace: text: \[[^FIS] (left-bracket followed by anything other than F, I or S), regex. Check content of each transliteration. For Greek, use the Greek Transliteration Tool.
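
A small Python sketch of the same search, for readers who want to list the candidates in one go. The pattern is extended (an assumption, not part of the checklist) to capture the rest of the bracketed text on the same line so the result is readable. Note that unresolved proofer notes such as `[**...]` also match.

```python
import re

# "[" followed by anything other than F, I or S, then the rest of the bracket.
NOT_FIS = re.compile(r"\[[^FIS][^\]\n]*\]?")

def bracketed_candidates(text):
    """List bracketed items that are not [Footnote, [Illustration or [Sidenote."""
    return NOT_FIS.findall(text)
```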

7. Remove Visible Page Breaks

  • Run Fix Page Separators to remove visible page separators
  • Use Adjust Page Markers tool to make sure page markers are at top of pages.

8. Apply Word-Frequency Checks

Open the Word Frequency report.

  • Set the Frq switch; click All Words. The list is now sorted by word frequency; scroll to the end and skim up through the words that appear only once, looking for oddities and obvious misspellings.
  • Click Character Cnts.
    • Note characters that appear only once, check usage.
    • Check for equal counts of left & right parens and brackets.
  • Set the Alph switch; click All Words. Scroll to the word Footnote and write down count for later use. (If the count is large, click once on Footnote and click 1st Harm. The harmonic window shows you any of the common misspellings of "Footnote" that occur.)
  • Click Emdashes. Conflicting usages are marked with asterisks; check against text and page images. Preserve author's intent even when inconsistent.
  • Click Hyphens. Resolve conflicts as above.
  • Click Alpha/num. Scan list for one/ell and oh/zero errors.
  • Click ALL CAPS. Scan list looking for oddities.
  • Click MiXeD CasE. Scan list looking for letters such as o that sometimes OCR wrongly as uppercase. Oh/zero errors can show up here, too.
  • Click Check Accents. Scan list looking for mistakes, inconsistent usages.
  • Click Check , Upper. Scan list for comma-for-period errors.
  • Click Check . Lower. Scan list for period-for-comma errors.
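
Two of the checks above (words that appear only once, and equal counts of left and right parens and brackets) are simple enough to sketch in Python. This is not how the Word Frequency report is implemented; the function names and the word pattern are assumptions, and the count is case-sensitive as written.

```python
import re
from collections import Counter

def singletons(text):
    """Words that occur exactly once; letters plus inner apostrophes or hyphens."""
    words = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text)
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n == 1)

def unbalanced_pairs(text):
    """Map each mismatched pair, e.g. '()', to its (left, right) counts."""
    counts = Counter(text)
    pairs = [("(", ")"), ("[", "]"), ("{", "}")]
    return {l + r: (counts[l], counts[r]) for l, r in pairs if counts[l] != counts[r]}
```

Equal counts do not prove correct nesting; they only flag the cheap cases.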

9. Apply Scanno Checks

  • Turn on automatic scanno highlighting and go through the entire text.
  • Use Fixup> Run Jeebies and examine its report of possible he/be errors.
  • Start scanno searching based on eng-common.rc. Work through the list.
  • Apply scanno searching based on misspelled.rc. Work through the list.
  • Apply scanno searching based on more-misspelled.rc. Work through the list.
  • Apply scanno searching based on regex.rc. Work through the list.

10. Apply Gutcheck

11. Apply Spellcheck

  • Start the spellcheck process. Proceed through the document, correcting words or adding them to the project dictionary as appropriate.

12. Fix Sidenotes (0-? hr.)

Read the discussion. Step through sidenotes with: Search&Replace of [S, not regex, not whole word, ignore case. Click Search to find each Sidenote.

  • Compare to page image. Move note above paragraph if feasible.
  • Otherwise, position it above the sentence to which it applies, with blank lines to prevent rewrapping if you decide that is best.

13. Fix Footnotes (0-? hr.)

Read the discussion and follow the steps on this page.

14. Fix Poetry Line Numbers (0-20 min.)

If the book has poetry that uses line numbers, read this page and align the line numbers consistently.

15. Check balanced markup

Search&Replace for \<(\w+)>\n?[^<]+<(?!/\1>) (any starting markup in <..> that doesn't end in an identical closing markup). Because it includes a newline, the search may take several seconds to return the first result.

  • Correct the error and click search until no more are found.
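
The behaviour of this regex can be tried outside Guiguts with a short Python sketch. The helper name `first_unbalanced` is hypothetical; the pattern is the checklist's, with the leading `\<` written as a plain `<`.

```python
import re

# Opening tag, optional newline, text without "<", then a "<" that does not
# start the matching closing tag.
UNCLOSED = re.compile(r"<(\w+)>\n?[^<]+<(?!/\1>)")

def first_unbalanced(text):
    """Return the first suspicious span, or None when the markup looks balanced."""
    match = UNCLOSED.search(text)
    return match.group(0) if match else None
```

Like the original, this flags any markup nested inside another markup, so some hits are legitimate and only need a glance.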

16. Save Edited Markup

  • Save any unsaved changes in projectId.txt. This is the fallback in case something goes wrong and you need to start the following steps over.
  • Use File>Save As to make bookname.html. This will be the starting file for the HTML version.
  • Use File>Save As to make bookname.txt.

17. Convert Italic, Bold, and Smallcap

  • Fix italics: use Search&Replace, text </?i> (<i> or </i>), regex. ignore case. Replacement: underscore. Click Replace All. Italic markup is replaced with underscores.
  • Fix bold. Decide if you want to mark bold with $, or =, or by all uppercase.
    • For $, use Search&Replace, text </?b> (<b> or </b>), regex. Replacement: $. Click Replace All.
    • For =, use Search&Replace, text </?b> (<b> or </b>), regex. Replacement: =. Click Replace All.
    • For uppercase, use a regex search for <b>(\n?[^<]+)</b> (<b> then anything including newline up to the first </b>). Replacement: \U$1\E.
Click Search, then Replace until you are confident it works; then Replace All. Afterward, search for b> and hand-edit any remaining bold.
  • Fix Small-cap, which proofers have changed to <sc>Title-Cased-Text</sc>. regex find <sc>(\n?[^<]+)</sc> (<sc> then anything including newlines up to </sc>; note this will not find small-cap that spans other markup such as italic.) Replacement \U$1\E.
  • Save the document.
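
The three conversions in this step can be sketched in Python as follows. Function names are hypothetical. Python has no `\U...\E` in replacement strings, so the uppercase step uses a callback instead.

```python
import re

def italics_to_underscores(text):
    """<i> and </i> (any case) become underscores."""
    return re.sub(r"</?i>", "_", text, flags=re.IGNORECASE)

def bold_to_marker(text, marker="="):
    """<b> and </b> become the chosen marker, e.g. '=' or '$'."""
    return re.sub(r"</?b>", lambda m: marker, text)

def smallcaps_to_upper(text):
    """<sc>Text</sc> becomes TEXT; spans containing other markup are left alone."""
    return re.sub(r"<sc>(\n?[^<]+)</sc>", lambda m: m.group(1).upper(), text)
```

As the checklist notes, small-cap that wraps other markup is not matched and still needs hand editing.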

18. Fix ASCII Tables (0-? hr.)

  • Use Search>Find Next /**/ Block to step through all tabular material.
    • Compare to page image; reformat to best convey author intent.
    • For complex tables, use Table Special Effects to reformat.

19. Rewrap and Clear Rewrap Markers (10-30 min.)

  • Save the file if any unsaved changes.
  • Use Edit>Select All then Selection>Rewrap Selection. Wait while rewrap completes.
  • Page through the entire text, looking for improper indentation. If you find any, re-open the file, clicking NO when asked if you want to save the edits. Find and fix the broken rewrap markups, then repeat this step.
  • Open Fixup>Footnote Fixup; tidy up footnotes. See this discussion.
  • Remove all rewrap markers: see this page.
  • Use Fixup>Remove End-of-line Spaces.
  • Use Fixup>Run Gutcheck and resolve any new issues.
  • Save the document.

20. Determine Character Coding (5-60 min.)

Character codes are described here. You need to be certain which coding your etext uses.

Search&Replace, text [\x7f-\xff], regex. If nothing is found, the book contains only characters from the 7-bit ASCII set and you are done.

If 8-bit characters are found, use Fixup> Run Word Frequency Routine. In the report window, click the Unicode>FF button. Words containing a multi-byte (Unicode) character are listed. If none are shown, the text is probably, but not certainly, Latin-1; it is possible that you have inserted Unicode punctuation that is not part of a word. But you should be aware if you have used the Unicode menu or pasted a Unicode symbol.
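
The decision described above comes down to the highest code point in the text. A minimal Python sketch, assuming the file has already been decoded into a string (the function name `classify_coding` is hypothetical):

```python
def classify_coding(text):
    """Classify text by its highest code point: 'ascii', 'latin-1' or 'unicode'."""
    highest = max((ord(ch) for ch in text), default=0)
    if highest < 0x80:
        return "ascii"      # only 7-bit characters
    if highest <= 0xFF:
        return "latin-1"    # fits in a single byte
    return "unicode"        # needs a multi-byte encoding
```

Unlike the Word Frequency report, this also catches Unicode punctuation that is not part of a word. The sketch treats 0x7F (DEL) as ASCII, whereas the checklist's search range starts at \x7f and so reports it.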

If your text has symbols from Latin-1 or Unicode, read or re-read this item of the Gutenberg FAQ. Decide whether to upload a single version or to split the text into ASCII and high-bit versions. If you split it:

  • Use File>Save As to "fork" your single document into versions:
bookname.asc for a pure-ASCII version;
bookname.lt1 for a version with Latin-1 accented characters;
and/or bookname.utf for a version that has Unicode characters.
  • Open bookname.asc.
  • Search with the regex \P{IsAscii} (note uppercase P) to step through each character not 7-bit ASCII
  • Replace each, using some consistent substitution scheme (for example, ['e] for é, etc.).
  • Add a "Transcriber's Note" to the head of the text to document your substitution scheme.
  • In a similar manner, search bookname.lt1 for Unicode characters and replace them with Latin-1 equivalents. Add a "Transcriber's Note" to document the substitutions.

Pure-ASCII etext bookname.asc and optional Latin-1 bookname.lt1 and bookname.utf are ready to upload!
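
The substitution pass for bookname.asc can be sketched in Python. Only the `['e]` for é mapping comes from the checklist; the other table entries are hypothetical examples of a scheme, and the function name `to_ascii` is an assumption. Characters missing from the table are left in place and reported, so nothing is changed silently.

```python
# Example scheme: only the first entry is taken from the checklist.
SUBSTITUTIONS = {
    "\u00e9": "['e]",   # e with acute accent
    "\u00e8": "[`e]",   # e with grave accent (hypothetical)
    "\u00fc": "[:u]",   # u with diaeresis (hypothetical)
    "\u2014": "--",     # long dash (hypothetical)
}

def to_ascii(text, table=SUBSTITUTIONS):
    """Return (converted text, set of non-ASCII characters with no mapping)."""
    out = []
    unmapped = set()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        elif ch in table:
            out.append(table[ch])
        else:
            out.append(ch)          # leave it for hand editing
            unmapped.add(ch)
    return "".join(out), unmapped
```

The table doubles as the content of the Transcriber's Note.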

21. Prepare HTML Edition (4-? hr.)

You saved bookname.html before you rewrapped because it is handy to have the HTML source lines match the original text. However, the HTML generated for /P..P/ poetry sections assumes that these have been rewrapped. If the book contains poetry, select these sections and rewrap them so the poetry is properly indented.

Make a duplicate of bookname.html for fallback.

  • Open bookname.html.
  • If you will insert visible page numbers or anchors at page boundaries, then configure the page labels before proceeding
  • Open the HTML Palette and set optional switches as desired.
  • Apply Automatic HTML conversion and wait while it completes.
  • Save the file and open it in a browser.
  • Scroll through looking for systematic errors. (Title pages, tables, etc. will look terrible; no matter). If automatic conversion messed up, delete the file and start this step over with the backup file.
  • Page through the book looking for text that was not handled well by automatic HTML generation, in particular:
    • Title pages.
    • Tables.
    • Tables of Contents and Indexes, which are best formatted using unordered lists, rather than the markup Guiguts generates for /$..$/.
    • Illustrations.
  • Use the element-markup buttons in the HTML Palette to mark up these areas. Use regex replacements to make systematic changes.
  • Open the file in one or more web browsers (Internet Explorer and at least one other such as Firefox or Netscape). Page through the entire book.
    • Where you see a problem, make a correction in Guiguts, save the file, and click the "reload" button in each browser.
  • Hyperlink page references in text, TOC, and index (discussed here).
  • Apply the Link Checker and correct all issues found.
  • Optionally, apply Tidy.
  • Open the W3C Validator, upload the file, and correct the nits it picks.
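
Hyperlinking page references is the kind of systematic change a regex replacement handles well. A minimal Python sketch follows; the anchor naming `Page_N` is an assumption about how the page anchors were generated, so check the ids in your own HTML first, and the function name is hypothetical.

```python
import re

# "p." or "page", whitespace (plain or non-breaking), then a page number.
PAGE_REF = re.compile(r"\b(?:p\.|page)\s+(\d+)")

def link_page_refs(html):
    """Wrap references like 'p. 150' or 'page 222' in links to page anchors."""
    return PAGE_REF.sub(
        lambda m: f'<a href="#Page_{m.group(1)}">{m.group(0)}</a>', html
    )
```

Run it only on text that is not already linked, and review each change, since a reference may point to a page in another work.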

22. Process Hi-resolution Images (? hr.)

If the project manager provided high-resolution scans of the images in the text, use an image-processing program such as The Gimp or Adobe Photoshop Elements to optimize them; see the wiki topic. You can do this before, during, or after the HTML step (step 21). For each image:

  • Load image from the originals folder (see step 1)
  • Straighten it (almost all scanned images are off-perpendicular; some are trapezoidal owing to the page not being flat on the scan window).
  • Crop it to remove all redundant white space and borders (provide margins and borders with CSS styling of the <img> markup).
  • Correct the contrast (you must have calibrated your monitor, see this page).
  • Sharpen.
  • Correct any major scratches, freckles, dirt, etc.
  • Save in the subfolder images using appropriate type:
    • Line drawings in .png at 8 bits per pixel (not the default 24-bit RGB format).
    • Photographs as .jpg with an appropriate compression level such as (Photoshop) level 6.
  • Page through entire HTML book making sure that each image is being loaded correctly. Test each thumbnail if used.

23. Upload the Finished Project

  • Prepare a new folder whose name contains the full project ID, e.g. projectID40213e6231ac4
  • Move into it only the files to be uploaded:
    • the etext file(s) bookname.asc, bookname.lt1, and/or bookname.utf.
    • the .bin files related to those (some PPVers use Guiguts too!)
    • the HTML file if one was made
    • the images folder if required by HTML
Do not include the original images or the page images; do not include any work files or scratch files or auto-backup editions. If you have been told to upload directly to the Gutenberg site for a whitewasher, do not include the .bin file(s).
  • Mac OS X users: the Finder creates hidden files named .DS_Store in any folder you display as a window. Although harmless, these files are not wanted by PG. Get rid of them as follows: In a terminal window, cd into the project folder. Run this command, copying its arcane syntax precisely:
find . -name ".DS_Store" -ok rm '{}' \;

You will be asked for deletion confirmation.

  • Linux and Mac users: cd into this folder and use the command unix2dos *.txt; unix2dos *.html.
  • Use a zip utility to make a zip archive of this folder. (OS X users: do not use the Finder command File> Create Archive of...; it creates a gzip file that PG cannot use. Use a zip command in a terminal window.)
  • Open the project page in your web browser and at the bottom, select Change Project State: Upload for Verification.
  • On the next page, write comments noting any unusual features of the book.
Especially note the character code (7-bit, Latin-1, or Unicode) of the single .txt file, or the differences between multiple etext files.
  • Use the Browse button to navigate to the zipped file. Wait while it uploads, which can take quite a while.
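
For those who prefer a script to the terminal commands, the packaging steps (skip .DS_Store, convert line endings as unix2dos does, make a zip) can be sketched in Python. This is an illustration and not a replacement for the documented commands; the function name `package` and the `text_suffixes` parameter are assumptions. Write the archive outside the project folder so it is not included in itself.

```python
import zipfile
from pathlib import Path

def package(folder, archive, text_suffixes=(".txt", ".html")):
    """Zip a project folder, skipping .DS_Store and writing CRLF text files."""
    folder = Path(folder)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_dir() or path.name == ".DS_Store":
                continue
            data = path.read_bytes()
            if path.suffix.lower() in text_suffixes:
                # Normalize to LF first so existing CRLF is not doubled.
                data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
            zf.writestr(path.relative_to(folder).as_posix(), data)
    return archive
```

The byte-level line-ending conversion assumes ASCII, Latin-1 or UTF-8 text, which covers the etext versions described in step 20.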
