User:Noyster/guiguts process

From DPWiki

This is my own procedure for post-processing using Guiguts. It evolves in the light of experience.

Initial Setup

Familiarisation

  • Go to the project page.
    • Read details and requirements.
    • Bookmark the project URL and note the project ID number.
    • Read the project forum page and note any issues proofers raised.

Project folder

  • Make a project folder /home/neil/dpcopy/XXXXX
  • Download the zipped text and image files from the project page, and unpack them in the new folder:
    • text to bookname.txt
    • page images (nnn.png) in subfolder pngs
    • hi-res illustration scans (imagenn.png) in subfolder originals
    • an empty subfolder images
  • Prefs - Set file paths - Set images directory
  • Open text file in Guiguts & SAVE. --->File 0
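
The folder layout above could be scripted along these lines. This is only a sketch: the function name make_project_folder and its parameters are illustrative, and only the three subfolder names come from the list above.

```python
from pathlib import Path

# Subfolder names taken from the list above.
SUBFOLDERS = ("pngs", "originals", "images")


def make_project_folder(base, project_id):
    """Create base/project_id with the three subfolders and return its path."""
    project = Path(base) / project_id
    for name in SUBFOLDERS:
        # exist_ok lets the function be re-run safely on an existing project.
        (project / name).mkdir(parents=True, exist_ok=True)
    return project
```

Calling make_project_folder("/home/neil/dpcopy", "XXXXX") would create the project folder and its subfolders in one go; downloading and unpacking the zip files is still done by hand.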

Page breaks

Configure Page Labels to associate page numbers as per book with the assigned .png pages. - Right-click on 'Lbl: None' field in the status bar at bottom.

Remove page separators: Tools | Fix Page Separators. Use Join Lines only when there is text both above and below the separator. As you go, rejoin cross-page hyphens and dashes, and check for correct use of blank lines.

Insert /* ************ CHAPTER 1 ************** */ etc. for major divisions.

Now SAVE - this will be the "untouched" file to compare with later corrected versions. --->File 1

Sequential Inspection of Text

This is the only step in which you will examine the whole text in sequence; hereafter you navigate with searches.

Turn on automatic scanno highlighting.

Skim the text comparing it to the page images and double-checking format. [Space] moves to next image.

Check for:

  • Proper markup of <i>italic</i> and <b>bold</b>.
    • Watch for punctuation wrongly contained in markup, such as <i>(ibid.</i> or <b>Subtopic.</b>
  • Check & remove "[Blank page]".
  • Proper markup of Greek and other transliterations (content check later)
  • Block material all marked in some fashion:
    • poetry, misc. tabular in /* */
    • block quotes in /# #/
    • Fix block markups that cross page boundaries now or in the next step
  • Figures properly in [Illustration: caption]
    • check: caption text agrees with List of Illustrations (if any)
    • consistent spelling, abbreviation, capitalization in captions
  • Fix Footnotes, Illustrations, Tables still inside a paragraph.
    • move outside paragraph to next or prior page as appropriate
    • don't worry about duplicate footnote number/symbol now
    • sidenotes handled later
  • Make notes of things that will need attention in the HTML:
    • Author cross-references like "(p. 150)" and "see page 222" that should become links.
    • How the editor laid out special sections such as tables and sidebars.

Formatted portions

Format Front Matter

  • Format the title page, preserving as much of the original material as possible. Protect in /x ... x/ (no rewrap, no indent) or /F ... F/ (same, except centred in HTML version).

  • Edit the TOC. Find each matching chapter head; make sure heads are 1:1 with TOC. Protect TOC with /x ... x/. The TOC will probably need to be indented to prevent rewrapping, particularly if you use multiple spaces to align page numbers.

  • If book has List of Illustrations, make sure it is 1:1 with [Illustration] captions. Protect with /x ... x/.

Fix Block Markups

Nowraps

  • Use the Search menu to step through all /* */ blocks.
    • check for a blank line before and after markup
    • close-up where broken at page boundaries
    • make sure poetry line numbers are at least two spaces to the right of the line.

Blockquotes

  • Use the Search menu to step through all /#..#/ blocks
    • check for a blank line before and after markup
    • close-up where broken at page boundaries
    • check consistent indentation of block text

Checks & corrections (1)

Other searches

  • Use Check Orphaned Brackets and Find Orphaned Markup to check and correct orphans of each type in turn. Include parentheses, which are often mis-scanned as curly braces.
  • Use Search Menu, Search for Asterisks w/o slash, keep clicking "Search" to check all asterisks in document.
    • look for malformed thought-breaks (5 stars)
  • Check quotes and commas:
    • space"space and space'space and try out start/end of line combos too
    • Search regex ^'space and space'$ and same for "
    • comma or period with no following space: [a-zA-Z],[a-zA-Z] or [a-zA-Z]\.[a-zA-Z]
  • Check for spaces on first lines: ^(space) using Search and Replace regex
  • Check initials: space between or not? [A-Z]\. [A-Z]\. versus [A-Z]\.[A-Z]\.
  • Check for double spaces
  • Dashes and ellipses: Check space-- and --space [A-Z,a-z] -- etc. This is good for TOC-type things and for proper names
  • Check ellipses for spacing
  • Subscripts and Superscripts: Subscripts CO_{2}, t_{20}. Superscripts mc^2, 1^{st}.
  • Check for no period at end of paragraph: [a-zA-Z]\n\n
  • Check for spaced hyphens: hyphen followed by a space
  • Check for triple hyphens: make sure ---- isn’t --- (search for regex [^-]---[^-]).
  • Check for dashes with adjacent spaces: space-- or --space. It is good to check all dashes anyway, since some stand in for words and should keep their spaces, and Guiguts seems to cut spaces before them at some point. Check for space--, --space, -- at end of line and -- at start of line.
  • OE / AE : Check oe and ae diphthongs to make sure they didn't get mixed up
  • Check i.e. i. e.
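
Several of the searches above can be batched outside Guiguts. The sketch below is not part of Guiguts: the labels and the function run_checks are illustrative, and the patterns are my transcription of the list. The triple-hyphen pattern uses lookarounds rather than [^-]---[^-] so that it also matches at the start or end of a line.

```python
import re

# Label -> regex, transcribed from the checks listed above.
CHECKS = {
    "spaced double quote": r' " ',
    "spaced single quote": r" ' ",
    "quote alone at line start or end": r"""^['"] | ['"]$""",
    "comma with no following space": r"[a-zA-Z],[a-zA-Z]",
    "period with no following space": r"[a-zA-Z]\.[a-zA-Z]",
    "leading space": r"^ ",
    "double space": r"\S {2,}\S",
    "triple hyphen": r"(?<!-)---(?!-)",
    "space next to dash": r" --|-- ",
    "spaced hyphen": r"(?<!-)- ",
}


def run_checks(text):
    """Return (line_number, label) pairs for every line that trips a check."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CHECKS.items():
            if re.search(pattern, line):
                hits.append((number, label))
    return hits
```

Every hit still needs a human decision against the page image: indented poetry trips "leading space" and abbreviations such as i.e. trip the period check.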

Proofer Notes and end-of-line hyphens

  • Resolve proofer's notes, which are indicated by asterisks [** this is a proofer's note]
  • Resolve end-of-line hyphens marked with -*
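
To confirm that nothing is left unresolved, the two markers can be listed with a short script. This is a sketch with illustrative names; it only finds proofer's notes that open and close on the same line.

```python
import re

PROOFER_NOTE = re.compile(r"\[\*\*[^\]]*\]")  # [** ... ] on a single line
EOL_HYPHEN = re.compile(r"-\*")               # hyphen marked with an asterisk


def unresolved_items(text):
    """Return (line_number, matched_text) for each note or -* marker found."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern in (PROOFER_NOTE, EOL_HYPHEN):
            found.extend((number, m.group(0)) for m in pattern.finditer(line))
    return found
```

An empty result suggests this step is finished, though a note split across two lines would be missed.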

Edit Transliterations

  • Use Search Menu, Find Transliterations (which searches for a left-bracket followed by anything other than F, I or S using the regular expression \[[^FIS]). Check the content of each transliteration. For Greek, use the Greek Transliteration Tool.
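
The behaviour of the regular expression \[[^FIS] can be checked in isolation. The sketch below (function name illustrative) shows that it skips [Footnote, [Illustration and [Sidenote, but also stops at anything else in square brackets, such as footnote anchors.

```python
import re

# The expression quoted above: "[" followed by anything except F, I or S.
TRANSLIT = re.compile(r"\[[^FIS]")


def transliteration_lines(text):
    """Return the numbers of lines that contain a candidate transliteration."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if TRANSLIT.search(line)
    ]
```

Because footnote anchors like [1] also match, expect to click past a number of non-transliterations.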

SAVE ---> File 2

Checks & corrections (2)

Basic Fixup

Anything with horizontal spacing has to be in nowraps now.

Apply Word-Frequency Checks

Open the Word Frequency report. Double click on a word to search for it.

  • Set the Frq switch; click All Words. List is now sorted by word frequency; scroll to the end and skim up the list of words that only appear 1 time looking for oddities and obvious misspellings.
  • Click Character Cnts.
    • Note characters that appear only once, check usage.
    • Check for equal counts of left & right parens and brackets.
  • Set the Alph switch; click All Words. Scroll to the word Footnote and write down count for later use. (If the count is large, click once on Footnote and click 1st Harm. The harmonic window shows you any of the common misspellings of "Footnote" that occur.)
  • Click Emdashes. This shows words with emdashes in them as well as similar words without emdashes (suspects), marked with ****. Check suspects against the text and page images. Preserve the author's intent even when inconsistent. Hint: Enable the Suspects flag and click Emdashes again to see only suspect words.
  • Click Hyphens. Same as Emdashes above but for Hyphens.
  • Click Alpha/num. Scan list for one/ell and oh/zero errors.
  • Click ALL CAPS. Scan list looking for oddities.
  • Click MiXeD CasE. Scan list looking for letters such as o that sometimes OCR wrongly as uppercase. Oh/zero errors can show up here, too.
  • Click Check Accents. Scan list looking for mistakes, inconsistent usages.
  • Click Check , Upper. Scan list for comma-for-period errors.
  • Click Check . Lower. Scan list for period-for-comma errors.
  • Click Ital/Bold/SC. Scan list for incorrect or inconsistent use of italics, bold face, and small caps.
  • Click Ligatures. Scan list for incorrect or inconsistent use of ae and oe ligatures.
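
Two of these checks, the bracket counts and the words that appear only once, are easy to reproduce outside Guiguts as a cross-check. A minimal sketch with illustrative function names; it does not imitate the Word Frequency dialog itself.

```python
import re
from collections import Counter


def unbalanced_brackets(text):
    """Return the bracket pairs whose left and right counts differ."""
    counts = Counter(text)
    return [pair for pair in ("()", "[]", "{}") if counts[pair[0]] != counts[pair[1]]]


def singletons(text):
    """Return, in alphabetical order, the words that occur exactly once."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = Counter(re.findall(r"[^\W\d_]+", text.lower()))
    return sorted(word for word, count in words.items() if count == 1)
```

Equal counts do not prove the brackets are correctly nested, so Find Orphaned Markup is still needed.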

Workbench

Go to PP Workbench | pptext. Check output & correct as necessary.

Guiguts Spellcheck

Start the spellcheck process.

  • Proceed through the document, correcting words or adding them to the project dictionary as appropriate.

Check balanced markup

Use the Search menu, Find Orphaned Markup.

  • Correct the error and click search until no more are found.
  • SAVE the file --->File 3.

Special formatting

Curly quotes

  1. Take backup of .bin file
  2. Curly quotes - Workbench | ppsmq.
  3. Read back into Guiguts
  4. Overwrite .bin file with the backup
  5. Close and re-open Guiguts file
  6. Check results carefully using search/replace. Open-single-quote often not done.

Other

  • Fractions: change to Unicode characters (Latin-1 Supplement).
  • Tables - use ASCII Table Effects, keep within 72-char width, delineate cols with | .
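
The Latin-1 Supplement block contains only three fraction characters (¼, ½ and ¾), so the conversion is a small lookup. A sketch, with an illustrative function name; fractions that are part of a longer number such as 11/2 or 1/25 are left alone.

```python
import re

# The three vulgar fractions available in Latin-1 Supplement.
FRACTIONS = {"1/4": "\u00bc", "1/2": "\u00bd", "3/4": "\u00be"}

# Lookarounds stop the pattern matching inside longer numbers or dates.
_FRACTION = re.compile(r"(?<![\d/])(1/4|1/2|3/4)(?![\d/])")


def replace_fractions(text):
    """Replace plain 1/4, 1/2 and 3/4 with their single-character forms."""
    return _FRACTION.sub(lambda m: FRACTIONS[m.group(1)], text)
```

A mixed number written as 2-1/2 keeps its hyphen, so those need a follow-up search.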

Final save before Fork

  • SAVE the file --->File 4 AND the .bin file.
  • Compare with File 1 to keep track of changes made: Workbench | ppcomp. Document changes in TN.
  • Use File | Save a Copy to SAVE the file --->File 4.html.


+++++++++++++++++++++++++++++THE FORK++++++++++++++++++++++++++++++++++

Text file

General

  • Txt|Text conversion palette - determines how italics, bold, smallcaps, <tb> will be shown in the output utf8-text file.
  • Check & edit text file for readability.
  • Do indents and centering as necessary.

Rewrap and Clear Rewrap Markers

  • Save the file if any unsaved changes.
  • Use Edit>Select All then Selection>Rewrap Selection. Wait while rewrap completes.
  • Page through entire text, looking for improper indentation. If found, re-open, clicking NO when asked if you want to save the edits. Find and fix broken rewrap markups. Repeat this step.
  • Open Fixup>Footnote Fixup; tidy up footnotes. See this page.
  • Remove all rewrap markers: see this page.
  • Use Fixup>Remove End-of-line Spaces.
  • UTF8-txt file CR/LF issue: terminal | cd /.../dpcopy/... | todos -f -v ...utf8.txt; or do in Kate | Tools | End of Line | Windows-DOS | resave.
  • Save the document.
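
The end-of-line space removal and the CR/LF conversion can also be done in one pass in Python instead of todos or Kate. A sketch, with an illustrative function name:

```python
def strip_and_crlf(text):
    """Remove end-of-line spaces and return the text with CRLF line endings."""
    # splitlines() accepts LF and CRLF input alike, so re-running is harmless.
    lines = [line.rstrip(" \t") for line in text.splitlines()]
    return "\r\n".join(lines) + "\r\n"
```

When writing the result to disk, open the file with newline="" so that Python does not translate the line endings a second time.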

HTML file

  • Change -- to &mdash; and [oe] to œ.
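
These two replacements might be scripted as below. It is a sketch under assumptions: the function name is illustrative, the uppercase [OE] case is my addition, and the lookarounds are there so that longer dashes (----) and HTML comment delimiters are not touched.

```python
import re

# "--" not adjacent to another hyphen, and not part of "<!--" or "-->".
_EMDASH = re.compile(r"(?<![-!])--(?![->])")


def html_dashes_and_ligatures(text):
    """Convert -- to &mdash; and the [oe]/[OE] markup to ligature characters."""
    text = _EMDASH.sub("&mdash;", text)
    return text.replace("[oe]", "\u0153").replace("[OE]", "\u0152")
```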

Markup

  • Title, definitions and settings
  • Headings
  • Nowrap blocks
  • Blockquotes
  • Sidenotes
  • Footnotes (see Appendix)
  • Tables, inc. Contents. Add page links for Contents & List of Illustrations.
    • Conversion to table format:
    • S: \|[\ ]*([0-9])
    • R: \n<td class="tdr rpad3"> $1 (The rpad3 is adjusted to maintain alignment of table entries).
  • Index, inc. page links
    • S: ([0-9]{1,3})
    • R: <a href="#Page_$1" title="Go to Page $1">$1</a>
  • Illustrations & captions
  • Abbreviations, foreign languages, emphasis, cite
  • Drop caps
  • Polish the TN
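
The two search/replace pairs for Contents and Index entries translate directly to Python, where $1 becomes \1. This is a sketch: the function names are illustrative, the anchor name Page_N is an assumption about how page anchors are named in the HTML, and word boundaries have been added to the index pattern so that a four-digit number is not split into two links.

```python
import re


def toc_cell(line):
    """Turn '| 5' at the end of a Contents line into a right-aligned cell."""
    return re.sub(r"\|[ ]*([0-9])", r'\n<td class="tdr rpad3"> \1', line)


def link_pages(index_line):
    """Wrap each 1-3 digit page number in a link to its page anchor."""
    return re.sub(
        r"\b([0-9]{1,3})\b",
        r'<a href="#Page_\1" title="Go to Page \1">\1</a>',
        index_line,
    )
```

Run link_pages only on index lines; years and other numbers elsewhere in the text would be linked as well.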

Checks & validations

  • View HTML in browser from Custom menu
  • Check, rinse & repeat
  • Run pphtml (Workbench) & CSS3 and HTML validations (from bookmarks).
  • Run Tidy (from Guiguts custom menu)
  • Look at html in Opera as well

Images

  • Open original images in GIMP and save there
  • Manipulate for quality & file size
  • Export As .png or .jpg to Images folder
  • Check appearance in browser

SAVE the document.--->FILE 5.html.

  • Has anything changed that needs changing in the text file as well?

EBookmaker

  • Zip the html file & the Images folder together
  • Run EBookmaker and check the output .epub file (Epub Reader) and .mobi file (Calibre & Kindle). Rename the output files to something meaningful.
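
Zipping the HTML file together with the images folder can be done with the standard zipfile module. A sketch with illustrative names; it assumes the images folder is flat (no nested subfolders).

```python
import zipfile
from pathlib import Path


def make_upload_zip(html_file, images_dir, zip_path):
    """Zip one HTML file plus the files in images_dir; return the entry names."""
    html_file, images_dir = Path(html_file), Path(images_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(html_file, html_file.name)
        names.append(html_file.name)
        for image in sorted(images_dir.iterdir()):
            if image.is_file():
                # Keep the folder name so relative image links still resolve.
                arcname = f"{images_dir.name}/{image.name}"
                archive.write(image, arcname)
                names.append(arcname)
    return names
```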

Final Checks vs PPV standards

Level 2 (major errors) - All Versions

  • Markup not handled (e.g. blockquotes, poetry indentation, or widespread failure to mark italics)
  • Poetry indentation does not match original
  • Footnotes/footnote markers missing or incorrectly placed
  • Printers' errors not addressed
  • Missing page(s) or substantial sections of missing text
  • Substantial rewrapping errors, e.g., poetry has been rewrapped or text version generally rewrapped so that it doesn't exceed 75 characters or fall below 55 characters (though the aim should be 72 characters) except where unavoidable, e.g., some tables
  • Widespread/general occurrences of hyphenated/non-hyphenated, spelling and punctuation variants and other inconsistencies not addressed (may be addressed by note in the TN)
  • Other major errors that could seriously impact the readability of the book or that represent major inconsistencies between the text and the HTML versions
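
The 75-character ceiling in the rewrapping item is simple to test mechanically. A sketch with an illustrative function name; the 55-character floor is not automated here because short lines are normal at paragraph ends and in poetry.

```python
def long_lines(text, limit=75):
    """Return (line_number, length) for every line longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

Lines reported inside tables may be unavoidable, as the standard itself allows.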

Level 2 Errors - HTML version only

  • The W3C Markup Validation Service generates errors or warning messages (Please enter number of errors)
  • The W3C CSS Validation Service generates errors or warning messages other than for the dropcap "transparent" element (Please enter number of errors). Certain errors can generate other errors that will be automatically corrected when the original errors are fixed. Therefore, to count the number of real errors, simply run the Validator and count the errors that follow the message that includes "start tag was here". That will give you the real errors to enter into the PPV Form.
  • Non-working links within HTML or to images. (Either broken or link to wrong place/file)
  • File and folder names not in lowercase or contain spaces, images not in "images" folder, etc.
  • Cover image has not been included and/or has not been coded for e-reader use. (The cover should meet current DP guidelines.)
  • Project not presentable/useable when put through eBookmaker (Please see the section below on Checking E-reader Versions)
  • Heading elements used for things that are not headings and failure to use hierarchical headings for book, chapter and section headings (single h1, appropriate h2s and h3s etc.)

Level 1 (minor errors) - All Versions

  • Spellcheck/scanno errors
  • Gutcheck-type errors, e.g., punctuation, hyphen/emdash, missing/extra space, line length, illegal characters, etc.
  • Jeebies errors (English only)
  • Paragraph breaks missing or incorrectly added
  • A few occurrences of hyphenated/non-hyphenated, spelling and punctuation variants and other inconsistencies not addressed (may be addressed by note in the TN)
  • Chapter and other headings inconsistently spaced, aligned, capitalized or punctuated
  • Formatting inconsistencies (e.g., in margins, blank lines etc.)
  • Other minor errors (such as a minor rewrap error, misplaced entry in the TN, or minor inconsistency between the text and HTML versions)

Level 1 Errors - HTML Version Only

Images

  • Unused files in images folder (other than Thumbs.db)
  • Appropriate image size not used for inline and linked-to images. Image sizes should not normally exceed the limits described here, but exceptions may be made if warranted by the type of image or book (provided the PPer explains the exception).
  • Images with major blemishes, uncorrected rotation/distortion or without appropriate cropping.
  • Failure to enter image size appropriately via HTML attribute or CSS such that the image is distorted in HTML, epub or mobi.
  • Failure to use appropriate "alt" tags for images that have no caption or to include empty "alt" tags for purely decorative images or if captions exist that would make an "alt" redundant.
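
The first item, unused files in the images folder, can be checked by comparing the folder listing with the image paths that appear in the HTML. A sketch with illustrative names; it assumes every image is referenced as images/filename, and it ignores Thumbs.db as the standard does.

```python
import re


def unused_images(html, image_names, ignore=("Thumbs.db",)):
    """Return image file names that the HTML never references."""
    # Collect whatever follows "images/" up to a quote, space or ")".
    referenced = set(re.findall(r"""images/([^"'\s)]+)""", html))
    return sorted(set(image_names) - referenced - set(ignore))
```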

HTML Code

  • Use of px sizing units for items other than images and borders
  • <title> missing or incorrectly worded (Should be <title>The Project Gutenberg eBook of Alice's Adventures in Wonderland, by Lewis Carroll</title> or <title>Alice's Adventures in Wonderland, by Lewis Carroll—A Project Gutenberg eBook</title>). For the dash separating "A Project Gutenberg eBook" from the book title, the PPer may use two hyphens (--), a UTF-8 em dash (—), or an HTML em dash (&mdash;).
  • Use of <pre> tags instead of their CSS equivalents
  • Failure to place <html>, <head>, </head>, <body>, </body>, and </html> tags each on its own line and to use them correctly. (This is required by the WWers.)
  • Use of tables for things that are not tables
  • Used CSS other than CSS 2.1 or below (except for the dropcap "transparent" element and CSS 3 code that has been marked as "completed work" and has "REC" (for Recommendation) status according to the W3 specifications). As with dropcaps, it is necessary that volunteers being PPVed provide a note to the PPVer concerning what CSS3 has been included. (Note: This change to accept CSS3 code other than for dropcaps is a very new policy -- August 2017 -- and may be adjusted based on any issues experienced with submissions, so Project Gutenberg has asked DP volunteers to watch http://upload.pglaf.org/ for changes.)
  • Used HTML version other than XHTML 1.0 Strict or 1.1
  • Failure to use <div class="chapter"> or <div class="section"> at chapter breaks to enable proper page breaks for e-readers (Please see here for more details). It is also acceptable to use <div class="chapter"> </div> or <div class="section"> </div>
  • Minor HTML errors in code that do not generate an HTML validation alert (such as misspelling a language code)

Strongly Recommended

  • Enclose entire multi-part headings within the related heading tag
  • Avoid using empty tags (with &nbsp; entities) or <br /> elements for vertical spacing. e.g. <p><br /><br /></p> (or with nbsps) -- <td>&nbsp;</td> is still acceptable though
  • List Tags should be used for lists (e.g. a normal Index). For further information please read W3's List Use section
  • Include all text as text, not just as images
  • Keep your code line lengths reasonable
  • Tables display left, right, and center justification and top and bottom align appropriately
  • Tables contain <th> elements for headings
  • Remove thumbs.db file from the images folder
  • E-reader version, although without major flaws, should also look as good as possible

Mildly Recommended

  • Distinguish between purely decorative italics/bold/gesperrt and semantic uses of them
  • Include space before the slash in self-closing tags, e.g. <br />
  • Ensure that there are no unused elements in the css (other than the base HTML headings)

Uploads

  • Change file permissions for everything to be uploaded in Files - Others read/write access
  • Smooth reading: html, utf8-txt, epub & mobi files & Images folder.
  • PPV: html, html.bin, utf8-txt, utf8-txt.bin, & Images folder.

Appendix - Footnotes

  • First Pass: "Fixup" -> "Footnote Fixup", then the "First Pass" button. Guiguts moves through the text identifying anything it thinks is a footnote and its associated anchor. The pass finishes with the first footnote anchor highlighted in orange and the associated footnote in green. The top of the Footnote pop-up shows how many footnotes are in the text. It may help to select "Unlimited Anchor Search" if your footnote markers are a long way from the actual footnote text.
  • Check Footnotes: Hit the "Check Footnotes" button. The pop-up lists all footnotes found. Footnotes marked in yellow have duplicate anchors; footnotes marked in pink have no anchors. Use the "Go to - #" footnote dropdown in the Footnote tool to fix any errors, then re-run First Pass and Check Footnotes to make sure all errors have been fixed.
  • Check Footnote Count: "Fixup" -> "Run Word Frequency Routine" -> "Sort Alpha" -> "Re Run". Look for how many times the word "Footnote" occurs in the text; it should match the number of footnotes found by the Footnote tool. If it doesn't, you have a missing footnote somewhere; use the Word Frequency features described under Apply Word-Frequency Checks to fix any errors. Re-run the Footnote "First Pass" step if you fix any errors.
  • Step through all Footnotes: Use the "Next FN -->" & "<-- Last FN" buttons to step through the footnotes Guiguts has found, fixing any errors and joining any multi-page footnotes. You can use the "See Anchor" & "See Footnote" buttons to make sure that the anchor is associated with the correct footnote text. If you fix a missing closing bracket, hit the "Adjust Bounds" button and Guiguts will find the new closing bracket. For each footnote, select the appropriate symbol style with the "Number", "Letter" or "Roman" buttons. You can use the "IMAGE" button to check the original symbol style if needed. Numbers are recommended if you have a large number of footnotes. Don't worry about removing duplicate footnote symbols; this is handled automatically in the next step. Guiguts depends on footnotes being in the right format: [?] for markers and [Footnote ?: text] for the footnote. Make sure you correct all footnotes to this format. You can set manual anchors if you have lots of footnotes in [Footnote: ? text] format; see the Guiguts manual on how to handle those.
  • Index Footnotes: Once you have stepped through and checked, adjusted, fixed and anchored all of your footnotes, hit the "Re Index" button. For out-of-line footnotes, this renumbers all of the footnotes using the same family of symbol each had originally, or a number if it had no anchor marker. This closes up any gaps in the numbers and removes duplicates. You can make changes and re-index as often as you like.
  • Move to Landing Zones: Once you have set landing zones for all your footnotes, hit "Move Footnotes To Landing Zone(s)" and your footnotes will be moved. You can now re-run the "First Pass" step, then use the "Last FN" & "Next FN" buttons with the "See Anchor" & "See Footnote" buttons to make sure all are correct.
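
As an independent cross-check after re-indexing, the [?] markers can be matched against the [Footnote ?: text] bodies with a short script. This is a sketch, not how Guiguts does it: the function name is illustrative and it assumes labels are numbers or single capital letters, unique across the book.

```python
import re


def footnote_mismatches(text):
    """Return (anchors_without_footnote, footnotes_without_anchor)."""
    bodies = set(re.findall(r"\[Footnote ([^:\]\s]+):", text))
    anchors = set(re.findall(r"\[([0-9]+|[A-Z])\]", text))
    return sorted(anchors - bodies), sorted(bodies - anchors)
```

Both lists should be empty; before re-indexing, labels repeat from page to page and the comparison means little.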
