User:LouiseP
Louise's Post Processing Check List
- Open text file with Guiguts and save as new working name.
- Set Page Numbers.
- As I am doing this step, I check through the page images in an image viewer to make sure nothing is missing.
- Search for "[*" to check all proofer comments.
- Check against image, and reformat note to standard format (for automated correction insertion--see Regexes), or change, or note for checking later, in a text file.
- I have line numbers turned on in Guiguts, and each time I find a [** comment], I copy the whole line, adding a note of the line number and original book's page number and correction, or issue to be resolved later. This becomes a complete record of everything I needed to check and what I did, and forms the basis of my Transcriber's Note (TN).
- The text file will most likely be added to later during spell checks, hyphen checks, etc.
- Footnotes [Use GG tool].
- Check correctly formed i.e. [Footnote X: ].
- Join any that are split.
- Renumber and move to end of section/chapter.
- Illustrations.
- Check correctly formed i.e. [Illustration: ]
- Move to beginning/end of paragraph if needed.
- This can be done at the same time as ...
- Remove page separators.
- Check for broken paragraphs, spacing of chapter/section headings, etc. as you go.
- Run word frequency to check hyphenated words, capitalisation, etc. Keep this open to use while you ...
- Search for "-*" (and variations such as "*-") and correct hyphenation.
- Spell check.
- Always do after WF, or you won't get word counts.
- You can copy "good words" into top of text file and add to dictionary first - Remember to delete them when done!
- Afterwards, review custom.dic file - this sometimes shows proper names with variant spellings.
- Run Fix up, Gut Check, Jeebies and Scanno checks.
- Fix formatting issues. (Keep html markup for now.)
- No wrap / poetry.
- Blockquote.
- Italics.
- Small caps.
- Font changes.
- Bold.
- Poetry indents.
- Spacing between initials/abbreviations.
- Title Page / TOC / LOI / Table formatting.
- Spacing between chapters and sections.
- Add Transcriber's notes and spell check added section. SAVE.
- REWRAP.
- Check that the rewrap worked OK.
- SAVE a copy of this version with .html extension.
- Remember to re-open the .txt version before you start ...
- Format conversions: check after each one before saving!
- Convert <sc> to ALLCAPS.
- Convert <i> to underscore.
- Convert <b>, <f>, etc. to other distinguishing mark-up (and add note to TN).
- Run regex search and replace for typo corrections.
- "Tidy" Footnotes.
- Check and adjust spacing of title page, TOC, LOI, tables etc. after mark up removed.
- Final rewrap, remove nowrap markers and check nothing messed up!
- Run Gutcheck one last time (mostly for line length, EOL space, etc.) [and/or check with pptxt / ppspell].
- If desired, create a UTF-8 version of the text file (e.g. for Greek, curly quotes, [oe] ligatures).
- Prepare any images needed for html. Ensure file names are lower case letters and numbers only.
- Open html start file.
- Convert nowrap markup to /p p/, /X X/ etc. as needed.
- Auto generate html.
- Define any needed CSS classes for formatting.
- If you have CSS classes you use frequently, add them to the Guiguts "header.txt" file so they will be included automatically.
- Check content of Title tag; insert link to coverpage image if needed; Format title pages.
- Apply formatting, including replacing any formatting written by GG as in-line style e.g. text indents, Headings, etc.
- Replace [oe] with OE ligatures; any other chars that did not autoconvert e.g. Greek.
- Delete auto-generated TOC if not needed; convert TOC, LOI and other tables to html tables or lists if needed.
- Review automatic footnotes to make sure they converted OK (sometimes names get screwed up).
- Insert illustrations.
- If you use the GG automated illo tool, beware that it removes formatting from the default "Alt" tag then uses that to create the caption, so you will lose italics, small caps, etc. A Regex may work better.
- Review Page number locations.
- Add links to page numbers etc. where needed in index, TOC, LOI, TNs, internal cross references, etc.
- If desired, add <ins> tags to display pop-ups for corrections. See regex search and replace for this below.
- FINAL CHECKS
- Check html with tidy and link checker.
- Validate html and css at W3C.
- Check text and html files using rfrank's wpptxt, wpphtml and wppspell tools.
- Test in various browsers, including with CSS off, images off, etc.
- Zip text and html files and images folder. Ensure no extra files or folders are included.
- If uploading for PPV, include .dic and .bin files.
- Upload to PG or PPV.
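The proofer-comment log described near the top of the checklist (copy every line holding a [** comment], with its line number) can also be roughed out outside Guiguts. This is only a minimal Python sketch, assuming the working text is available as a string; the function name collect_proofer_notes is made up for illustration, and the original book's page number and the decision taken still have to be added by hand.

```python
import re

# Matches a proofer comment such as [**typo] or [* check image]
NOTE = re.compile(r"\[\*[^\]]*\]")

def collect_proofer_notes(text):
    """Return (line_number, line) pairs for every line holding a proofer note."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if NOTE.search(line):
            hits.append((number, line.strip()))
    return hits

sample = "First line.\nA tpyo[**typo] here.\nLast line."
for number, line in collect_proofer_notes(sample):
    print(f"{number}: {line}")
```

The printed lines can be pasted into the notes file as a starting point for the TN.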
My Regexes
These regexes are for Guiguts - they may not work in other text processing tools.
Small Caps
Change <sc> marked up text to ALL CAPS:
- Search for: <sc>([\w\W\n]+?)</sc>
- Replace With: \U$1\E
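For checking the result outside Guiguts, the same idea can be sketched in Python. Python's re module has no \U...\E in replacement strings, so this sketch assumes a small function does the upper-casing instead; sc_to_caps is a made-up name.

```python
import re

def sc_to_caps(text):
    """Replace <sc>...</sc> with the enclosed text in ALL CAPS."""
    # DOTALL lets a small-caps run continue across a line break,
    # like [\w\W\n]+? in the Guiguts search above.
    return re.sub(r"<sc>(.+?)</sc>",
                  lambda m: m.group(1).upper(),
                  text,
                  flags=re.DOTALL)

print(sc_to_caps("By <sc>John\nSmith</sc>, Esq."))
```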
Change ALL CAPS to small caps:
- Search: ((\b(\p{IsUpper}+\W?\s?\n?)\b){2,})
- Replace: <span class="smcaps">\T$1\E</span> (for title case)
- Replace: <span class="smcaps">\U$1\E</span> (for upper case)
- Replace: <span class="smcaps">\L$1\E</span> (for lower case)
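A rough Python equivalent of the title case variant, for testing on a sample before running the real thing in Guiguts. It is a simplified sketch, not the same regex: Python's re module lacks \p{IsUpper}, so this assumes plain A-Z letters only, and caps_to_smcaps is a made-up name.

```python
import re

# Two or more consecutive all-caps words (ASCII letters only in this sketch)
CAPS_RUN = re.compile(r"\b[A-Z]+(?:\s+[A-Z]+)+\b")

def caps_to_smcaps(text):
    """Wrap each run of all-caps words in a smcaps span, title-cased."""
    def wrap(match):
        return f'<span class="smcaps">{match.group(0).title()}</span>'
    return CAPS_RUN.sub(wrap, text)

print(caps_to_smcaps("by JOHN SMITH of London"))
```

Accented capitals and words with apostrophes are not handled here, so the output needs the same eyeballing as the Guiguts version.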
Italics or other markup
For other markup, e.g. bold, replace <i> and _ in the following examples with appropriate markup and representation.
Change italics to underscore:
- Search: <i>([\w\W\n]+?)</i>
- Replace: _$1_
Note: the latest version of Guiguts converts italics to underscores automagically; however, it can still be useful to do this by hand on smaller sections of text.
Change underscore to italics: (For when you messed up and forgot to save an html file before doing the above!)
- Search: _(.+?\n?)_
- Replace: <i>$1</i>
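Both directions can be tried out in Python before touching the real file. A minimal sketch, assuming the text is in a string; the function names are made up, and unlike the underscore search above it uses DOTALL so that italics spanning a line break survive the round trip.

```python
import re

def italics_to_underscore(text):
    """<i>text</i> becomes _text_."""
    return re.sub(r"<i>(.+?)</i>", r"_\1_", text, flags=re.DOTALL)

def underscore_to_italics(text):
    """_text_ becomes <i>text</i>; the captured text is reused via \\1."""
    return re.sub(r"_(.+?)_", r"<i>\1</i>", text, flags=re.DOTALL)

original = "a <i>very\nlong</i> word and <i>x</i>"
plain = italics_to_underscore(original)
print(plain)
print(underscore_to_italics(plain) == original)
```

Any stray underscore in the text (a blank to fill in, say) will throw the pairing off, so step through rather than trusting a replace all.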
Replace plain italic mark up with semantic html markup:
- Search: <i>([\w\W\n]+?)</i>
- Replace: <em>$1</em> for emphasis
- Replace: <cite>$1</cite> for citation
- Replace: <em lang="la" xml:lang="la">$1</em> for emphasis and mark as another language (here Latin, language code "la")
Page Links
To insert page links around page numbers in the text:
- Search: (\d+)
- Replace: <a href="#Page_$1">$1</a>
Note that this will change ANY number - so you need to step through the search and replace, do NOT use with replace all! And of course, you need to have the correct anchors in place for these links to go to. You may need to adjust to your page number anchor format.
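One way to make this less dangerous is to link a number only when a matching anchor really exists. A Python sketch of that idea, assuming anchors are written as id="Page_N" (adjust to your own anchor format); link_page_numbers is a made-up name, and the fragment passed in should be just the index or TOC text, not the whole book.

```python
import re

def link_page_numbers(fragment, html):
    """Link numbers in fragment only when html has a matching Page_N anchor."""
    anchors = set(re.findall(r'id="Page_(\d+)"', html))

    def repl(match):
        number = match.group(1)
        if number in anchors:
            return f'<a href="#Page_{number}">{number}</a>'
        return number  # no anchor for this number: leave it alone

    return re.sub(r"\b(\d+)\b", repl, fragment)

page_html = '<span id="Page_12"></span> <span id="Page_40"></span>'
print(link_page_numbers("Apples, 12, 40, 1066", page_html))
```

A number that happens to equal a real page number (a quantity, a date) would still be linked, so the result needs reviewing just like the manual search.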
Find White Space
Works with search in the forward direction only. The example finds a stretch of whitespace between 3 and 72 characters wide, which is then replaced with the correct number of spaces.
- Search: \s{3,72}
- Replace: [correct number of spaces]
Find Repeated Words
- Search: \b(\S+)\s\1\b
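The same search can be run over a whole file in Python to get a list to work through. A minimal sketch, assuming the text is in a string; find_repeated_words is a made-up name.

```python
import re

REPEATED = re.compile(r"\b(\S+)\s\1\b")

def find_repeated_words(text):
    """Return (line_number, word) for each immediately repeated word."""
    hits = []
    for match in REPEATED.finditer(text):
        # Line number of the first of the two words
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(1)))
    return hits

print(find_repeated_words("It was the the best\nof times"))
```

Because \s also matches a line break, a word repeated across the end of a line is caught too. Legitimate doubles such as "had had" will show up and have to be checked against the image.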
Curly Quotes
HINT: do this before automatic HTML conversion! (Otherwise you will get lots of incorrect hits.)
Double quotes are usually pretty easy....
- Search: "([\w\W\n]+?)"
- Replace: “$1”
Single quotes are hard, as you need to replace the apostrophes...
- Search: '
- Replace: ’
And then replace single quotes with:
- Search: '([\w\W\n]+?)'
- Replace: ‘$1’
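The order of the three steps matters, which a small Python sketch makes easier to see. It assumes the simplest case only: an apostrophe is a straight quote between two letters. Leading or trailing apostrophes ('tis, dogs') are not recognised and will be mispaired, so this is for experimenting, not for a final pass; curl_quotes is a made-up name.

```python
import re

def curl_quotes(text):
    """Curl double quotes, then apostrophes, then single-quote pairs."""
    # 1. Double quotes: pair them up, allowing a quotation to span lines.
    text = re.sub(r'"(.+?)"', "\u201c\\1\u201d", text, flags=re.DOTALL)
    # 2. Apostrophes: a straight quote between two letters (don't, Louise's).
    text = re.sub(r"(?<=[A-Za-z])'(?=[A-Za-z])", "\u2019", text)
    # 3. Whatever straight single quotes remain are treated as pairs.
    text = re.sub(r"'(.+?)'", "\u2018\\1\u2019", text, flags=re.DOTALL)
    return text

print(curl_quotes('"Don\'t," she said, \'now.\''))
```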
Corrections
I have not yet worked out how to automatically increment the numbers or automatically create the TN, but these regexes make coding the html very simple. Just make sure your corrections are all correctly marked as tpyo[**typo]. In the text file, the simple replace functions will "clean up" after you have split off from the master file.
Find whole line with correction:
- Search: ^([\w\W]+?)\[\*\*([\w\W]+?)\]([\w\W]+?)$
I copy each line to a text file then edit to make the TN.
Insert correction mark up in html:
FIRST: Repeated Word
- Search: \b(\S+)\s\1\b\[\*\*(\1)\]
- Replace: <a name="corr_10" id="corr_10"></a><ins class="mycorr" title="Original: $1 $1">$1</ins> (in html)
- Replace: $1 (in text)
THEN: txet[**text]
- Search: ([\S]{1,70}?)\[\*\*([\w\W]+?)\]
- Replace: <a name="corr_10" id="corr_10"></a><ins class="mycorr" title="Original: $1">$2</ins> (in html)
- Replace: $2 (in text)
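Numbering the corr_N anchors and collecting the TN entries is easier in a script than in a search and replace box. The following Python sketch is one possible approach, not something Guiguts does: it assumes every correction is marked as wrong[**right], handles the doubled-word case first (as in the two-step order above), and numbers anchors from 1 in document order. The name mark_corrections and the simplified [^\]] in place of [\w\W] are my own assumptions.

```python
import html
import re

CORRECTION = re.compile(
    r"\b(\S+)\s\1\b\[\*\*\1\]"          # doubled word: the the[**the]
    r"|(\S{1,70}?)\[\*\*([^\]]+?)\]"    # ordinary typo: txet[**text]
)

def mark_corrections(text):
    """Return (html_text, plain_text, notes) for wrong[**right] markers."""
    notes = []

    def to_ins(match):
        if match.group(1) is not None:
            original = f"{match.group(1)} {match.group(1)}"
            fixed = match.group(1)
        else:
            original, fixed = match.group(2), match.group(3)
        notes.append((original, fixed))
        n = len(notes)  # running anchor number
        title = html.escape(f"Original: {original}", quote=True)
        return (f'<a name="corr_{n}" id="corr_{n}"></a>'
                f'<ins class="mycorr" title="{title}">{fixed}</ins>')

    def to_plain(match):
        return match.group(1) if match.group(1) is not None else match.group(3)

    return CORRECTION.sub(to_ins, text), CORRECTION.sub(to_plain, text), notes

marked, plain, notes = mark_corrections("A txet[**text] with the the[**the] end.")
print(plain)
for number, (original, fixed) in enumerate(notes, start=1):
    print(f"corr_{number}: {original} -> {fixed}")
```

The notes list gives the raw material for the TN. As with the Guiguts search, punctuation stuck to the front of the wrong word is swallowed into the match, so the output still needs a read-through.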
My CSS Cookbook
Created because the good stuff on the Wiki has gone away!
Drop Capitals
This is the simple version that just gives you a big letter:
CSS: .dropcap {float: left; padding-right: 3px; font-size: 250%; line-height: 83%;}
HTML: <p><span class="dropcap">U</span>pon</p>
This is the complex version that has a decorative image:
CSS: .hide {display:none;} img.cap {float:left ; padding-right:10px; padding-bottom:8px;}
HTML: <p><img src="images/c.png" class="cap" width="100" height="102" alt="C" title="" /><span class="hide">C</span><span style="margin-left:-10px;">Ourteous</span> Reader, doe you not wonder?</p>