Guiguts Tutorial

From DPWiki

Note: The comment below is currently 7 years old (in 2014), and you definitely shouldn't try to learn PPing with Guiguts from this document. I don't believe it is being actively maintained.

Note: This Tutorial is currently outdated by two years and you probably shouldn't try to learn PPing with Guiguts from it quite yet. We're working on it.



Text of PP Guiguts tutorial - FEEDBACK PLEASE - 27/3/05

Hi All,

The following is the text of the PP tutorial I've been munging. I need your help to improve it, clarify it & add all the steps I missed. Please add to this thread with your thoughts.

Jobs I still need to do:

A) Add HTML instructions (sort of done)
C) Add special sections for special texts like Poetry & Drama. Guiguts has several features for making these texts easier.

If you want to go through the tutorial with me, just let me know. I "fed" most of it to tterrag, step-by-step, while I was proofing yesterday without much hassle. tterrag was good enough to take on his first PP project & to help me with a PP beginner's perspective of the tutorial.

Despite the length of the tutorial it took tterrag about 2 hours to get through most of the steps required to generate an ASCII version.

PPing looks scary from the outside, but jump in, the water's quite OK, & there's lots of help available.

DPWiki will be taking care of the tutorial if you lot think it's a good idea to continue in this vein.

Thanks, Pourlean


This Tutorial is still largely based on the good work done by Pourlean, but has been heavily edited by (Jasper) to fit into the wiki and to update it a little. Where it says "I" it still refers to Pourlean; if the text means myself, it will be signed like so: (Jasper)

Guiguts High School

This wasn't even relevant in the forum version...

 Hi All, Welcome to "Class" - A few things before we get started.
 1. Jabber/Internet connections can be flakey - apologies in advance for any interruptions.
 2. I'd really like feedback on whether you think this idea is great or sucks & any suggestions you may have - I'll ask again later.
 3. This session is being logged so we can generate a more permanent version of the tutorial.
 4. At various stages I will ask you to do something in guiguts - please respond saying you have successfully completed this stage with "thumbs up" (y) or "thumbs down" (n)
 5. No talking in "class" - but please ask questions, I'm particularly keen to know about order of operations issues & other ways of doing things.
 6. There is no point 6. :)
 7. This is going to be a long tutorial & please say if it's not useful & we can change the style. I want to see how this goes & DPwiki/HTML the result.
 8. Take some time after class to digest the guiguts manual - there are lots of extra tips in there!

Onwards... This is a run through of how I PP a basic text using guiguts.

When you actually have to *do* something it's marked

  • like this,

otherwise it's usually just background info.

Pre-requisites.

This tutorial assumes you have:

  • Read the PP FAQ.
  • Followed these instructions.
    Thus you have the latest Guiguts, Aspell and an image viewer (possibly along with optional Jeebies and Tidy) installed and ready to use. Be sure to set up path preferences for the helper programs, and add DPCustomMono font, which helps to spot errors in text.
  • Checked out a Post-Processing Project and downloaded the text and, preferably, the images as well. You have unzipped the project text into a folder, e.g. C:\DP\fred, and the project images into a pngs folder inside the project folder, e.g. C:\DP\fred\pngs

WARNING WARNING WARNING. Make sure you use short directory names with no spaces in the file or directory names when setting up projects for Guiguts. [**is this still relevant in the current version?]

Setting up a project.

  • Go to your winguts directory & start Guiguts by running winguts.exe. If you are running the Perl version, go to your guiguts directory & run guiguts.pl.

You will see a blank window with "Guiguts .VERSION - No File Loaded" in the Title Bar & lots of interesting looking menus.

We're going to load a PP project now so:

  • "File" -> "Open" & use the Browser to find your PP project .txt file.

Checking the Images Directory

The images directory changes between projects, but it is set automatically when you load a file: Guiguts uses the pngs directory inside the folder that holds the project text file. For example, with D:\DP\Project\project.txt loaded, Guiguts will look for images in D:\DP\Project\pngs

  • "Prefs" -> "Set File Paths" -> "Set Images Directory"

You should see png.001 displayed on your screen by your image viewer.

Set Gutcheck Options

  • "Fixup" -> "Gutcheck Options" -> Check "-v Enable verbose mode", "-p Report ALL unbalanced double quotes" & "-s Report ALL unbalanced single quotes."

Page Markers

Now one of the most useful Guiguts features... Click in the text & you will see the bottom of the Guiguts window change. You can now use the IMAGE button to display the image for the page you are viewing in Guiguts.

Take a minute to scroll through the txt file, then click in the Text pane - you'll see the Page Number update on the bottom of the Guiguts window. When you click "See Image" the image viewer will display the appropriate page.

I spent a bit of time setting the geometry of both Guiguts & the image viewer so I could view both the Guiguts window and the image file at the same time without window overlap. You can experiment with this.

Tear-off Menus and Pop-ups

The tutorial doesn't make mention of closing tear-off menus and pop-ups. Close them if your screen gets too cluttered. You will quickly sort out where to move pop-ups to suit your working conditions.

Saving your work

The tutorial doesn't prescribe when to save your work. This is up to you. A save before removing page separators or moving footnotes is recommended. "File" -> "Save" or use the CTRL-S hotkey.

Sometimes you will see:

 You might wish to save your work here, and 
 then do Save As with a new name so you'll 
 still have the old file if things go badly.

at points where it might be beneficial to do so, but this is purely advisory.

Note: Guiguts saves important information about your project in a .bin file in the same directory where your text file is located. For example the .bin file tracks where page breaks are located even when the page separators have been removed. A project called D:\dp\foo.txt will have an associated D:\dp\foo.bin file. DO NOT REMOVE OR EDIT THE .BIN FILE.
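To make the naming convention concrete, here is a small Python sketch. It is not Guiguts code, and the helper name companion_paths is made up for illustration; it only shows where the .bin file and the pngs directory are expected to sit relative to the project text file.

```python
from pathlib import PureWindowsPath


def companion_paths(text_file):
    """Return the .bin file and pngs directory expected next to a project text file."""
    path = PureWindowsPath(text_file)
    return {
        "bin": str(path.with_suffix(".bin")),   # same folder, same base name
        "images": str(path.parent / "pngs"),    # pngs folder inside the project folder
    }


paths = companion_paths(r"D:\dp\foo.txt")
print(paths["bin"])     # D:\dp\foo.bin
print(paths["images"])  # D:\dp\pngs
```

If you make a backup copy of the text file, copying the matching .bin file along with it keeps the page break information together with the text.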

The actual PPing...

Do some research.

  • Read the Project Comments & Project Comments & Questions Forum thread for your PP project. If the proofers found anything of concern, make a note of it for special attention while processing the text. You will also need to make sure you follow the Project Manager's instructions for the text. Many request HTML versions for texts.

Removing blank lines

Removing blank lines before Page Separators makes the Page Separator removal process later on easier.

  • "Fixup" -> "Remove blank lines before Page Separators"

Checking Mark-ups for Poetry—(/* */) or (/P P/) or (/p p/), blockquotes—(/# #/) & tables—(/$ $/).

We need to do this so other routines don't muck up the indentation and formatting of the text.

Check Mark-ups for Poetry automatically

First we need to check that /* */, /$ $/, /p p/ & /# #/ markups match & are correct. Mark-ups should be on lines by themselves and separated from the main text by a blank line. e.g. The /* markup should be on a line by itself and there must be a blank line before the markup. The */ markup should be on a line by itself and there must be a blank line after the markup.
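The placement rules above can be sketched as a small stand-alone check. This is only an illustration in Python, not how Guiguts itself searches; the function name check_block_markup is an assumption, and the sketch only looks at markers that already sit on a line by themselves.

```python
# Opening markers mapped to their closing partners.
OPEN = {"/*": "*/", "/$": "$/", "/#": "#/", "/p": "p/", "/P": "P/"}
CLOSE = set(OPEN.values())


def check_block_markup(text):
    """Return (line_number, message) for each suspicious block marker."""
    lines = text.split("\n")
    problems = []
    stack = []  # (marker, line_number) of blocks that are still open
    for i, line in enumerate(lines):
        token = line.strip()
        lineno = i + 1
        if token in OPEN:
            # An opening marker needs a blank line before it.
            if i > 0 and lines[i - 1].strip():
                problems.append((lineno, f"no blank line before {token}"))
            stack.append((token, lineno))
        elif token in CLOSE:
            # A closing marker needs a blank line after it.
            if i + 1 < len(lines) and lines[i + 1].strip():
                problems.append((lineno, f"no blank line after {token}"))
            if stack and OPEN[stack[-1][0]] == token:
                stack.pop()
            else:
                problems.append((lineno, f"{token} has no matching opener"))
    for marker, lineno in stack:
        problems.append((lineno, f"{marker} is never closed"))
    return problems


sample = "para\n/*\npoem\n*/\n\nmore\n\n/#\nquote\n"
for lineno, message in check_block_markup(sample):
    print(lineno, message)
```

Markers that share a line with other text are not detected by this sketch; the asterisk and # searches described below are still needed for those.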

  • Hit the dotted line at the top of the Search Menu & the Search Menu list will become a tear-off menu, like a pop-up

While you're running these checks, any continuation markers */*, #/# can be removed. The proofer who is leaving them in the text should be sent feedback not to use them in future projects.

  • "Find Orphaned Brackets & Markup"
    • Click the "/* */" & Hit "Search", correcting any mistakes the search routine finds.
    • Work through the other markup symbols.

  1. "Find next /* */ block"
    • work through correcting any mistakes. Use the "See Image" button to view the image for any poetry and check that the indenting is correct.
  2. "Find next /# #/ block"
    • work through correcting any mistakes. Use the "See Image" button to view the image for any block quote and check that the indenting is correct.
  3. "Find next /$ $/ block"
    • work through correcting any mistakes. Use the "See Image" button to view the image for any table and check that the layout is correct.
  4. "Find next /p p/ block"
    • work through correcting any mistakes. Use the "See Image" button to view the image for any poetry and check that the indenting is correct.
  5. "Find next indented block"
    • work through finding any indented text which may need to be wrapped in /* */, /$ $/, /p p/ or /# #/ markers.
  6. "Search & Replace" -> "Search Text" = * -> Uncheck "Whole Word Only" & Hit "Search"
    • work through the text looking for any broken /* markups, leave all other asterisks, they'll be fixed up later. This is when (if you have a very old project) you come to loathe the "* * * * *" thought break markup & are truly grateful it was changed to <tb>.
  7. "Search & Replace" -> "Search Text" = # -> Uncheck "Whole Word Only" & Hit "Search"
    • work through the text looking for any broken /# markups.

Advanced Tip - guiguts allows you to set indenting options. See the guiguts manual for more information.

Preliminary Footnote Checking.

If you do not have Footnotes - move along...

Manual Runthrough

 You might wish to save your work here, and 
 then do Save As with a new name so you'll 
 still have the old file if things go badly.

Now page through the text and the images & look for various things:

XnView users can use the PgDn/PgUp keys to move between image pages, starting at the first page. This also works in IrfanView, where the left/right cursor keys work as well.

Marker                          Rewrap?     Indent?                   HTML
No special markup, the default  yes         no
/* */: poetry, etc.             No rewrap   Fixed                     Preserves alignment; see here
/$ $/: tables, etc.             No rewrap   No indent                 Preserves alignment; see here
/# #/: block quotes, etc.       Yes         To Block Rewrap margins   As block quote; see here
/p p/: poetry, etc.             No rewrap   4 spaces                  As poetry; see here
/f f/: front matter             No rewrap   No indent                 Centered para; see here
/L L/: bulleted list            No rewrap   Fixed                     Unsigned list; see here
/X X/: later manual HTML        No rewrap   No indent                 Only <pre>; see here

  • look for anything which should be in /* */, /$ $/, /P P/ or /# #/ which the proofers forgot to mark-up and add the mark-up symbols. It helps to compare the text with the image for this task.
    Use /# #/ block quote markers for quoted text and indented text like letters. This text will be indented and rewrapped as well as being checked by the Fixup Routine.
    Use /$ $/ for text which does not require indenting, but does need formatting preserved, e.g. indexes & tables.
    Use /* */ for Poetry.
    Use /P P/ for indented poetry.
    For a detailed look click here.
  • At this stage you should also look for missing italics, chapter headings and other markups, such as transliterations and illustrations. Also check that the text around page breaks makes sense & that there are no text/image mismatches. If you find a text/image mismatch, use the Project Details feature of the Post Processing Page to recover the page text from Round 1 or the OCR output and re-proof the page.
  • I also move Illustration tags & tables to paragraph breaks, check that Footnotes look ok & remove [Blank Page] markers after checking the page is indeed blank.
  • If you find a missing page, mark the place in the text with a [**Missing page] so you can find it later on & contact the PM. If you don't receive a response from the PM - use the Missing Pages Wiki. I normally stop when I find a missing page & move onto another project.

Poetry markup for easier HTML generation.

Guiguts now has a poetry-specific markup, /P P/, which makes HTML generation much easier. If you would like to use this feature, go through & change any poetry markers to /P P/. To do this easily: "Search" -> "Search & Replace", uncheck all check boxes -> "Search Text" = /* -> "Replacement Text" = /P. Change a few manually with "Replace", then "Search" to make sure you're happy with the results, & then hit "Replace All". Do the same with */ as the Search term & P/ as the Replacement Text.

Running the fix-up Routine

Everything in /* */, /$ $/, /P P/ & /# #/ should now be protected from the fixup routine.

 You might wish to save your work here, and 
 then do Save As with a new name so you'll 
 still have the old file if things go badly.
  • "Fixup" -> "Run Fixup"
    You will see a list of checkboxes with options - leave them all selected.
    Hit "Go" & wait

You'll see Guiguts working its way through the file making changes. Depending on how long your project is & the speed of your CPU, this step can take a minute or so. You can run the fixup routine as many times as you like. If you have lots of text surrounded in /* */ & /$ $/ markers, you can uncheck the Fixup options which collapse spaces & run the Fixup routine on all text. Be careful of any Fixup options which destroy formatting. See the Guiguts manual for more information.

Removing Page Separators

Now that guiguts "knows" where the page separators are - they can be removed.

 You might wish to save your work here, and 
 then do Save As with a new name so you'll 
 still have the old file if things go badly.
  • "Fixup" -> "Fix Page Separators" -> "Refresh"

You should be taken to the first page separator & it will be highlighted in Halloween orange.

As you have checked text around page separators already, we can use the "Full-Auto" option. Check "Full-Auto". Guiguts will prompt you for a manual fix for any page separator which has an ambiguous fix. You need to choose what you want to do with the page separator. The options & their hotkeys are:

  1. "Join Lines" - Hotkey "j" - Dehyphenate across a page break. Removes the hyphen, spaces & *'s as well.
  2. "Join, Keep Hyphen" - Hotkey "k" - Join lines across a page break, keeping the hyphen. Removes spaces & *'s as well.
  3. "Blank Line" - Hotkey "l" - Remove all blank lines before & after the page separator & leave one blank line.
  4. "New Chapter" - Hotkey "h" - Remove all blank lines before & after the page separator & leave 4 blank lines.
  5. "Delete" - Hotkey "d" - Delete the page separator, doing nothing else.
  6. "Refresh" - Hotkey "r" - Finds the next page separator.
  7. "Undo" - Hotkey "u" - Just what it says.

Use the Hotkeys to move through the file, deciding what to do at each ambiguous separator. Use the "See Image" button to display any page image where you are unsure about how the page separator should be handled.
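To show what the two join options mean for the text on either side of a page break, here is a rough Python sketch. It is an illustration of the behaviour described above, not Guiguts' own code; the function name and the handling of a double hyphen (em-dash) are assumptions.

```python
def join_across_page_break(last_line, first_line, keep_hyphen=False):
    """Join the last line of one page to the first line of the next.

    Trailing/leading spaces and proofers' * markers are removed first.
    """
    left = last_line.rstrip(" *")
    right = first_line.lstrip(" *")
    if not left.endswith("-"):
        # No hyphenated break: an ordinary space joins the two lines.
        return left + " " + right
    if not keep_hyphen and not left.endswith("--"):
        # "Join Lines": drop the single end-of-line hyphen.
        left = left[:-1]
    # "Join, Keep Hyphen" (or an em-dash written as --): keep it as is.
    return left + right


print(join_across_page_break("the won-*", "*derful day"))
print(join_across_page_break("a well-*", "*known man", keep_hyphen=True))
```

The first call corresponds to hotkey "j" and gives "the wonderful day"; the second corresponds to hotkey "k" and gives "a well-known man".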

Cleaning up the Table of Contents, Lists of Illustrations & Title Page.

Now we need to fix the Title page, TOC, and Illustration listings.

Edit the TOC, etc. in the guiguts window.

  • Surround any text you don't want re-wrapped with /* */ or /# #/ markers. Also use /f f/ markers for Front Matter.

Check that all the Chapter markers exist and match the TOC.

  • Use the "Search" -> "Search & Replace" feature to look for Chapter markers as written in the TOC. e.g. roman numerals.

Make sure there are Illustration tags for all illustrations.

  • Use the "Search" -> "Search & Replace" feature to look for Illustrations.
    Compare any lists of illustrations at the start of the book from the page images with the illustration markers you find. If you find a missing illustration, contact the PM.

Fixing errors left in the text.

This is also a very important step in the process, consisting of a lot of substeps:

Resolve *'s left by proofers.

  • "Search" -> "Search & Replace" -> Uncheck "Whole Word Only" & -> "Search Text" = *
    Use the "Search" button to find *'s in the text. Use the "See Image" button to compare the text to the image. If there are any problems with *'s you cannot resolve due to poor scans, contact the Project Manager to see if they still have access to the original text.
  • When you find a hyphenated word with a *, try search-replace to see if the text uses the version with or without the hyphen elsewhere in the text.
  • This is also where you decide what to do with obvious typos in the scanned image, remarked upon by proofers. Some PPs prefer to correct the errors in the text and have a 'corrected typos' section showing what they've changed, others prefer to leave the text as-typeset and have a 'known errata' section. In either case, mention the page number in the list -- even in the Ascii version, it will give readers a relative position to roughly orient themselves, and in the HTML version you can make the page numbers links to that point in the text.

Finding Orphaned Brackets & Markup

  • "Search" -> "Find Orphaned Brackets & Markup"
    Run this for all the types. This check has been done before for /* markup, but run it again just to make sure all is ok after page separator removal.

For HTML markups

  • "Fixup" -> "HTML Fixup" -> "Find orphaned Markup"
    This finds missing <i> <b> etc. Depending on how broken the markup markers are, you may have to use the Search & Replace function to look for "<". More on Search & Replace later. You can also use the Character Count feature of the Word Frequency function to see the numbers of <'s & >'s. They should match. More on that later.
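The idea of comparing character counts can be sketched in a few lines of Python. This is only an illustration of the principle (the function name unbalanced_pairs is made up); it tells you that a pair is out of balance, not where the problem is.

```python
from collections import Counter

# Pairs of characters whose counts should normally be equal.
PAIRS = [("<", ">"), ("(", ")"), ("[", "]")]


def unbalanced_pairs(text):
    """Return {pair: (open_count, close_count)} for pairs whose counts differ."""
    counts = Counter(text)
    return {
        opener + closer: (counts[opener], counts[closer])
        for opener, closer in PAIRS
        if counts[opener] != counts[closer]
    }


print(unbalanced_pairs("<i>word</i> (note [1] <b>bold"))
```

Here the ( count is 1 and the ) count is 0, so the result is {'()': (1, 0)}. Note that the unclosed <b> is not reported, because the < and > counts still match; that kind of error needs the orphaned markup search.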

Word frequency lists

  • "Fixup" -> "Run Word Frequency Routine"
    All these functions generate lists of matches. If you highlight a word in the list, then double left-click, you will be taken to the first occurrence of that word in the file for easy fixing. Double left-clicking on a word/character in the frequency list multiple times will cycle through the file finding occurrences.

If you right-click on the word you will launch the "Search & Replace" pop-up with the selected word already inserted into the Search field. You can then set the "Replace" field with your chosen text to fix multiple occurrences of an error. You can also check the "Whole Word Only" & "Case Insensitive" boxes when searching to limit or expand your search. It is useful to use the "Re Run" feature between these steps. Warning: the "Re Run" feature will automatically save your file before re-running the Word Frequency routine.

  1. Hit "Emdashes"
    You will see a list of words that contain a hyphen, all words that are identical except that they DON'T have a hyphen and the words that are identical except that they contain an emdash (two hyphens). Fix any words which are marked with em-dashes, but should just be hyphens. This check will not find all occurrences of this error as it doesn't list all em-dashes. See the manual for more details.
  2. Select "Alph"(abetical sort) & then Hit "Hyphens"
    You will see a list of hyphenated words from the project in alphabetical order. The number to the left represents how many times the word appears. If you scroll down the list you'll see non-hyphenated versions of the same word marked with *. Resolve any conflicts by looking at the page image (hit the 'See Image' button if required).
    Note: many older texts have hyphenated words we wouldn't use today - preserve the original hyphenation as seen in the image.
  3. Hit "Alpha/num"
    You will see a list of Alphanumeric words which appear in the text. Fix any words which have "ones" lurking as "ells" for example. Check consistency of dates.
  4. Hit "Spelling"
    You'll see a list of words not in the dictionary in Alpha order which occur in the text. This is a useful check for resolving inconsistencies with proper names.
    Work through the list fixing any errors. This is just an initial check. A full spell check will be run shortly.
  5. Hit "Ital/Bold Words"
    You'll see a list of words and phrases which appear in <i> & <b> markers in the text & those which do not. This is a useful check for resolving inconsistencies with italics markers for abbreviations & Journal titles.
    Work through the list fixing any errors.
  6. Hit "ALL CAPS"
    You'll see a list of words which appear in all caps. This is useful for checking you have all the CHAPTER headings.
    Work through the list fixing any errors.
    If you need to change the case of any text, just select it & use "Selection" -> "UPPERCASE Selection". The text will change to ALL CAPS. Other case changing features which are available from the "Selection" menu are "lowercase Selection", "Sentence case selection" and "Title Case Selection."
  7. Hit "MiXeD CasE"
    You'll see a list of words which appear in mixed case in the text. This is useful for checking you have all the Chapter headings.
    Work through the list fixing any errors.
  8. Hit "Initial Caps"
    You'll see a list of words which appear in Initial Caps in the text. This is useful for checking Proper Names.
    Work through the list fixing any errors.
  9. Hit "Character Cnts"
    You'll see a list of all the characters in the text in Alpha order. This is useful for spotting weird characters & missing brackets; the counts for ( and ) should be the same, for example. Mark-up mismatches for [] & <> are also easy to spot here.
    Work through the list fixing any errors.
  10. Hit "Check , Upper"
    You'll see a list of phrases which contain , Initial. This is useful for checking for , -> . scannos.
    Work through the list fixing any errors.
  11. Hit "Check . Lower"
    You'll see a list of phrases which contain . lower. This is useful for checking for . -> , scannos.
    Work through the list fixing any errors.
  12. Hit "Check Accents"
    You'll see a list of words with accents in the text and words which probably should have accents. This is useful for resolving inconsistencies with accented characters. Use the "See Image" button to compare the text to the image to resolve inconsistencies. This check works like the hyphen check.
    The Latin-1 chart available from "Help" -> "Latin-1 Chart" pops up a window which allows you to click on accented characters to easily add them to the text file. Characters are inserted in the text file at the cursor.
    I also use the Search & Replace Popup at this stage to check for ae & oe ligatures. I add them back into the .txt if I find they have been removed by the proofing rounds. I proof the oe ligature as [oe] so I can substitute it later on for an HTML entity in the HTML version.
    Work through the list fixing any errors.
  13. Hit "Stealtho Check" -> Use the browser to open misspelled.rc from your winguts/scannos directory
    You'll see a list of suspect words you can work your way through.
    There is an alternative way of running these scanno checks using the Search and Replace function. See 2.6.4 for more details.
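For readers who like to see the logic behind the "Hyphens" list, here is a minimal Python sketch of the same idea: count every word, then report hyphenated words whose unhyphenated form also occurs. It is an approximation written for this tutorial, not the Word Frequency routine itself, and the function name is hypothetical.

```python
import re
from collections import Counter


def hyphen_conflicts(text):
    """Map each hyphenated word to (its count, joined form, joined form's count)."""
    # Words may contain single internal hyphens; a double hyphen splits words.
    words = Counter(re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text.lower()))
    conflicts = {}
    for word, count in words.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in words:
                conflicts[word] = (count, joined, words[joined])
    return conflicts


sample = "To-day we sail. Today is fine; to-day again. Well-known."
print(hyphen_conflicts(sample))
```

For the sample this reports that "to-day" occurs twice and "today" once. As with the real check, the page image decides which spelling is right in each place.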

Regex checking

Regular expressions are fantastic for pattern matching within text files. You can search for patterns using a slightly arcane syntax; don't fret if this is all a mystery. There is a Regular Expression clinic at the DP Forums, where you can post saying "I want a regular expression which looks for a . followed by any whitespace, then a lower-case letter to find .'s lurking as , in a sentence." There's a regex which already does this in guiguts.

  • "Search" -> "Stealth Scannos" -> use the Browser to go to the scannos directory in your winguts or guiguts folder. Select regex.rc
    This fires up the Search & Replace pop-up window, with the regex button selected. Guiguts will highlight the first match for the regex in the text window. You can then decide if it's a real error and replace it using the "Replace" button. If it's an error but the replace expression is not right for this particular instance, just type the correction in the text window. The "Search" button will keep looking for instances of the regular expression in the file. The "Next Stealtho" button will take you to the next regex in the regex.rc file. You can also use the 'Auto-Advance' check to have guiguts automatically advance its way through the regexs. This will speed up your checks. Work your way through the regular expressions in the regex.rc file.
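As an example of what such a pattern looks like, here is the ". followed by whitespace, then a lower-case letter" check written out in Python. The exact expression shipped in regex.rc may differ; this pattern and the function name are my own illustration.

```python
import re

# A period, one or more whitespace characters, then a lower-case letter.
PERIOD_THEN_LOWER = re.compile(r"\.\s+[a-z]")


def suspect_periods(text):
    """Return (line_number, matched_text) for each suspicious period."""
    hits = []
    for match in PERIOD_THEN_LOWER.finditer(text):
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group()))
    return hits


sample = "He ran. then fell.\nShe sat. The end.\nIt was Mr. smith."
print(suspect_periods(sample))
```

The sample gives hits on lines 1 and 3. Abbreviations such as "e.g. the" will also match, which is why each hit needs a human decision rather than a blind "Replace All".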

If you're just beginning with guiguts, use the supplied regex file. If you want to know more - please check out montanus' great post in the Regex Clinic.

If you develop a great new regex & would like to share it with others, please add it to the regex.rc file.

Guiguts now has an "Edit regex" function, you can add & edit your own regex's from guiguts. See the manual for more details.

You should now load the en-comm.rc scanno file into guiguts via "Search" -> "Stealth Scannos". Use auto-advance to work your way through these scanno checks.

You can also download additional (common, suspect and rare) scanno lists in .rc format in several languages.

Remove end-of-line spaces

Before we run gutcheck it helps to remove spaces at the end of the line which have appeared while you've been editing the file.

  • "Fixup" -> "Remove End-of-line Spaces"
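Outside Guiguts, the same clean-up can be sketched with one regular expression. This Python snippet is only an illustration of what the menu item does to the text; the function name is made up.

```python
import re


def strip_trailing_spaces(text):
    """Remove spaces and tabs at the end of every line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


print(repr(strip_trailing_spaces("a  \nb\t\n c")))
```

Leading spaces, such as the indent before "c" in the sample, are left alone.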

Run Gutcheck

Run Gutcheck to fix common errors in the file, such as unbalanced quotes.

  • "Fixup" -> "Run Gutcheck" will automatically save your file & run Gutcheck. This may take a while on slow systems.
    In the gutcheck pop-up - "View Options" allows you to hide gutcheck messages you're not interested in. (At this stage it helps to check "Short line" & "Long line". See additional tips in Guiguts manual.)
    In the gutcheck error list - left clicking on an error takes you to the line in the file with the error, right-clicking on an error removes the error from the list. This is very handy for working through the list of errors.
  • Work through the Gutcheck error list, correcting as you go. Re-run Gutcheck as many times as you like.
  • You will also find that selecting paragraphs of text & using the "Search" -> "Highlight double quotes in selection" or "Highlight single quotes in selection" is very useful in tracking down mismatched & wrongly spaced quotes.
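A very rough way to picture the unbalanced double quote report: count the quote marks in each paragraph and flag the paragraphs with an odd number. The Python sketch below does just that. It is not Gutcheck's actual algorithm, and the function name is an assumption.

```python
import re


def paragraphs_with_odd_quotes(text):
    """Return the 1-based numbers of paragraphs with an odd count of double quotes."""
    flagged = []
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for number, paragraph in enumerate(paragraphs, start=1):
        if paragraph.count('"') % 2:
            flagged.append(number)
    return flagged


sample = '"Hello," he said.\n\n"Wait, she said.\n\nNo quotes.'
print(paragraphs_with_odd_quotes(sample))
```

Only the second paragraph of the sample is flagged. A flagged paragraph is not always wrong: speech that continues over several paragraphs is normally printed without a closing quote until the last one, so check against the image.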

Spell Checking

We should now do a more thorough spell check.

  • "Search" -> "Spell Check"
    A window will pop-up with the first word not in the dictionary. Work through the list of spelling errors, adding words to your project or main dictionary if you need to.
    You may change the main dictionary if necessary (usually to check for British vs. American spellings).

For details on the options related to spellchecking see Guiguts manual.

Check transliterations

If you have any transliterations such as Greek, check the transliteration.

  • "Search" -> "Search & Replace" -> "Search Term" = "[Gr"
    Uncheck "Case Insensitive", "Whole Word Only" & "Regex".
  • Hit the "Search" button.
  • Work your way through any Greek transliterations you find. Use the "Help" -> "Greek Transliteration" chart to add any Greek characters to the text by clicking on a Greek letter in the chart. Text is inserted at the cursor in the text window.
  • You can change the "Search" term to look for languages such as Sanskrit & Hebrew.

Footnote handling

Guiguts has a powerful Footnote handling feature which takes the pain out of manually moving and renumbering footnotes.

  • "Fixup" -> "Footnote Fixup" opens a pop-up window with lots of buttons. Check the box for "Out-of-line" or "Inline" depending on the style used in the text. The rest of this tutorial will use out-of-line footnotes for examples as this is the guideline standard.
  1. First Pass
    Hit the "First Pass" button and guiguts will move through the file identifying anything it thinks is a Footnote and its associated anchor in the text. The pass will finish with the first footnote anchor highlighted in orange, the actual associated footnote in green. There is an indication at the top of the Footnote pop-up to show you how many footnotes are in the text. It may help you to select "Unlimited Anchor Search" if your footnote markers are a long way away from the actual footnote text.
  2. Check Footnotes
    Once you have run "First Pass" hit the "Check Footnotes" button. A window will pop up with a list of all the footnotes found. Footnotes marked in yellow have duplicate anchors; footnotes marked in pink have no anchors. Use the "Go to - #" footnote dropdown in the Footnote tool to fix any errors, & re-run First Pass & Check Footnotes to make sure all errors have been fixed.
  3. Check Footnote Count
    "Fixup" -> "Run Word Frequency Routine" -> "Sort Alpha" -> "Re Run"
    Look for how many times the word "Footnote" occurs in the text. It should match the number of Footnotes found by the Footnote tool; if it doesn't, you have a missing Footnote somewhere. Use the features of Word Frequency lists mentioned in 2.6.3 to fix any errors. Re-run the Footnote "First Pass" step if you fix any errors.
  4. Step through all Footnotes
    Use the "Next FN -->" & "<-- Last FN" buttons to step your way through the Footnotes guiguts has found, fixing any errors and joining any multi-page footnotes. You can use the "See Anchor" & "See Footnote" buttons to make sure that the anchor is associated with the correct footnote text. If you fix a missing ending bracket problem, hit the "Adjust Bounds" button and guiguts will find the new closing bracket. For each footnote, select the appropriate symbol style to use with the "Number", "Letter" or "Roman" buttons. You can use the "IMAGE" button to check the original symbol style if needed. Numbers are recommended if you have large numbers of footnotes. Don't worry about removing duplicate footnote symbols, this will be handled auto-magically in the next step.
    Guiguts depends on Footnotes being in the right format, [?] for markers and [Footnote ?: text] for the footnote. Make sure you correct all footnotes to this format. You can set manual anchors if you have lots of footnotes in [Footnote: ? text] format, see the guiguts manual on how to handle those.
  5. Index Footnotes
    Once you have stepped through and checked, adjusted, fixed and anchored all of your footnotes, hit the "Re Index" button. For out-of-line footnotes, this will renumber all of the footnotes using the same family of symbol that it had originally or a number if it had no anchor marker. This will close up any gaps in the numbers and remove duplicates. You can make changes and re index as often as you like.
  6. Setting Landing Zones for Footnotes
    Now you need to decide where you want out-of-line footnotes collected. End of text and end of Chapter footnote collection points (guiguts calls them landing zones) are automatically handled by the "Autoset End LZ" and "Autoset Chap. LZ" buttons. You can set additional manual landing zones if you want to, say at the end of a table for example. To set a manual landing zone, position the cursor in the text & hit "Set LZ @ Cursor". This will insert the marker text "FOOTNOTES:" at that point in the text; the footnotes between that landing zone and the previous one will be moved to just past that marker. You can have as many landing zones as you like, and can step through them adding and removing as necessary.
  7. Move the Footnotes to the Landing Zones.
    (You may want to save your file before this step) Once you have set landing zones for all your footnotes, hit "Move Footnotes To Landing Zone(s)" & your footnotes will be moved. You can now re-run the "First Pass" step using "Last FN" & "Next FN" buttons with the "See Anchor" & "See Footnote" buttons to make sure all are correct.
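The anchor and footnote formats mentioned in step 4 lend themselves to a simple cross-check, sketched here in Python. It is not what the Footnote Fixup tool does internally; the function name is hypothetical, and it assumes labels are unique across the whole text (for example after "Re Index"), which is not true of a freshly proofed file where numbering restarts on every page.

```python
import re
from collections import Counter

ANCHOR = re.compile(r"\[(\d+|[A-Za-z])\]")               # e.g. [1] or [A]
FOOTNOTE = re.compile(r"\[Footnote (\d+|[A-Za-z]):")     # e.g. [Footnote 1: text]


def check_footnotes(text):
    """Cross-check anchor labels against footnote labels."""
    anchors = Counter(ANCHOR.findall(text))
    notes = Counter(FOOTNOTE.findall(text))
    return {
        "no_anchor": sorted(n for n in notes if n not in anchors),
        "no_footnote": sorted(a for a in anchors if a not in notes),
        "duplicate_anchor": sorted(a for a, c in anchors.items() if c > 1),
    }


sample = "One[1] two[2] three[2].\n\n[Footnote 1: first.]\n[Footnote 3: third.]"
print(check_footnotes(sample))
```

For the sample, footnote 3 has no anchor, anchor 2 has no footnote, and anchor 2 appears twice. Tags such as [oe] or [Illustration] are ignored because they do not fit the single-letter or number pattern.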

Rewrapping text

Now we're ready to re-wrap the text. You can set different margins for block quotes and normal text, any text within /* */ markers is left as is unless you specify an indent modifier with the /* marker. See the manual for more details on this feature.

 You might wish to save your work here, and 
 then do Save As with a new name so you'll 
 still have the old file if things go badly.
  1. Setting Re-wrap Margins.
    "Prefs" -> "Processing" -> "Set Rewrap Margins" shows you the defaults and allows you to change them for normal text (i.e. outside of /* */) & block quotes (i.e. inside /# #/)
  2. Re-wrapping text.
    To re-wrap all text:
    "Edit" -> "Select All"
    "Selection" -> "Rewrap Selection" will rewrap the entire text as per the preferred margins.
    You can use the "Interrupt Re-wrap" button if you wish.
  3. Visual check of re-wrap.
    Check that the resulting re-wraps & indents look OK. You can run re-wrap multiple times if you wish as the markers are not deleted in the re-wrap routine.
  4. Playing with re-wrap settings and indenting.
    Guiguts can do all manner of funky things with rewrap - See the guiguts "Help" -> "Open Manual" for more details. Handling for poetry indenting will be added as an extra section to this tutorial later.
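
To make the idea of re-wrapping concrete, here is a rough Python sketch of what a re-wrap does to plain paragraphs while leaving /* */ blocks alone. It is not the guiguts routine: the width of 72 is an assumed margin, block quotes (/# #/) and indent modifiers are not handled, and the function name rewrap is invented for this example.

```python
import textwrap


def rewrap(text, width=72):
    """Rewrap blank-line-separated paragraphs, leaving /* ... */ blocks alone.

    width=72 is an assumed margin, not necessarily the guiguts default.
    """
    out = []           # finished output lines
    para = []          # lines of the paragraph being collected
    protected = False  # True while inside a /* ... */ block

    def flush():
        # Join the collected lines and wrap them to the margin.
        if para:
            out.extend(textwrap.wrap(" ".join(para), width=width))
            para.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("/*"):
            flush()
            protected = True
            out.append(line)
        elif stripped.startswith("*/"):
            protected = False
            out.append(line)
        elif protected:
            out.append(line)   # keep exactly as typed
        elif not stripped:
            flush()
            out.append("")     # blank line ends a paragraph
        else:
            para.append(stripped)
    flush()
    return "\n".join(out)
```

The markers stay in the output, which is why the wrap can be run again and again, as noted in step 3.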

Re-run gutcheck

  • Just to make sure gutcheck is happy with everything including the length of your lines, re-run it.
    Follow the directions as outlined in Run Gutcheck.

Final sanity/insanity check

  • Have a final read through the text & save it. It should be looking pretty polished by now & the ASCII version is pretty much done. Give yourself a huge pat on the back!

Generating an ASCII and HTML version.

The ASCII version

  • Save the file as projectname-ascii.txt. This will leave projectname.txt containing the mark-up symbols. You are now working on projectname-ascii.txt and this is the filename which should appear in the Guiguts title bar.

Removing mark-up symbols.

  • "Fixup" -> "Clean Up Re-wrap Markers"
    This routine will remove all LINES which contain /*, */, /# & #/ markers.

Change any italics markup symbols to underscores.

  • "Search" -> "Search & Replace", uncheck all check boxes ->
    "Search Text": <i>
    "Replacement Text": _
    Change a few manually with "Replace", then "Search" to make sure you're happy with the results & then Hit "Replace All"
  • Do the same with </i> as the Search term.

Change any thoughtbreaks to standard markup

  • "Search" -> "Search & Replace", uncheck all check boxes ->
    "Search Text": <tb>
    "Replacement Text":        *     *     *     *     *
    (seven spaces, asterisk, 5 spaces, asterisk, and then 3 more with 5 spaces distance each)
    Change a few manually with "Replace", then "Search" to make sure you're happy with the results & then Hit "Replace All"
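
Counting spaces by eye is error-prone, so here is a small Python sketch that builds the thought-break line from the description above and does the same replacement. The names THOUGHT_BREAK and replace_thought_breaks are made up for illustration.

```python
# Build the thought-break line from the description above:
# seven spaces, then five asterisks separated by five spaces each.
THOUGHT_BREAK = " " * 7 + (" " * 5).join("*" * 5)


def replace_thought_breaks(text):
    """Replace each literal <tb> with the spaced row of asterisks."""
    return text.replace("<tb>", THOUGHT_BREAK)


print(repr(THOUGHT_BREAK))
print(len(THOUGHT_BREAK))
```

The resulting line is 32 characters long, so it fits comfortably inside normal wrap margins.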

Change any bold markup symbols to equals signs or a symbol of your choice.

The PG FAQ doesn't mention bold mark-up, so it's up to you how you handle bold text. Some people use equals signs or other symbols not already used in the text or convert the text to ALL CAPS. Others just remove bold mark-ups from the ASCII version, particularly if they are only used in headings, which are already differentiated from normal text by being in ALL CAPS or Title Case.

To use = as the bold mark-up symbol:

  • "Search" -> "Search & Replace", uncheck all check boxes ->
    "Search Text": <b>
    "Replacement Text": =
    Change a few manually with "Replace", then "Search" to make sure you're happy with the results & then Hit "Replace All"
  • Do the same with </b> as the Search term.
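
The italics and bold replacements above are plain literal substitutions, so they can be sketched in Python like this. The function name convert_inline_markup and its bold_symbol parameter are illustrative only; pass an empty string if you prefer to drop bold mark-up altogether.

```python
def convert_inline_markup(text, bold_symbol="="):
    """Turn <i>..</i> into _.._ and <b>..</b> into the chosen bold symbol."""
    for tag in ("<i>", "</i>"):
        text = text.replace(tag, "_")
    for tag in ("<b>", "</b>"):
        text = text.replace(tag, bold_symbol)
    return text


print(convert_inline_markup("A <i>very</i> <b>bold</b> claim."))
# A _very_ =bold= claim.
```

Unlike the step-by-step "Replace" in guiguts, this changes everything at once, so it is best tried on a copy of the file.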

Change any other markup to proper text-version markup

  • Text in <g> gesperrt tags will need to be handled
  • Text in <f> font-change tags will need to be handled
  • Text in <u> underline tags, if your PM asked the proofers to insert them, will need to be handled.
    Which symbol, if any, or other markup, you use for these types of tag is also not prescribed by the PG FAQ.
    If your project does not have italics or bold in addition to these, you can use their _ or = symbol, or you can use * or ~ or any other character that seems appropriate.
  • Text in <sc> tags you will probably want to make all-caps (and remove the tags).
    Use the following search/replace terms:
    Search: <sc>(.+?\n?)(.+?)</sc>
    Replace: \U$1\U$2\E
    Check Regex
    This will replace the text from the starting tag to the end-tag with an all-caps version of the text between the tags, removing the tags in the process. Do a few manually to see if you approve and then hit "Replace All".
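
For comparison, here is roughly the same small-caps conversion as a Python sketch. Python has no \U in replacement strings, so a function does the upper-casing instead. Note one difference, which is a choice made for this example: the pattern below lets the tagged text span any number of line breaks, whereas the search term above allows at most one.

```python
import re

# Non-greedy match of everything between <sc> and </sc>, across line breaks.
SC_RE = re.compile(r"<sc>(.+?)</sc>", re.DOTALL)


def smallcaps_to_upper(text):
    """Replace <sc>text</sc> with TEXT, dropping the tags."""
    return SC_RE.sub(lambda match: match.group(1).upper(), text)


print(smallcaps_to_upper("by <sc>Mr. Jones</sc> and <sc>Mrs.\nSmith</sc>"))
```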

Re-run gutcheck

Just to make sure gutcheck is happy with everything including the length of your lines now you have removed mark-ups.

HTML version

  • Open projectname.txt file containing the mark-up symbols.
  • Save the file as projectname-htm.html. You are now working on projectname-htm.html, which is the name seen in the Guiguts title bar.
  • "Fixup" -> "HTML Fixup" -> "Autogenerate HTML"
  • Save and load projectname-htm.html into a browser & fiddle with it until it looks ok. I am not a HTML expert, but I've done this a few times now & it isn't hard for simple texts. For an example of auto-generated guiguts HTML see this project.
  • Don't forget to prepare any illustrations and include them in the HTML version. For each image in the project:

I use Irfanview to crop & resize the image. I use 400 high or 400 width as the smallest dimension, e.g. 400x450 or 450x400 for thumbnails & use 1200 high or 1200 wide for a full-size image.

I number the illustration tags like so: [Illustration 1-3: The Emperors Rookery.] & the associated images are called 1-3.jpg & 1-3_th.jpg

I then use a regex to generate all the Illustration tags to:

  <p class="figcenter"><a href="./images/1-3.jpg">
  <img src="./images/1-3_th.jpg" alt="The Emperors Rookery" 
  title="The Emperors Rookery" /></a></p>
  <p class="figcenter"><span class="smcap">The Emperors Rookery</span></p>
  <hr />

Holler if you want the regex.
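
In the meantime, here is one way such a conversion could look in Python. This is not the regex referred to above, just a sketch that reproduces the HTML shown: it assumes tags of the form [Illustration N-N: Caption.] on a single line, and the names ILLO_RE, TEMPLATE and illustration_to_html are made up for the example.

```python
import html
import re

# [Illustration 1-3: Some caption.] -> number "1-3", caption without the final full stop.
ILLO_RE = re.compile(r"\[Illustration (\d+-\d+): (.+?)\.?\]")

TEMPLATE = (
    '<p class="figcenter"><a href="./images/{num}.jpg">\n'
    '<img src="./images/{num}_th.jpg" alt="{cap}"\n'
    'title="{cap}" /></a></p>\n'
    '<p class="figcenter"><span class="smcap">{cap}</span></p>\n'
    '<hr />'
)


def illustration_to_html(text):
    """Replace each illustration tag with a linked thumbnail and caption."""
    def build(match):
        # Escape the caption so characters like & or " stay valid in HTML.
        return TEMPLATE.format(num=match.group(1), cap=html.escape(match.group(2)))
    return ILLO_RE.sub(build, text)


print(illustration_to_html("[Illustration 1-3: The Emperors Rookery.]"))
```

Captions that run over several lines would need the pattern adjusted, and it is worth eyeballing the result for every image either way.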

When done with all the images, I make a table TOC of the illustrations, integrating links to the pages on which the images appear. See indexing in the next section.

Inserting page anchors

  • When you get to the stage of generating HTML, check whether you have a consistent page offset between png number & page number on the image all the way through the book. For example, if you have Page 11 in guiguts & page 1 on the page image, Page 20 in guiguts & page 10 on the page image, and Page 400 in guiguts & page 390 on the page image, with a consistent 10 page difference in between, then you have a -10 page offset. Many books already have a consistent offset. This is nice to work with.
    You may not have a consistent offset, either due to unscanned pages or inconsistent numbering throughout the book. e.g. Books with full-page illustrations often have a blank page facing the illustration and include neither illustration nor blank in the numbering, so you could see 015.png=Page 11, but then 016.png=Illustration 1, 017.png=[Blank Page] and finally 018.png=Page 12. In this case, you would have a -4 page offset up to Page 11 and a -6 offset starting at Page 12. Offsets may change in this way many times throughout a book.
  • The Adjust Page Markers and Setting Page Labels tools will help you adjust .png numbers into Page anchors. To use the Page Marker tool, right click (or CTRL+click on a Mac) on the 'Img:' field of the bottom-of-window status bar. A popup appears & the locations of the page breaks show as yellow markers in the guiguts window at the ends of lines in the text. Generally you will not need to adjust the locations of the page breaks.
    Nonetheless, I normally check a few throughout the book, particularly around plates, to make sure I haven't missed anything. This has also proved a useful double check of missing pages. I found a missing page in one of the last books I PPed (it made sense across the page break) at this stage.
  • To open the Page Labels tool, right click (or CTRL+click on a Mac) on the 'Lbl:' field of the bottom-of-window status bar. A popup appears showing the current mapping of .png numbers onto Page numbers. You can adjust and recalculate the numbering schemes, in Arabic or Roman numerals or a combination thereof.
  • Once you have finished this process you have the page-markers-in-guiguts matching the page-labels-printed for all markers.
    This post may provide clarification.
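
The offset arithmetic above can be checked with a short Python sketch. It takes a mapping from png number to printed page number (None for plates and blank pages) and reports each run of scans that shares one offset. The function name offset_runs is invented, and the entries for 014.png and 019.png are assumed here just to pad out the example from the text.

```python
def offset_runs(png_to_page):
    """Group scans into runs that share one offset (printed page - png number).

    png_to_page maps a png number to the printed page number, or to None
    for unnumbered scans such as plates and blank pages.
    Returns a list of (first_png, last_png, offset) tuples.
    """
    runs = []
    for png in sorted(png_to_page):
        page = png_to_page[png]
        if page is None:
            continue  # unnumbered scan: contributes no offset
        offset = page - png
        if runs and runs[-1][2] == offset:
            # Same offset as the previous numbered scan: extend that run.
            runs[-1] = (runs[-1][0], png, offset)
        else:
            runs.append((png, png, offset))
    return runs


# The example from the text: 015.png is Page 11, 018.png is Page 12.
scans = {14: 10, 15: 11, 16: None, 17: None, 18: 12, 19: 13}
print(offset_runs(scans))
# [(14, 15, -4), (18, 19, -6)]
```

Every new tuple in the output marks a place where the numbering needs to be adjusted in the Page Labels tool.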

Linking to Page Markers from Indexes

  • When using the Autogenerate HTML function, be sure that the checkbox 'Insert Anchors at Pg #s' in the HTML popup is checked.
    Once the html is generated an anchor like <a name='Page_1'></a> is created at every page break matching the pages in the original book.
  • Then it is a simple matter of using search & replace to find page references to change them to the page anchors.
    I use something like (& yes that is a leading space before the (\d+)):
 Search= (\d+)
 Replace=<a href='#Page_$1'>$1</a>

This regex changes a space followed by one or more digits to an internal link to the page number. I make this change manually for the text, but automatically for the entire index. I sometimes have to look for: -(\d+) as well for ranges of numbers found in indexes.
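
The same search & replace can be tried out in Python before running it on the real index. This is a sketch with two choices of its own: it handles the space and the hyphen case in one pattern, and it puts the space or hyphen back in front of the link so the spacing of the index entry is preserved. The names PAGE_REF_RE and link_page_refs are made up for the example.

```python
import re

# A space or hyphen followed by one or more digits.
PAGE_REF_RE = re.compile(r"([ -])(\d+)")


def link_page_refs(index_line):
    """Wrap each page number in a link to its Page_N anchor."""
    return PAGE_REF_RE.sub(r"\1<a href='#Page_\2'>\2</a>", index_line)


print(link_page_refs("Rookery, 12, 45-47"))
# Rookery, <a href='#Page_12'>12</a>, <a href='#Page_45'>45</a>-<a href='#Page_47'>47</a>
```

Like the original, this will happily link any number it finds, so it is only safe to run blind on index lines.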

It may sound complicated, but like most things, it takes longer to explain than to actually do.

You can also spice up your indexes using the CSS Cookbook.

An example.

Running Tidy

I do not use the output generated by Tidy, I only fix any errors or warnings in my version & then recheck that Tidy is happy with my version. Tidy output makes HTML very hard for humans to read - it is preferable that you do not submit tidy output!!!

Guiguts now has the ability to interface with a locally installed Tidy program in the same way that it works with gutcheck. Download Tidy from here (or here, if you're reading this sometime after April 2008 and want to check for more recent versions). Install it & set its location with 'Prefs' -> 'Set File Paths' -> 'Locate Tidy Executable'. You can then use 'Fixup' -> 'HTML Fixup' -> 'HTML Tidy' to check your HTML files. The Tidy interface works very much like the interface to Gutcheck: you are presented with a list of errors & warnings which you can click on to be taken to the place in the file where the problem is.

Check all links

Use the guiguts internal link checker for checking links, as the validator and Tidy cannot check for bad links.

  • "Fixup" -> "HTML Fixup" -> "Link Checker"
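
To show what kind of problem a link check looks for, here is a minimal Python sketch that lists internal links with no matching anchor. It is not how the guiguts Link Checker works internally, only an illustration: it uses simple regexes rather than a real HTML parser, it only looks at href="#..." links, and the function name broken_internal_links is invented.

```python
import re

# Internal links such as href='#Page_12' and targets such as name='Page_12' or id="Page_12".
HREF_RE = re.compile(r"""href=["']#([^"']+)["']""")
TARGET_RE = re.compile(r"""(?:name|id)=["']([^"']+)["']""")


def broken_internal_links(html_text):
    """Return the sorted internal link targets that have no matching anchor."""
    targets = set(TARGET_RE.findall(html_text))
    return sorted({ref for ref in HREF_RE.findall(html_text) if ref not in targets})


page = "<a name='Page_1'></a><a href='#Page_1'>1</a> <a href='#Page_2'>2</a>"
print(broken_internal_links(page))
# ['Page_2']
```

Links to images and external sites are outside the scope of this sketch.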

Validating the HTML

Once you're happy with the HTML, run the project through the HTML validation service:

  • "External" -> "W3C Markup Validation Service" and follow the steps on the W3C web page. I found the HTML generated by guiguts passed this check with flying colours.
    However, you will sometimes see hundreds(!) of errors, often mostly of the same type. This is often related to a simple missing opening or closing tag directly before your first error of a type. One fix in the .html file, reuploaded and revalidated, may well fix 90% of the errors with which you are first presented.

Making the zip file

Once you have finished with the ascii & HTML versions,

  • zip up all the finished files (no spaces in filenames please!), typically the ascii txt, the HTML and any illustrations into a .zip file & upload the file for PPV using the PP page. Make sure the file extension is .zip, not .ZIP!
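
If you would rather script the zipping, the following Python sketch builds the archive and enforces the two naming rules above (no spaces, lowercase .zip). It assumes all finished files sit in one project folder, with illustrations in a subfolder; the function name make_zip is made up, and the upload itself still happens through the PP page.

```python
import zipfile
from pathlib import Path


def make_zip(zip_path, project_dir):
    """Zip every file under project_dir, refusing bad names.

    Raises ValueError if the archive name does not end in lowercase .zip
    or if any filename contains a space. Returns the names stored.
    """
    zip_path = Path(zip_path)
    project_dir = Path(project_dir)
    if zip_path.suffix != ".zip":
        raise ValueError("archive name must end in lowercase .zip")
    files = sorted(
        p for p in project_dir.rglob("*")
        if p.is_file() and p.resolve() != zip_path.resolve()
    )
    bad = [p.name for p in files + [zip_path] if " " in p.name]
    if bad:
        raise ValueError(f"filenames contain spaces: {bad}")
    names = [p.relative_to(project_dir).as_posix() for p in files]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, name in zip(files, names):
            archive.write(path, arcname=name)  # keep folders such as images/
    return names
```

A call such as make_zip("projectname.zip", "projectname") would then pick up the ascii txt, the HTML and the images folder in one go.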

Where to get more help (apart from the Manual).

  • For guiguts additions/bugs

http://www.pgdp.net/phpBB2/viewtopic.php?t=3075 The incredible thundergnat is very responsive to bug reports & feature requests.

  • For this tutorial

Jabber groupchat pgdp@conference.jabber.org or PM one of the PP Mentors

Related Pages


Guiguts Tutorial/Old Changelog