PPTools/Guiguts/Guiguts Manual/HTML Menu



GUIGUTS VERSION 1.4.0 MANUAL

The HTML menu

Gg-1.4.0-46a-htmlmenu.png

contains most of the tools you will use in preparing the HTML/epub versions of a book, so you'll probably find it convenient to tear it off and place it on the screen:

Gg-1.4.0-46b-htmlmenu.png


Prepare the text for conversion to HTML

The descriptions here assume you've brought the text to the point where it is ready for conversion to HTML, including:

  • error correction, organization, consistency.
  • removing leading spaces from all lines except those in poetry.
  • confirming that the block-level markups are the ones you want to use. Guiguts recognizes markups that are not available to the formatters (the first two on this list are the only ones the formatters can use for blocks). Every opening block markup must have a matching closing markup, and the ones with letters may be upper- or lower-case:
    • /*  each line within the block will end with <br /> to preserve individual line breaks
    • /#  the block will be given a CSS class of "blockquot"
    • /$  is like /*, but the block will not be indented. This can be useful in marking tables in advance
    • /P  the block will be formatted as poetry, using the CSS declarations in header.txt
    • /C  the block will be given a CSS class of "center" to center it within the current margins (likely narrower if within a Block Quote, for example). The lines will not be wrapped, because Autogenerate will add a <br /> to the end of each line
    • /R  the block will be shifted over to the current right margin using the CSS class "right" (likely narrower if within a Block Quote). The lines will not be wrapped, because Autogenerate will add a <br /> to the end of each line. Only the longest line in the block will be at the right margin; Autogenerate approximately replicates the relative indentation of the other lines by adding <span style="margin-right: nnem;"> (where nn is an appropriate numeric value) to all but the longest line. /R is mostly intended for use with correspondence, where the city/date lines at the top and the signature lines at the bottom are towards the right-hand side of the page. /R also can be useful in positioning the "credits" line just below some illustrations
    • /i  the block will be formatted as an index (that letter is 'eye', not 'one' or 'ell'). The HTML Auto Index option can do this, but using the markup is simpler
    • /F  the block will be formatted as front matter: centered; lines will not end with <br /> so text can reflow to fit the margins
    • /L  the block will be formatted as an unordered list, beginning with <ul> and with each line beginning with <li> and ending with </li>. The block will end with </ul>
    • /x  the block will be enclosed with <pre> and will be subject to limited formatting. At DP, the only normal use of this is with genealogy charts, unless you are using "HTML Auto Index" rather than /I.
  • "Autogenerate HTML" cannot be undone, so be sure you have saved your prepared text under a unique name before using it. (i.e., make a backup, so that the word "edited" does not appear in the Title Bar of the Guiguts window.)
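
If you want to confirm the open/close pairing before running Autogenerate HTML, a small script outside Guiguts can do it. The following is a minimal Python sketch, not part of Guiguts. It assumes that each block markup sits alone on its own line and that nested blocks are closed in the reverse order in which they were opened; the name check_block_markup is made up for this illustration.

```python
import re

# Block markup characters from the list above; letters may be either case.
OPEN_RE = re.compile(r"/([*#$pcriflx])", re.IGNORECASE)
CLOSE_RE = re.compile(r"([*#$pcriflx])/", re.IGNORECASE)


def check_block_markup(text):
    """Return (line_number, message) pairs for unbalanced block markups."""
    problems = []
    stack = []  # (line_number, kind) for every block that is still open
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        opened = OPEN_RE.fullmatch(stripped)
        closed = CLOSE_RE.fullmatch(stripped)
        if opened:
            stack.append((number, opened.group(1).lower()))
        elif closed:
            kind = closed.group(1).lower()
            if stack and stack[-1][1] == kind:
                stack.pop()
            else:
                problems.append(
                    (number, f"closing {kind}/ has no matching opening /{kind}")
                )
    # Anything left on the stack was opened but never closed.
    problems.extend((n, f"opening /{k} is never closed") for n, k in stack)
    return sorted(problems)


if __name__ == "__main__":
    sample = "/#\nA quotation\n/P\nA line of verse\np/\n#/\n/*\nUnclosed block\n"
    for number, message in check_block_markup(sample):
        print(f"line {number}: {message}")
```

An empty result means every block that was opened has also been closed.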

The rest of the description of the HTML menu is mostly in the sequence in which its features will be used, rather than in the sequence in which they appear on the menu.


Convert the text to HTML (HTML Generator)

On the HTML menu, click "HTML Generator..." to access this dialog:

Gg-1.4.0-47a-html generator.png

Guiguts fills in the Title and Author fields, but you often will need to edit them, for example, to remove a leading /* and any honorifics in the Author field. In general, you will want to use the default options in the checkboxes and radio buttons.

  • CSS blockquote generates <div class="blockquot"> ... </div>, a class that is included in the distributed header.txt file. Turning it off generates a standard HTML <blockquote> ... </blockquote>
  • Short FN Anchors generates more efficient Footnote anchors
  • Skip coincident Pg#s prevents adjacent page numbers from being generated, and keeps only the last of them. This prevents blank (but numbered) pages from being displayed along with page numbers associated with text.
  • Convert Fractions converts the fractions 1/2, 1/4 and 3/4 into the HTML entities &frac12;, &frac14; and &frac34; respectively.
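
The Convert Fractions substitution can be pictured with a short Python sketch. This is an illustration only, not the code Guiguts runs; the lookarounds that skip longer numbers such as 11/2 or 1/20 are an assumption, and how Guiguts itself treats mixed numbers is not covered here.

```python
import re

# The three fractions named above and their HTML entities.
FRACTION_ENTITIES = {"1/2": "&frac12;", "1/4": "&frac14;", "3/4": "&frac34;"}

# Match a fraction only when it is not part of a longer number or date
# (assumption made for this sketch).
FRACTION_RE = re.compile(r"(?<![\d/])(1/2|1/4|3/4)(?![\d/])")


def convert_fractions(text):
    """Replace 1/2, 1/4 and 3/4 with their HTML entities."""
    return FRACTION_RE.sub(lambda m: FRACTION_ENTITIES[m.group(1)], text)


if __name__ == "__main__":
    print(convert_fractions("Add 1/2 cup of flour and 3/4 cup of milk."))
```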

When you click "Autogenerate HTML", Guiguts will do just that; it can take several seconds, depending on the size and complexity of the text. Once generation is complete, save the text under a new name, with a filetype of ".html" or ".htm".

Text tagged with <i> (italics), <b> (bold), <g> (gesperrt), or <f> (font change) can each be set to convert to one of four formats in the HTML version.

Beginning with Guiguts 1.4.0, Autogenerate attempts to produce HTML5-compliant code.

Now, you can begin the process of converting this document into a good-looking, readable book that closely resembles the original.


Header File

The header.txt file is distributed with Guiguts and must be in the guiguts directory. The content it inserts into the document begins with the HTML <head> section, which defines, among other things, the type of HTML or XHTML to which the document aspires and the character set it uses. It also includes the <title> element; you need to modify the title text to reflect the book.

The header file also defines all the CSS classes on which the generated HTML depends, for example the poem, stanza, and other classes referenced in the poetry generation. These classes strongly influence the appearance of the etext.

As you develop your own preferences for post-processing, you will want to modify the permanent content of header.txt in the guiguts folder. You'll also want to tailor each generated copy to make each project look like the original book. For example, you may need to add additional CSS for certain indents, or override what Autogenerate uses to indent poetry. Also, for efficiency, you should delete any unused classes, e.g., if your book has no sidenotes, you should remove the .sidenote class definition. If there are no footnotes, you can remove the classes related to footnotes; and so on.


Generated TOC

The generated chapter table of contents may or may not be useful. For example, you may already have the original TOC with page numbers, protected by /$ ... $/. If so, just delete the generated TOC.

Find and fix HTML syntax errors

HTML Tidy

  • click "HTML Tidy" to find many kinds of errors:
Gg-1.4.0-47e-tidy.png
  • click on each item in this list to move to the error in the editing window; correct the error and continue through the Tidy list until you've resolved all of the errors.
  • click the "Run checks" button in the Tidy dialog to repeat the scan, correct any remaining errors, and repeat until Tidy says
Gg-1.4.0-47f-tidy2.png

HTML Validators

  • Guiguts supports two HTML Validators: Nu HTML Checker and Nu XHTML Checker.
  • Once Tidy has found no errors, use both HTML Validators for a more rigorous check for errors. Both are needed because they sometimes find different problems, as shown in the two real-world examples below:
Gg-1.4.0-47g-validatenuhtml.png
Gg-1.4.0-47g-validatenuxhtml.png
  • as with Tidy, click on each item in the list to find and correct the error. Repeat by clicking "Run checks" until there are no more errors:
Gg-1.4.0-47h-validate2.png
  • and then continue to follow the steps in whatever "How to Post-Process" workflow you are using.
  • use the Validators often (OFTEN!) throughout the preparation of the HTML version to re-check your work.


The HTML Markup Dialog

The HTML Markup dialog

Gg-1.3-47d-html markup.png

offers several kinds of functions, most of which add HTML markups to selected text:

  • The cluster of buttons at the top encloses whatever text you have selected (highlighted) in the main editing window within the tag or entity shown in that button, and the corresponding closing tag.
    • You can modify these buttons by right-clicking them to access a configuration dialog:
      Gg1.2-47d-html-markup-configure-attributes.png
      and typing in a class name or other valid attribute:
      Gg1.2-47d-html-markup-configure-attributes-example.png
      When a button has been modified, Guiguts will remember it until you change it again. To remind you there's been a modification, the button's caption will end with a plus sign: Gg1.2-47d-html-markup-configure-attributes-example-button.png You can remove the modification by right-clicking the button, deleting the extra text, and clicking "OK". When you do so, the plus sign will disappear from that button's caption.
  • The four rows of controls below those buttons offer similar, but more visible and more flexible capabilities:
    • The "div" button encloses the selected text in <div???>selected text</div>, where ??? is whatever text appears in the area just to the left of the button. Guiguts applies that text intelligently, adding "class=" if needed (third line of example), but not if it's already present (first line) or not applicable (second line).
    • The "span" and "i" buttons do the same thing, but with "span" or "i" instead of "div".
      • The four text-entry boxes are associated with a drop-down list that remembers your previous entries for quick retrieval. There's just one list, shared by all four lines.
  • The Links buttons help you create internal and external links:
    • Use the Anchor button to insert an anchor whose id is based on the current selection. For example, select the text CHAPTER 7 and click Anchor. The code <a id="CHAPTER_7" /> is inserted preceding the selection.
      • Guiguts deals properly with spaces (converting to underscores) and special characters when making this substitution. This gives you a quick way to insert an anchor for reference from elsewhere in the book.
    • Use the External Link button to create a link to another HTML file, as when you are breaking a large etext down into separate chapter files. A file-open dialog pops up and you browse to select the target file. If it is in the same directory, Guiguts builds a link using a relative pathname.
    • You can use the Internal Link button for two purposes: linking to an anchor in this file; and checking for duplicate anchors.
      • To link to an anchor (for example one created by the Anchor button, or a page number anchor created by automatic HTML generation) first select some text that will be the link. Click Internal Link and Guiguts will pop up a large window listing all named anchors in the file. It tries to put anchors with wording similar to the current selection at the top of the list. You can opt to exclude the numerous page-number and footnote anchors.
      • Double-click the target anchor; Guiguts encloses the selection in a link with href="#Anchorname".
      • To check for duplicate anchor-names, clear any current selection and click Internal Link. Guiguts builds its list of existing anchors and checks it for duplicates. It displays any duplicates in a warning message.
    • The Image button does almost the same thing as Auto Illus Search on the Generate dialog, but operates either on the text you've just selected, or inserts code at the current cursor position if no text is selected. You must therefore take care to select the correct text or position the cursor at the correct point in the text before clicking the Image button, or illustration information may be inserted in the wrong place.
  • Auto List creates an unordered or ordered list from the text you've selected. The list will begin with <ul> or <ol> and end with its closing counterpart. Each line within the list will begin with <li> and end with its counterpart. "ML" stands for "Multiple Lines." If every line in the selection represents just one list item, leave this unchecked. If some items use more than one line, check this box and leave a blank line between EVERY item, even the single-line ones.
  • Auto Table converts the current selection into an HTML table. When the ML (multi-line) switch is off, each line of the selection is marked as a table row. If ML is set on, each group of lines separated by a blank line is made into one table row (see example, below).
    • Columns are defined by two or more spaces OR a vertical bar | between elements. Use the Table Effects dialog (on the TXT menu) or space the columns manually to put two spaces or a vertical bar between column values; otherwise column values will be combined in a single cell.
      • If the text in the selection contains even one vertical bar, Guiguts will only look for vertical bars, not multiple spaces, as column separators. The advantage to this is that tables already formatted for Plain Text with vertical bars can be converted without first replacing the bars with multiple spaces. In the rare case of a vertical bar being part of the actual text, its value, &#124; can be used instead.
    • The alignment switches left, center, and right set the default alignment for all table columns. Guiguts inserts class="tdl" or class="tdc" or class="tdr" in each <td> markup.
    • Often, different columns of a table need different alignment; for example, a column of names should be left-aligned, one of numbers right-aligned. You can specify different alignments for each column by putting characters in the text field "Column Fmt" above the Auto Table button. You should place one character for each column in the table, using < for left-aligned, | (vertical bar) for centered, and > for right-aligned. For a four-column table to be aligned left, left, center, and right, you would enter <<|>. Extra characters are ignored; and columns for which there are no characters get the default alignment.
    • HTML has no concept of decimal alignment. When using proportional fonts, one (of several) ways to align numeric columns containing decimals and/or fractions is to pad the shorter numbers with hard spaces: two hard spaces usually have the same width as a digit, and one hard space usually has the same width as a decimal point.
  • The four buttons at the bottom:
    • Apply Poetry Markup to Sel. does just that; select (highlight) all the stanzas of the poem first. Autogenerate applies the same markup to /P blocks;
    • Hyperlink Page Nums looks for 1-to-3 digit numbers and makes them into links to #Page_nnn. It uses the Search & Replace dialog to find candidates, and you can choose which ones to convert into links. When using this, be careful to not convert normal numbers into links;
    • Remove Markup from Selection does just that.
    • Find Some Orphaned Markup initiates a search for each possible type of unbalanced HTML markup. The search stops at the first error found; correct and click the button again to resume. This checks HTML markups only, unlike the DP-specific Find Orphaned DP Markup.
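
The Auto Table rules for column separators and for the "Column Fmt" field can be sketched in Python. This is an illustration of the rules as described above, not Guiguts' own code: the function names are hypothetical, the exact layout of the generated <tr> and <td> markup is an assumption, and treating an unrecognized Column Fmt character as "use the default" is also an assumption.

```python
import re

# Column Fmt characters and the classes Guiguts puts on each <td>.
ALIGN_CLASS = {"<": "tdl", "|": "tdc", ">": "tdr"}


def column_classes(column_fmt, column_count, default="tdl"):
    """One class per column; extra characters are ignored and
    columns without a character get the default alignment."""
    classes = []
    for index in range(column_count):
        if index < len(column_fmt):
            classes.append(ALIGN_CLASS.get(column_fmt[index], default))
        else:
            classes.append(default)
    return classes


def split_cells(line, use_bars):
    """Split on vertical bars if the selection has any, else on 2+ spaces."""
    if use_bars:
        return [cell.strip() for cell in line.split("|")]
    return re.split(r" {2,}", line.strip())


def table_rows(selection, column_fmt="", default="tdl"):
    """Single-line mode (ML off): every non-blank line becomes one row."""
    use_bars = "|" in selection  # even one bar switches the separator
    rows = []
    for line in selection.splitlines():
        if not line.strip():
            continue
        cells = split_cells(line, use_bars)
        classes = column_classes(column_fmt, len(cells), default)
        tds = "".join(
            f'<td class="{cls}">{cell}</td>' for cls, cell in zip(classes, cells)
        )
        rows.append(f"<tr>{tds}</tr>")
    return rows


if __name__ == "__main__":
    for row in table_rows("Apples  12\nPears  7", column_fmt="<>"):
        print(row)
```

With a Column Fmt of <<|> and four columns, column_classes returns tdl, tdl, tdc, tdr, matching the four-column example above.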


Multi-line Table Example

This text is from a Table of Contents. When using ML, it should look something like this, with at least 2 leading spaces on the secondary lines of each entry, to show they belong in the second column, and at least 2 spaces before the page numbers, to show they belong in the third column. The Column Fmt field will be set to ><> (right-aligned, left-aligned, right-aligned):

<p>
CHAPTER  @    PAGE<br />

@  <span class="smcap">Introduction</span>      v<br />

I  <span class="smcap">A Brief Account of the Tank, Its Crew<br />
    and Its Tactical Functions, As They<br />
    Were at the Date of the Armistice</span>      25<br />

II  <span class="smcap">The Earliest Tanks, General Swinton, Admiral<br />
    Bacon,—the Holt Tractor and the<br />
    Evolution of the “Land Cruiser”</span>      31<br />

III  <span class="smcap">The Tank Corps in Embryo</span>      46<br />

IV  <span class="smcap">The First Tank Battles—The Attack on<br />
    Morval, Flers, the Quadrilateral, Thiepval,<br />
    and Beaumont-Hamel</span>      57<br />
</p>

It should not look like this (everything left-justified):

<p>
CHAPTER  @    PAGE<br />

@  <span class="smcap">Introduction</span>      v<br />

I  <span class="smcap">A Brief Account of the Tank, Its Crew<br />
and Its Tactical Functions, As They<br />
Were at the Date of the Armistice</span>      25<br />

II  <span class="smcap">The Earliest Tanks, General Swinton, Admiral<br />
Bacon,—the Holt Tractor and the<br />
Evolution of the “Land Cruiser”</span>      31<br />

III  <span class="smcap">The Tank Corps in Embryo</span>      46<br />

IV  <span class="smcap">The First Tank Battles—The Attack on<br />
Morval, Flers, the Quadrilateral, Thiepval,<br />
and Beaumont-Hamel</span>      57<br />
</p>

because whatever is at the left margin is assumed to belong in the first column of the table.

The @ signs are placeholders for empty cells, and you will need to remove them after the table has been generated.
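
To see why the leading spaces matter, here is a minimal Python sketch of the multi-line grouping described in this example. It is an assumption about the general approach, not Guiguts' actual code: each blank-line-separated group becomes one row, each line is split wherever two or more spaces occur, and the pieces are merged column by column. The function names are hypothetical.

```python
import re


def split_on_gaps(line):
    """Split a line wherever 2+ spaces occur.
    Leading spaces produce an empty first cell."""
    return [cell.strip() for cell in re.split(r" {2,}", line.rstrip())]


def group_rows(selection):
    """ML mode: lines separated by blank lines form one table row each."""
    rows, current = [], []
    for line in selection.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            rows.append(current)
            current = []
    if current:
        rows.append(current)
    return rows


def merge_row(lines):
    """Merge the lines of one row column by column."""
    columns = []
    for line in lines:
        for index, cell in enumerate(split_on_gaps(line)):
            while len(columns) <= index:
                columns.append([])
            if cell:
                columns[index].append(cell)
    return [" ".join(parts) for parts in columns]


if __name__ == "__main__":
    indented = [
        "I  A Brief Account of the Tank, Its Crew",
        "    and Its Tactical Functions, As They",
        "    Were at the Date of the Armistice      25",
    ]
    print(merge_row(indented))
    # The same lines at the left margin spill into the first column.
    print(merge_row([line.lstrip() for line in indented]))
```

In the first call the continuation lines stay in the second column; in the second call they are merged into the chapter-number column, which is the failure the left-justified example illustrates.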

Add Illustrations

Autogenerate HTML converted [Illustration: caption] to <p>[Illustration: caption]</p>. With your help, Guiguts can convert those to HTML that will display the appropriate illustrations and let you fine-tune the formatting of the captions.

  • Begin by clicking "HTML Generator" on the HTML menu:
Gg-1.4.0-47a-html generator blank.png
and then clicking "Auto Illus Search".

Auto Illus Search

  • The first time you do this in each project, Guiguts will ask you where the images are, by opening a standard File dialog:
Gg-1.3-47i-images folder.png
  • Select the "images" folder to see its contents; choosing a view that shows thumbnails of the images, rather than just their names, will make the selection process easier:
Gg-1.3-47b-auto illus search.png
  • Select the image that matches the highlighted text in the main editing window, and Guiguts will show you this dialog:
    Gg-1.3-47c-image selection.png
    • The "px" option under "Geometry" will be available only if you've selected the option for it on the Preferences>Processing sub-menu.
    • If the image had a caption, it will appear in the Caption text line. If not, or if you want to include additional information describing the image, fill in the Alt text line.
    • Alt text will be displayed by browsers when they cannot display the images (usually because of a technical problem, sometimes because people have turned off image display).
    • Title text, if you supply it, will be placed in the title attribute of the <img> tag.
    • Geometry defaults to scaling the image by percentage, filling in a suggested maximum width, and showing you the basis for the calculation. You can change this percentage to suit your project or preferences. This works well with [very] large original images, but if you have a small image, such as a logo or something that should occupy only a small part of the screen, click "em" to tell Image Geometry to use the actual width and height.
      • You can elect to express widths in pixels ("px") if that option is shown. By default, it is not, but you can use the Preferences>Processing menu to enable it.
      • Override % with 100% in epub: Since handheld devices are typically smaller, this option overrides the percentage width specified above with 100% width on those devices. Uncheck this option if you wish your chosen percentage to also be used on handheld devices.
      • Image size will be limited to its natural (actual) size, to avoid images looking fuzzy or blocky when they are substantially smaller than the window width. (That usually is noticeable only on large monitors, or for small images.) Some Kindle viewers and devices do not support this feature.
      • Details of how the scaling percentage is calculated: This value is the maximum percentage width you could use if you want your illo to fit in both width and height on a landscape screen with aspect ratio 4:3, e.g. 1024x768 pixels. With a bigger percentage, some of the illo would drop off the bottom of the screen (or off the side if you chose > 100% width). When you set it to the "suggested" percentage, it will either be full width or full height on a 4:3 (landscape) screen.
    • The default alignment is "Center". If you want the image to be floated to the left or right instead, select one of them.
    • after making all the choices for this image, you can:
      • click OK to insert the generated HTML for this image and return to the HTML Generation menu, or
      • click Insert & Load Next to insert the generated HTML and automatically go to the next [Illustration] tag and next image file (this can be much faster than using OK, especially when there are a lot of images to insert), or
      • click Cancel to exit without inserting anything.
    • If you choose OK or Insert & Load Next, the generated HTML will look something like this if em is the chosen unit of measurement:
<div class="figcenter illowp75" id="i_p395" >
  <img src="images/i_p395.png"  alt="" />
  <div class="caption"><p>THE STANDARD STEEL WORKS.<br />
TIRES STANDARD TIRES<br />
PHILADELPHIA<br />
</p></div>
</div>
  • NOTES
    • CSS also will be generated (at the end of the <style> section) to define the "illowp" class:
.illowp75 {width: 75%;}
    • If percent is chosen as the unit of measurement, the <div> also will contain style="max-width: nnem;".
    • If px is chosen as the unit of measurement, the <div> will contain style="max-width: nnpx;" but no classname, because no CSS will be generated.
    • If you choose OK, the "HTML Generator" dialog will remain on the screen, so you can click "Auto Illus Search" again to convert the next image. Repeat the process until all occurrences of <p>[Illustration]</p> have been converted, then close the "HTML Generator" dialog.
    • Insert & Load Next selects the next image in the images folder in alphabetic (collating) sequence, so it only works when the filenames are in the same sequence as the [Illustration] tags they are supposed to match.
    • Most of the time, the captions will need further work, so use Search & Replace to look for "<img" (or something else that is unique to all image markups) throughout the document.
    • The Prev File and Next File buttons let you quickly select other images if the current one doesn't match the [Illustration] tag. This is useful when there are illustrated drop-caps or multiple versions of some images.
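
The arithmetic behind the suggested percentage (the largest width at which the image still fits, in both width and height, on a 4:3 landscape screen) can be sketched in Python. This is a reading of the description above, not Guiguts' code; rounding down to a whole percent is an assumption, and both function names are hypothetical.

```python
def suggested_width_percent(width_px, height_px, screen_ratio=(4, 3)):
    """Largest whole-percent width at which the image fits the screen.

    At p% of the screen width, the image is p% * W wide and
    p% * W * height / width tall; that height must not exceed H.
    """
    screen_w, screen_h = screen_ratio
    # Integer arithmetic avoids floating-point surprises near 100%.
    fit_height = (100 * screen_h * width_px) // (screen_w * height_px)
    return min(100, fit_height)


def illowp_css(percent):
    """CSS rule in the form shown in the NOTES above."""
    return f".illowp{percent} {{width: {percent}%;}}"


if __name__ == "__main__":
    # A portrait page scan, 600 pixels wide by 800 pixels high.
    percent = suggested_width_percent(600, 800)
    print(percent, illowp_css(percent))
```

A portrait image is limited by the screen height, so its suggested percentage is well below 100; a landscape image that is at least as wide as 4:3 is limited by the screen width and gets 100.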

Check for image-related errors (PPVimage)

On the HTML Menu, click "PPVimage". If the document is large and/or there are a lot of illustrations, this can take a while. It will list the images it's found, along with any errors (dimensions don't match the image's actual dimensions) and/or omissions (missing or unused image files). Click on them to find the errors in the editing window, correct them, and click "Run checks" again, until there are no more errors.
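
The "missing or unused image files" part of this check can be approximated outside Guiguts with a few lines of Python. This is a rough sketch, not PPVimage itself: it compares only file names, does not inspect image dimensions, and the function name check_images is made up for this illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def check_images(html_text, images_dir):
    """Return (missing, unused) lists of image file names."""
    collector = ImageCollector()
    collector.feed(html_text)
    referenced = {Path(src).name for src in collector.sources}
    present = {p.name for p in Path(images_dir).iterdir() if p.is_file()}
    return sorted(referenced - present), sorted(present - referenced)
```

Call it with the text of the saved HTML file and the path to the project's images folder. A cover image that is not referenced by any <img> tag will appear in the unused list, so review each entry before deleting anything.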


Check for link errors (HTML Link Checker)

Use this to find broken and duplicate links. The report list works like the others described above.
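
As an illustration of what "broken and duplicate" means for internal links, here is a minimal Python sketch. It is not the HTML Link Checker's code; it looks only at id attributes and at href values beginning with "#", and the function name check_internal_links is hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every id attribute and every internal link target."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self.targets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.append(attributes["id"])
        href = attributes.get("href")
        if tag == "a" and href and href.startswith("#") and len(href) > 1:
            self.targets.append(href[1:])


def check_internal_links(html_text):
    """Return (broken_targets, duplicate_ids) as sorted lists."""
    collector = LinkCollector()
    collector.feed(html_text)
    counts = Counter(collector.ids)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    broken = sorted(set(collector.targets) - set(counts))
    return broken, duplicates
```

A link to #Page_12 is reported as broken when no element in the file has id="Page_12", and an id that occurs twice is reported as a duplicate.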

Find and fix CSS errors

Guiguts has two tools for this, because there are two primary kinds of CSS-related errors: invalid syntax and valid but undefined/unused CSS classes.

CSS Validator

This runs a local copy of the official CSS Validator provided by the World Wide Web Consortium (W3C). It looks for violations of the CSS version selected on the Preferences>Processing sub-menu. CSS3 is the default on that menu, but the CSS 2.1 standard is, for the most part, what Project Gutenberg (PG) requires. PG does accept some CSS3, notably {color: transparent;}, but when set to CSS 2.1, the validator will flag that as an error, even though you are permitted to use it.


PPhtml

Another kind of CSS-related error occurs when an undefined class is used in the body of the document. PPhtml will find those, but also may flag some defined classes as being undefined. It also finds defined but unused CSS classes, but again, some of them actually may be used. Because PPhtml is over-cautious, you should check each possible error before making any changes. The line numbers of the possibly unused classes appear to the left of the messages, and clicking a message will take you to that line, while right-clicking it will remove the message from the report. In most cases, there still will be items in the list after you've made corrections and re-run this checker.

Gg-1.3.2-47j-pphtml.png
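
The two kinds of report can be illustrated with a short Python sketch that compares the classes defined in the <style> section with the classes used in class attributes. This is not PPhtml's code, and the function name class_report is hypothetical. It relies on simple pattern matching rather than a CSS parser, so, like PPhtml, it can misreport and its results need checking by hand.

```python
import re


def class_report(html_text):
    """Return (used_but_undefined, defined_but_unused) class names."""
    style = "\n".join(
        re.findall(r"<style[^>]*>(.*?)</style>", html_text, re.S | re.I)
    )
    # Drop comments and declaration blocks so that only selectors remain;
    # this keeps values such as url(a.png) from looking like class names.
    selectors = re.sub(r"/\*.*?\*/", " ", style, flags=re.S)
    selectors = re.sub(r"\{[^{}]*\}", " ", selectors)
    defined = set(re.findall(r"\.([A-Za-z_][\w-]*)", selectors))

    used = set()
    for value in re.findall(r'class\s*=\s*"([^"]*)"', html_text):
        used.update(value.split())

    return sorted(used - defined), sorted(defined - used)


if __name__ == "__main__":
    page = (
        "<style>.poem {margin: 0 .5em;} .sidenote {color: red;}</style>"
        '<div class="poem stanza">...</div>'
    )
    print(class_report(page))
```

Here "stanza" is used without being defined and "sidenote" is defined without being used.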

Convert to and from Entities (mostly archaic)

These three conversions are available for use with text that contains certain characters, such as the hard space, ampersand, curly quotation marks, and some fractions, that you may want or need to handle in a special way. To use any of them, select some text (anything from one character to the entire document) and click the conversion you want done. Review the results before saving the file: "Convert to Entities" and "Convert from Entities" usually can reverse each other, but "Convert Fractions to Entities" does not necessarily have a reversal counterpart.

These converters were more important when Guiguts was developed than they are today, as DP and Project Gutenberg's standard character set was quite limited; today, we can use the entire UTF-8 code space.
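
For readers unfamiliar with entities, the following Python sketch shows the general idea of converting to and from them using the standard library's entity tables. It is not Guiguts' implementation: which characters Guiguts converts is not specified here, and limiting the conversion to bare ampersands and non-ASCII characters that have a named entity is an assumption made for this sketch.

```python
import html
import re
from html.entities import codepoint2name

# An ampersand that does not already begin an entity such as &nbsp; or &#124;
BARE_AMPERSAND = re.compile(r"&(?!#?\w+;)")


def to_entities(text):
    """Replace bare ampersands and named non-ASCII characters with entities."""
    text = BARE_AMPERSAND.sub("&amp;", text)
    return "".join(
        f"&{codepoint2name[ord(ch)]};"
        if ord(ch) > 127 and ord(ch) in codepoint2name
        else ch
        for ch in text
    )


def from_entities(text):
    """Replace entities with the characters they stand for."""
    return html.unescape(text)


if __name__ == "__main__":
    sample = "Tom\u00a0& Jerry \u201cquoted\u201d"
    encoded = to_entities(sample)
    print(encoded)
    print(from_entities(encoded) == sample)
```

Note that html.unescape also converts entities such as &lt; and &amp;, so it should not be run over text that contains HTML tags you want to keep escaped.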


Create an Index using HTML Auto Index (List)

Although this option appears near the end of this HTML topic, it's one of the first things you will use when preparing the HTML version of a book that has an Index. The menu option does the same thing as the /i ... i/ block markup described earlier, but is more awkward to use. If you still want to use it (before running Autogenerate HTML):

  • replace the /* above the index with a blank line and /x
  • replace the */ below the index with x/ and a blank line
  • select the entire index, but not the four replacement lines you just made
  • click HTML Auto Index (List)
  • click Autogenerate HTML
  • find the index, remove the <pre> at the beginning and the </pre> at the end, and you will have the same thing that just using /i ... i/ will give you

Either way, this option will save you hours to days of work. The generated Index will be an unordered list, with everything that looks like a page number made into a link. Those links will work only if you've used "Configure page labels" before this, preferably as the first or second step when you began the project. Page ranges may not be tagged properly, Roman numeral page references will not be tagged at all, and 4-digit dates may be split into a "1" followed by a link to a 3-digit page number (you will have to find and fix those).
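
Split dates can be hard to spot by eye. The Python sketch below is one way to list them; it is a helper written for this illustration, not a Guiguts feature, and it assumes the generated links have the form <a href="#Page_nnn">nnn</a>. Check the markup in your own file before relying on the pattern.

```python
import re

# A digit immediately followed by a link to a 3-digit page number,
# e.g. 1<a href="#Page_918">918</a> for the year 1918.
SPLIT_DATE_RE = re.compile(r'(\d)<a href="#Page_(\d{3})">\2</a>')


def find_split_dates(html_text):
    """Return the 4-digit numbers that appear to have been split."""
    return [m.group(1) + m.group(2) for m in SPLIT_DATE_RE.finditer(html_text)]


def repair_split_dates(html_text):
    """Remove the link and rejoin the digits."""
    return SPLIT_DATE_RE.sub(r"\1\2", html_text)


if __name__ == "__main__":
    entry = (
        '<li>Armistice of 1<a href="#Page_918">918</a>, '
        '<a href="#Page_25">25</a></li>'
    )
    print(find_split_dates(entry))
    print(repair_split_dates(entry))
```

Review the list from find_split_dates before running the repair, because a genuine page reference that happens to follow a digit would be unlinked as well.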


See how a book will look on a handheld device (EBookMaker)

EBookMaker uses the currently-saved HTML file and the images folder to generate a .mobi (Kindle) file and an .epub (everything except Kindle) file, both of which are widely used on handheld devices. Use this after you've finished making your HTML look as good as possible in Browsers, then examine the .mobi and .epub files in computer previewers (such as Kindle Previewer, Adobe Digital Editions, and Calibre) and/or actual handheld devices. Try to adjust your CSS and associated HTML to make those handheld versions look as readable as you can, without adversely affecting the appearance of the HTML in Browsers.

  • Guiguts is packaged with the most recent version of eBookMaker that was available at the time of release. This may not be the latest version when you are using it at some later date, but updates are unlikely to matter for testing purposes. The online version at Project Gutenberg (see below) always is the latest version;
  • Guiguts passes the book's Title and Author (if it can find them in the <title> of the document header) to its own copy of eBookMaker, and eBookMaker can provide that information to eReaders for use in page headings;
  • while eBookMaker is running, Guiguts will wait for it to finish, just as it does with most of its other tools. When eBookMaker finishes, the Wait cursor will return to a normal text cursor. If errors were detected, Guiguts will display them;
  • for information about eBookMaker's ability to create .mobi files for Kindles, see the Preferences Menu->File Paths dialog;
  • for further information about eBookMaker, please see the eBookMaker Wiki page.

The Custom menu has an option to use Project Gutenberg's (PG) online eBookMaker, and that is the one PG will use to actually produce the .mobi and .epub files that people will be downloading once your book becomes available at PG. Guiguts does not wait for the online version to finish and does not look for possible processing errors.