PPTools/Guiguts/HTML

From DPWiki

HTML Palette

Access the Guiguts tools for generating HTML with Fixup > HTML Fixup, or click the HTML button (Guiguts Tb-html.png) in the toolbar. Either opens the following palette:

[Image: Guiguts HT main.png]

This palette controls both automatic (whole-document) conversion to HTML, and specific markup of individual elements. Automatic conversion is covered first; then element-by-element markup.

Autogenerating HTML

The top few items on the HTML palette control bulk conversion of the entire document to HTML. The most important is Autogenerate HTML; when you click it, Guiguts makes the following changes, some of which are detailed in the following paragraphs:

  • Inserts the file header.txt from the Guiguts directory at the head of the document (detail).
  • Encloses the first line of text (other than an [Illustration]) in <h1>..</h1>.
  • Encloses chapter titles (identified by 4 blank lines) in <h2>..</h2>.
  • Inserts a named anchor ahead of each chapter title.
  • Inserts <hr style="width:65%;" /> ahead of each chapter title.
  • Generates a table of contents containing links to all chapter titles and inserts it at the top of the body (detail).
  • Inserts a visible page number at every page boundary (see below for how to hide them). Optionally, generates a comment and/or an anchor at each page boundary (detail).
  • Encloses any text that looks like a body paragraph in <p>..</p>.
  • Optionally converts Latin-1, Greek and other Unicode characters to HTML entity notation (detail).
  • Optionally converts fractions such as 1/2, 1/4, etc., to the HTML entities for single-character fractions such as &frac14; (¼).
  • Converts both <tb> markups and 5-star thought breaks into <hr style="width:45%">
  • Processes /P..P/ sections as poetry (detail).
  • Processes /#..#/ sections as block quotes (detail).
  • Processes /*..*/, /f..f/, /$..$/, and /X..X/ sections as tabular lines (detail).
  • Formats footnote anchors as links to the corresponding footnotes, and encloses footnotes in <div class="footnote"> blocks.
  • Converts <sc>..</sc> to <span class="smcap">..</span>
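Two of the simpler substitutions in the list above can be sketched in a few lines of Python. This is only an illustration of the described result, not Guiguts' own code: the function name and the regular expressions are assumptions, and the pattern for a 5-star thought break (five asterisks separated by spaces on a line of their own) is a guess at how such lines look in the text.

```python
import re

def convert_simple_markup(text: str) -> str:
    """Apply two of the listed substitutions to proofed text (illustrative only)."""
    # <sc>..</sc> becomes a small-caps span.
    text = re.sub(r"<sc>(.*?)</sc>", r'<span class="smcap">\1</span>',
                  text, flags=re.DOTALL)
    # A <tb> line, or a line of five spaced asterisks, becomes a 45% rule.
    # The asterisk pattern is an assumption about how thought breaks are typed.
    thought_break = re.compile(r"^[ \t]*(?:<tb>|\*(?:[ \t]+\*){4})[ \t]*$",
                               re.MULTILINE)
    return thought_break.sub('<hr style="width:45%">', text)
```

Running it on a short sample shows both markups being rewritten while ordinary text is left alone.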

Caution: these changes cannot be undone. Save the file before starting this process, and afterward save the result under a new name.

Guiguts tries to leave existing HTML alone during automatic generation, but it does not really parse existing HTML, so it is easily confused by multi-line markup like <table> or even an <img.../> if it is pretty-printed across multiple lines. Automatic HTML inserted into such statements can make a mess. For best results, do automatic generation first and do element-by-element markup after.

Header File

The header.txt file is distributed with Guiguts and must be in the Guiguts directory. The data inserted in the document begins with the HTML <head> section, which defines, among other things, the type of HTML or XHTML to which the document aspires and the character set it uses. It also includes the <title> element. You need to modify the title text to reflect the book.

The header file also defines all the CSS classes on which the generated HTML depends, for example the poem, stanza, and other classes referenced in the poetry generation. These classes strongly influence the appearance of the etext.

You are encouraged to modify the inserted header, for example to change the document type or to change CSS stylings. If you do not want the page numbers to be visible in the right margin, remove the comment markers on the line

/*  visibility: hidden;  */

You should modify the header after it is in the document to delete any unused classes. For example if your book has no sidenotes, you should remove the .sidenote class definition. If there are no footnotes, you can remove the four classes related to footnotes; and so on.

Generated TOC

The generated chapter table of contents may or may not be useful. For example you may already have the original TOC with page numbers, protected by /$..$/. If so, just delete the generated TOC.

Page Boundary Anchors

The two switches Pg #s as comments and Insert Anchors at Pg #s determine what HTML is produced at each page boundary. Both switches are on by default. Inserting anchors is recommended, especially if you have set up page labels beforehand. (If you haven't, you can open the Configure Page Labels dialog by clicking the Custom Page Numbers button at the top of the HTML palette.) The etext will then contain an anchor of the form <a name="Page_n" id="Page_n"></a> at each page boundary, where n is the folio number. You can use these anchors to quickly hyperlink page references throughout the book to the proper places (see Hyperlinking Page Numbers below).

Character Conversion

By default Guiguts converts all non-ASCII characters into HTML entities; for example, a with a circle over it (å) becomes &aring;. The result is that no matter how many or how exotic the characters in the text, the resulting HTML file is pure ASCII.
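The effect of this conversion can be approximated with Python's standard html.entities table. This is a sketch under assumptions, not Guiguts' actual routine: the function name is made up, and the policy of using a named entity where HTML 4 defines one and a numeric reference otherwise is a guess at reasonable behavior.

```python
from html.entities import codepoint2name

def to_ascii_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity (illustrative)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                       # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &aring;
        else:
            out.append(f"&#{cp};")               # assumed fallback: numeric reference
    return "".join(out)
```

Whatever the input, the returned string contains only ASCII characters, which is the property the paragraph above describes.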

If you are working on a text that contains a very high proportion of such characters, you might prefer to leave the Latin-1 or possibly even the UTF-8 characters in the file. In that case set the Keep UTF-8 Chars switch before conversion. Also, change the character encoding property in the header text to reflect the actual document encoding.

Block Quote Markup

Guiguts processes block quote (/#..#/) sections in one of two ways. If the CSS Blockquote switch is set on, it marks block quotes as <div class="blockquot">..</div>. If the switch is set off, it marks them as <blockquote>..</blockquote>. This choice was added following a debate in the Post-processing forum as to whether or not the old <blockquote> markup was valid in the new world of XHTML. (Current feeling seems to be that it is still supported when used for its intended purpose, setting off a quote.)

Tabular Markup

Guiguts marks up tabular material in an earnest attempt to make it look something like the original.

Sections marked as /*..*/ or /$..$/ receive the same HTML (the only difference between them is that at Rewrap time, /$..$/ sections are not automatically indented). For either, Guiguts encloses the whole section as a single <p>..</p>, and places <br /> at the end of every line including empty lines. Lines that are indented to any depth are enclosed in <span style="...">..</span><br /> where the style sets a left margin to approximate the text line's indent. In both types of section, Guiguts replaces runs of spaces with runs of &nbsp; to try to preserve columnar alignment.
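The shape of the output described above can be sketched as follows. This is an approximation for illustration only: the function name is hypothetical, the margin of half an em per leading space is an assumed value (the source does not give the actual style), and treating only runs of two or more spaces as "runs" is also an assumption.

```python
import re

def tabular_to_html(lines):
    """Render a /*..*/ or /$..$/ section as one paragraph of <br />-ended lines."""
    out = []
    for line in lines:
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        # Runs of two or more spaces become the same number of &nbsp; (assumed threshold).
        body = re.sub(r" {2,}", lambda m: "&nbsp;" * len(m.group()), stripped)
        if indent and stripped:
            # Assumed mapping: half an em of left margin per leading space.
            body = f'<span style="margin-left: {indent * 0.5}em;">{body}</span>'
        out.append(body + "<br />")  # every line, even an empty one, ends in a break
    return "<p>\n" + "\n".join(out) + "\n</p>"
```

Calling tabular_to_html(["Apples  3", "", "  Pears  12"]) yields a single <p> whose three lines each end in <br />, with the indented line wrapped in a styled span.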

For simple tables and lists, this markup is often good enough. For complex tables with columnar alignment, it often isn't; and you need to manually convert the table into an HTML table.

Front-matter Markup

In a section marked /f..f/ (front matter), Guiguts encloses the section as a single paragraph <p class="center">..</p>. It does not insert breaks at the end of each line. The browser displays the section as wrapped text with blank lines closed-up.

Unordered List Markup

In a section marked /L..L/, Guiguts encloses the whole section in <ul>..</ul>. It encloses each non-empty line in <li>..</li>. The section is thus converted into an unordered (bulleted) list with each line as a list item.

Columnar-text Markup

For a section marked /X...X/, Guiguts only encloses the whole section in <pre>...</pre> markers. No other HTML is inserted. The <pre>...</pre> markup causes the browser to display the text with its existing line-breaks in a monospaced font, typically Courier.

Use this markup to protect sections where you plan to use element-by-element markup to achieve a certain look. For example, if you plan to format a table using <table>, or want to convert a TOC or index into an unordered list, the HTML inserted by Guiguts will only get in your way. Before you apply automatic HTML, change the /*..*/ or /$..$/ flags on such tables to /X..X/.

Inserting HTML Element by Element

Many of the buttons in the HTML palette allow you to quickly insert HTML code. You select the text you want to mark up, then click one of the buttons to insert the markup. The following table lists the buttons from left to right, top to bottom. Some are discussed in more detail below. Each of these operations can be undone.

Available Markup

<i> (Italics) Encloses the current selection in <i>...</i>.
<b> (Bold) Encloses the current selection in <b>...</b>.
<u> (underline) Encloses the current selection in <u>...</u>.
<center> (center) Encloses the current selection in <center>...</center>.
<hn> (heading level n) Encloses the current selection in <hn>...</hn>.
<p> (paragraph) Encloses the current selection in <p>...</p>.
<hr> (horizontal rule) Inserts <hr style="width:95%;" /> at the insertion point.
<br> (line break) Inserts <br /> at the insertion point.
nb space (non breaking space) Inserts &nbsp; at the insertion point.
Poetry Marks up the current selection as a poem (detail).
<big> Encloses the current selection in <big>...</big>.
<small> Encloses the current selection in <small>...</small>.
<ol> (ordered list) Encloses the current selection in <ol>...</ol>.
<ul> (unordered list) Encloses the current selection in <ul>...</ul>.
<li> (list item) Encloses the current selection in <li>...</li>.
<sup> (superscript) Encloses the current selection in <sup>...</sup>.
<sub> (subscript) Encloses the current selection in <sub>...</sub>.
<table> Encloses the current selection in <table>...</table>.
<tr> (table row) Encloses the current selection in <tr>...</tr>.
<td> (table data cell) Encloses the current selection in <td>...</td>.
<blockquote> Encloses the current selection in <blockquote> ...</blockquote>.
<code> (code, display in monospace font) Encloses the current selection in <code>...</code>.
Named Anchor Inserts an anchor based on the selection (detail).
Image Inserts the HTML to include an image (detail).
External Link Encloses the current selection in a link to another file (detail).
Internal Link Encloses the current selection in a link to an anchor defined in this document; also used to check for duplicate anchors (detail).
Remove markup from selection Strips HTML markup (except for <i> and <b>) from the current selection. This is not an Undo; it does not restore earlier formatting.
Find orphaned markup Initiates a search for each possible type of unbalanced HTML markup. The search stops at the first error found; correct it and click the button again to resume. This checks HTML markup only, unlike the general orphaned markup search.
Auto List Make the selection into an ordered or unordered list (detail).
AutoTable Make the selection into a table (detail).
div, span Enclose the current selection in a div or span with specific styling. Enter any attributes to follow <div or <span, e.g. class="name".
Header Insert the file header.txt from the guiguts folder at the top of the document, and insert the </body> and </html> lines at the end of the document.
Link Checker Check and summarize all links (detail).
HTML Tidy Pass the document through the tidy program (detail).

Poetry Markup

Clicking the Poetry button marks up the current selection with one form of HTML poetry markup:

  • The whole selection is enclosed in
<div class="poem">...</div>
  • Each stanza (delimited by a blank line) is enclosed in
<div class="stanza">...</div>
  • Each line is enclosed in
<span class="in">...<br /></span>
where the n in the class name stands for the line's indent, from 0 to 9 ems.

The use of <span>...<br /></span> on each line is intended to make poetry display properly in text-based browsers such as Lynx. (Current thinking in the DP forums is that the shorter <div>...</div> markup would be as good.)

It is the Guiguts convention that poetry is always rewrapped to be indented by four spaces. This will be the case if the starting file has been rewrapped before you begin HTML conversion. Therefore, lines that are indented by just four spaces are styled class="in0", and the class number increases by 1 for each two text spaces of indentation. Provided that the proofers and you have been careful and consistent about indenting the lines, the result will look correct in a browser.
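The indent rule in the preceding paragraph amounts to a small calculation, sketched here in Python. The function name is hypothetical, and capping the result at 9 follows the "0 to 9" range stated earlier rather than any knowledge of how Guiguts handles deeper indents.

```python
def poetry_indent_level(line: str, base_indent: int = 4) -> int:
    """Return the class number for a rewrapped poetry line (illustrative)."""
    leading = len(line) - len(line.lstrip(" "))
    extra = max(leading - base_indent, 0)  # spaces beyond the standard four
    return min(extra // 2, 9)              # one step per two spaces, capped at 9 (assumed)
```

A line with four leading spaces gets level 0, six spaces gives 1, eight gives 2, and so on.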

Inserting Anchors and Links

Use the Named Anchor button to insert an anchor whose id is based on the current selection. For example, select the text CHAPTER 7 and click Named Anchor. The code <a name="CHAPTER_7" id="CHAPTER_7" /> is inserted preceding the selection.

Guiguts deals properly with spaces (converting to underscores) and special characters when making this substitution. This gives you a quick way to insert an anchor for reference from elsewhere in the book.
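The substitution can be sketched in Python as below. Only the space-to-underscore step and the output format come from the description above; the function name is made up, and dropping every character outside a conservative set is an assumption about what "deals properly with special characters" means.

```python
import re

def named_anchor(selection: str) -> str:
    """Build an anchor tag from selected text (illustrative)."""
    name = re.sub(r"\s+", "_", selection.strip())  # spaces become underscores
    # Assumption: characters that are awkward in anchor names are simply dropped.
    name = re.sub(r"[^A-Za-z0-9_.:-]", "", name)
    return f'<a name="{name}" id="{name}" />'
```

For the selection CHAPTER 7 this returns <a name="CHAPTER_7" id="CHAPTER_7" />, matching the example above.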

Use the External Link button to create a link to another HTML file, as when you are breaking a large etext down into separate chapter files. A file-open dialog pops up and you browse to select the target file. If it is in the same directory, Guiguts builds a link using a relative pathname.

You can use the Internal Link button for two purposes: linking to an anchor in this file; and checking for duplicate anchors.

To link to an anchor (for example one created by the Named Anchor button, or a page number anchor created by automatic HTML generation) first select some text that will be the link. Click Internal Link; Guiguts pops up a large window listing all named anchors in the file. It tries to put anchors with wording similar to the current selection at the top of the list. You can opt to exclude the numerous page-number and footnote anchors. Double-click the target anchor; Guiguts encloses the selection in a link with href="#Anchorname".

To check for duplicate anchor-names, clear any current selection and click Internal Link. Guiguts builds its list of existing anchors and checks it for duplicates. It displays any duplicates in a warning message.
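A duplicate check of this kind can be approximated outside Guiguts with a short script. This is a rough sketch, not the tool's logic: the function name is hypothetical, and the regular expression only looks at id attributes on <a> tags and does no real HTML parsing.

```python
import re
from collections import Counter

def duplicate_anchor_ids(html: str) -> list[str]:
    """Return anchor ids that occur more than once (illustrative)."""
    ids = re.findall(r'<a\b[^>]*\bid="([^"]+)"', html)
    return sorted(name for name, count in Counter(ids).items() if count > 1)
```

An empty result means every anchor id found by the pattern is unique.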

Note that anchors for links to illustrations should be placed outside the caption span in order to work with IE6. <div><a></a><img /><span></span></div> works, but <div><img /><span><a></a></span></div> does not work, and you won't get any warning.

Inserting Image Code

The Auto Illus Search button causes Guiguts to search for the first [Illustration markup and highlight it in search orange. Alternatively you can select an [Illustration line yourself and click Image.

[Image: Guiguts HT illo.png]

In either case, a file-open dialog pops up, and you use it to browse to the image file for this illustration, for example images/image01.jpg. Guiguts shows a dialog in which, from bottom to top, you see a thumbnail of the image, a choice of alignment buttons, and the dimensions of the image. The Alt Text field is filled with the text from the [Illustration markup. (Caution: the full text from the markup is taken; if it was long and wrapped to multiple lines, the line-breaks are included and you need to manually edit them out.)

Normally you leave the dimensions as-is, but if you want the browser to compress or stretch the image, you can enter different dimensions. You can change just one of the dimensions and set on the Maintain AR (aspect ratio) button, and Guiguts will adjust the other dimension in proportion.
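The proportional adjustment is simple arithmetic: the unchanged dimension is scaled by the same factor as the one you edited. A minimal sketch, with a hypothetical function name and rounding to whole pixels as an assumption:

```python
def keep_aspect_ratio(orig_width, orig_height, new_width=None, new_height=None):
    """Return (width, height) with the original proportions preserved."""
    if new_width is not None:
        return new_width, round(new_width * orig_height / orig_width)
    if new_height is not None:
        return round(new_height * orig_width / orig_height), new_height
    return orig_width, orig_height  # nothing changed
```

For an 800 by 600 image, asking for a width of 400 gives a height of 300.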

You can set text for the title attribute; usually, you just copy the Alt text and paste it into the Title field.

When you click OK, Guiguts replaces the [Illustration] line with the following HTML:

<div class="figcenter/left/right" style="width: widthpx;">
<img src="path-to-image" width="width" height="height"
   alt="Alt text" title="Title text" />
<span class="caption">Alt text.</span>
</div>

The path-to-image is a relative path when the image is in the same folder as the document, or a subfolder. Typically images are located in the subfolder images and the generated code has src="images/imagenn.jpg".

Using Auto List

When you click Auto List, Guiguts converts the current selection into an unordered or ordered list, depending on which switch is set. The <ul>..</ul> or <ol>..</ol> tags are placed at the beginning and end. The lines of the selection are marked up as list elements with <li>..</li>.

When the ML (multi-line) switch is off, each line of the selection is marked as a list item. If ML is set on, list items are made from groups of lines separated by blank lines.
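The difference between the two ML settings can be illustrated with a small Python function. It is a sketch only: the name and parameters are invented, blank lines are assumed to be skipped when ML is off, and joining the lines of a multi-line item with a single space is an assumption (Guiguts may keep the line breaks).

```python
def auto_list(lines, ordered=False, multiline=False):
    """Mark up lines as an HTML list, per line or per blank-line-separated group."""
    tag = "ol" if ordered else "ul"
    if multiline:
        items, current = [], []
        for line in lines:
            if line.strip():
                current.append(line.strip())
            elif current:                     # a blank line closes the current item
                items.append(" ".join(current))
                current = []
        if current:
            items.append(" ".join(current))
    else:
        items = [line.strip() for line in lines if line.strip()]
    body = [f"<li>{item}</li>" for item in items]
    return "\n".join([f"<{tag}>", *body, f"</{tag}>"])
```

With multiline=False the lines "a" and "b" become two items; with multiline=True they become the single item "a b" unless a blank line separates them.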

Using Auto Table

When you click Auto Table, Guiguts converts the current selection into an HTML table. When the ML (multi-line) switch is off, each line of the selection is marked as a table row. If ML is set on, each group of lines separated by a blank line is made into one table row.

Just as with ASCII Table Effects, columns are defined by two or more spaces between elements. Use the Table Effects palette or space the columns manually to put two spaces between column values; otherwise column values will be combined in a single cell.

The alignment switches left, center, and right set the default alignment for table columns. Guiguts inserts align='left' (or right or center) in each <td> markup.

Often, different columns of a table need different alignment; for example, a column of names should be left-aligned, one of numbers right-aligned. You can specify different alignments for each column by putting characters in the text field "Column Fmt" below the Auto Table button. You should place one character for each column in the table, using < for left-aligned, | (vertical bar) for centered, and > for right-aligned. For a three-column table to be aligned left, left, and right, you would enter <<>. Extra characters are ignored; and columns for which there are no characters get the default alignment.
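The column-splitting and Column Fmt rules can be sketched together in Python. This covers only the case where ML is off, and it is an illustration rather than Guiguts' code: the function and parameter names are hypothetical, and the generated table carries no attributes beyond the align value described above.

```python
import re

ALIGN = {"<": "left", "|": "center", ">": "right"}

def auto_table(lines, column_fmt="", default="left"):
    """Turn text lines into table rows; cells are split on two or more spaces."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        cells = re.split(r" {2,}", line.strip())
        tds = []
        for i, cell in enumerate(cells):
            # Columns beyond the format string fall back to the default alignment.
            align = ALIGN.get(column_fmt[i], default) if i < len(column_fmt) else default
            tds.append(f"<td align='{align}'>{cell}</td>")
        rows.append("<tr>" + "".join(tds) + "</tr>")
    return "\n".join(["<table>", *rows, "</table>"])
```

Note that "Smith  John  42" splits into three cells, while "Smith John  42" would give only two, because a single space does not separate columns.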

Using the Check Tools

The HTML palette has a number of tools to check the correctness of the HTML: Link Check, HTML Tidy, W3C Validate, PPHTML, and Image Check. The Check All button runs all of these checks at once.

HTML Tidy is a free program that parses HTML for errors and can reformat HTML in various ways. It is a separate download; the links given on this page date from April 2008, so check for a more recent version. When you have installed Tidy as an executable in your system and directed Guiguts to it during Setup, you can invoke Tidy by clicking the HTML Tidy button in the HTML palette. Guiguts saves the document and runs it through Tidy. It collects Tidy's output and displays it in a report window. Tidy can create a tidied version of your HTML file; do not use this.

The Link Checker button at the bottom of the HTML palette invokes a thorough check of all HTML links in the document. It finds all named anchors, internal links, external links, and image links. It opens a report window that lists:

  • The totals of anchors, links, and image links.
  • Internal links without anchors. These are certainly errors, usually from misspelling the anchor name.
  • External links of any kind. These should never appear unless you have split a book into separate chapters.
  • Links containing spaces or special characters. These are usually errors.
  • Image links that use capital letters (incorrect by PG standards).

The link check also checks that all files in the images directory are named in a link, and lists the ones that are not. This may give false error reports if images are not all contained in a single directory.
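The first kind of error in the report, internal links without anchors, is easy to reproduce in a standalone script. The following is a rough sketch and not the Link Checker's implementation: the function name is invented, and the regular expressions treat any id or name attribute as a valid link target without parsing the HTML.

```python
import re

def broken_internal_links(html: str) -> list[str]:
    """Return href="#..." targets that have no matching id or name (illustrative)."""
    anchors = set(re.findall(r'\b(?:id|name)="([^"]+)"', html))
    targets = re.findall(r'href="#([^"]+)"', html)
    return sorted({t for t in targets if t not in anchors})
```

Each name returned is a link target that no anchor in the document defines, typically a misspelled anchor name.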


Hyperlinking Page Numbers

Once Guiguts has generated page-number anchors that match the book's folios, you can rather quickly convert a cross-reference, an index, a table of contents, a list of illustrations—anything that contains a page-number reference—to a hyperlink. This greatly increases the value of the etext to the reader.

The primary tool for this is the "Hyperlink Page Numbers" button, which uses a regular-expression search-and-replace. To do the same thing manually, set up a search for:

(?<!\d)(\d{1,3})

That is, look for a string of 1 to 3 digits that is not immediately preceded by another digit; the parentheses capture the digits so they can be reused in the replacement. Set the replacement to:

<a href="#Page_$1">$1</a>

That is, the found number is formatted as a link to the page anchor for that number.
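The same pattern can be tried out in Python to see what it matches. This is only a demonstration of the regular expression: Python writes the captured group as \1 where the replacement above uses $1, the function name is hypothetical, and the function replaces every match at once, so it suits a selection that contains only page references, such as an index.

```python
import re

PAGE_REF = re.compile(r"(?<!\d)(\d{1,3})")

def link_page_numbers(text: str) -> str:
    """Wrap each run of 1 to 3 digits in a link to the matching page anchor."""
    return PAGE_REF.sub(r'<a href="#Page_\1">\1</a>', text)
```

One thing the demonstration makes visible: in a longer number such as 1920 the pattern matches only the first three digits, which is one reason to review each candidate before replacing it.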

Now you are set to walk through all page-number candidates in the document, or in a selection. For example, set the insertion point at the top of chapter 1 and click on the title bar of the Search dialog to give it the keyboard focus. Press the Enter key, which is shorthand for Search. The first string of digits is found and displayed. If it does not represent a page number reference, just press Enter again to find the next one. When you find a number that does represent a page number, as in "(see pg. 192)," type Control-Enter, the shorthand for Replace and Search Again. In this way you can stroll through the book turning page references into hyperlinks.

Obtaining/Installing Guiguts

It is assumed that you have already obtained and installed Guiguts; but if you have not, please see the following link: