User:Hutcheson/Proofing and Formatting Guidelines


Applicability

The official DP guidelines, DP_Official_Documentation:Proofreading/Proofreading_Guidelines and DP_Official_Documentation:Formatting/Formatting_Guidelines, apply to all projects unless otherwise noted in the project comments. New editors should read and understand those first.

These 'hutcheson' guidelines are not complete. They build on the standard DP guidelines. Sometimes these add instructions for special cases. Sometimes they contradict the standard DP guidelines. For any case which these guidelines do not cover, use the standard DP guidelines.

These guidelines ONLY apply to books that I manage and postprocess, or where they are specially requested in the project notes. (Any postprocessor is welcome to ask that they be used on their books.) But please do not use them on other postprocessors' books without the project manager's specific authorization.

These special proofing and formatting guidelines have often saved me significant proofing and formatting time, and provided a better eText with less postprocessing effort. They may always be applied to any book with my name as postprocessor. (If I do not explicitly mention them in project notes, feel free to ask me whether they apply. If I am postprocessor, the answer will always be "yes, I should have explicitly mentioned it.")

For my perspective on what I'm trying to do, see Guidelines Background.

Motivation

How DP CAN'T Help (And Shouldn't Try!)

  • Formatters cannot align ANYTHING, vertically or horizontally: whitespace is not preserved in postprocessing.
  • Formatters cannot draw horizontal or vertical lines (for tables, genealogies, or charts).
  • Formatters cannot correctly rewrap lines on ANYTHING because, well, variable-width screens and variable-size fonts.
  • Formatters cannot use DP markup to indicate font changes, table layout, centered or right-justified text, because, well, there isn't any such markup.

How DP CAN Help

  • Proofers and Formatters can find and mark internal links.
  • Formatters can unwrap specially-formatted text where most (but not all) line breaks are significant.
  • Formatters can arrange tables so that markup is easy to add.
  • Formatters can use horizontal and vertical space to give information to the postprocessor.
  • Formatters can mark situations where a large number of irregularly-distributed text snippets need to be treated consistently.
  • Anyone can give feedback both about what I am asking for, and how I could be asking more clearly.

Feedback

  • If these guidelines can be made more complete, clear, or concise...
  • If procedure or format changes would make these rules easier to apply...
  • If you find this hard, or easy, to do...
  • If you think this approach could profitably be extended to other complex formatting features...
  • If you use these guidelines, or any part of them, on projects that you are postprocessing...
  • If changes in these guidelines might simplify other projects that you are postprocessing...
    • ... please let me know!

Guidelines

Proofing Unusual UTF characters

Note that I still support the old DP two-characters-in-brackets conventions, and the old Greek transliteration scheme, which have been superseded in DP's new UTF-based editor. There is no need to remove these from books which have been in the rounds a long time; but there is never any harm in using the new character pickers, even for older books. Do whatever is most convenient for you.

Fractions (deviation from standard guideline)

UTF provides single-character forms of all lowest-term proper fractions from halves to eighths EXCEPT sevenths. That is, 3/4 (but not 2/4), 4/5, and 5/6 (but not 4/6).

There are two ways to express these single-character fractions:

  1. Use the character picker, if the fraction is available there.
  2. Place the fraction in brackets: [1/3] [2/3] [1/5] [2/5] [3/5] [4/5] [1/6] [5/6] [1/8] [3/8] [5/8] [7/8].

[1/2] [1/4] [3/4] work either way; as DP support is extended, other fractions may also work either way.

Do not add a dash before these when they are compound fractions. For example: one and a half should look like "1½" or "1[1/2]".

For other fractions, UTF has superscript and subscript digits which (with a slash) can be used to build fractions. Please use a "[fr" code for these: for example, mark 10/23 as [fr 10 23] (yes, replace the slash with a space). Do not add a dash before these when they are compound fractions. For example: one and a tenth should look like "1[fr 1 10]".
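
A minimal sketch (not the actual postprocessing tool) of how the two fraction codes could be expanded. The function name and the choice of U+2044 FRACTION SLASH are assumptions; the guideline only says "slash".

```python
import re

# Single-character fractions listed in the guideline (halves to eighths, no sevenths).
VULGAR = {
    "1/2": "\u00bd", "1/4": "\u00bc", "3/4": "\u00be",
    "1/3": "\u2153", "2/3": "\u2154",
    "1/5": "\u2155", "2/5": "\u2156", "3/5": "\u2157", "4/5": "\u2158",
    "1/6": "\u2159", "5/6": "\u215a",
    "1/8": "\u215b", "3/8": "\u215c", "5/8": "\u215d", "7/8": "\u215e",
}
SUPERSCRIPT = str.maketrans(
    "0123456789", "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079")
SUBSCRIPT = str.maketrans(
    "0123456789", "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089")
FRACTION_SLASH = "\u2044"  # assumption: the guideline does not name a specific slash


def expand_fractions(text: str) -> str:
    """Replace [n/d] and [fr n d] codes with Unicode fractions (hypothetical helper)."""

    def bracketed(match):
        # Codes with no single-character form, such as [2/4], are left for review.
        return VULGAR.get(match.group(1), match.group(0))

    def built(match):
        numerator, denominator = match.group(1), match.group(2)
        return (numerator.translate(SUPERSCRIPT) + FRACTION_SLASH
                + denominator.translate(SUBSCRIPT))

    text = re.sub(r"\[(\d+/\d+)\]", bracketed, text)
    return re.sub(r"\[fr (\d+) (\d+)\]", built, text)
```

Because compound fractions are written without a dash, "1[1/2]" and "1[fr 1 10]" expand in place with the whole number directly in front of the fraction.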

Odd Symbols (addition to DP guideline)

Inches/Feet/Minutes/Seconds: UTF supports both characters; when DP begins to support them, they should be used to match the scan. TRANSITIONAL RULE: All projects will have these characters on the project-specific character picker. Do not use accents, single or double quotes.

If you know the name of an unusual symbol, give its name, all lower case (unless there's a good reason why not), in brackets with a single asterisk.

My tools can already recognize, or upon demand will be made to recognize:

  • Greek Letters [*alpha], [*beta], ... [*zeta], including oddments like [*qoppa], [*digamma], and [*sigmaf] (sigma, final form). Capitalize for upper case [*Alpha], etc. (Most of these are also available on the Greek character picker now.)
  • Hebrew letters [*aleph], [*beth], ...
  • Anglo-Saxon letters [*eth], [*Thorn], honoring capitalization as usual
  • Astrological symbols [*aries],... [*mercury],... etc.
  • Mathematical symbols [*sqrt], [*and], [*or], [*union], ...
  • Arrows [*larr], [*lArr], [*rarr], [*rArr]: left or right arrow (if capitalized, double arrow)
  • Any standard HTML entity name

Adding additional symbols is a matter of adding entries to a couple of tables to show how a symbol should be displayed in UTF, HTML, and Latin-1 (only UTF and HTML versions are published).
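
A minimal sketch of such a lookup, assuming Python's standard html.entities table covers the "any standard HTML entity name" case. The EXTRA_SYMBOLS table and the function name are hypothetical, shown only to illustrate the idea of a per-project table; this is not the actual tool.

```python
import re
from html.entities import name2codepoint

# Hypothetical project table for names that are not standard HTML entities.
EXTRA_SYMBOLS = {
    "aleph": "\u05d0",    # Hebrew letter alef
    "Thorn": "\u00de",    # HTML spells this entity "THORN"
    "digamma": "\u03dd",  # Greek small letter digamma
}

SYMBOL_CODE = re.compile(r"\[\*([A-Za-z][A-Za-z0-9]*)\]")


def expand_symbols(text: str) -> str:
    """Replace [*name] codes with the UTF character, where the name is known."""

    def lookup(match):
        name = match.group(1)
        if name in EXTRA_SYMBOLS:
            return EXTRA_SYMBOLS[name]
        if name in name2codepoint:
            return chr(name2codepoint[name])
        # Unknown names are left in place so the postprocessor can find them.
        return match.group(0)

    return SYMBOL_CODE.sub(lookup, text)
```

Leaving unknown codes untouched matches the "make up a name and keep going" approach: a new name does no harm until an entry is added for it.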

EXTENDING THE CONCEPT: If you see a character that's not in one of the classes above, just make up a symbol name, mention it in the book's discussion thread, and keep going. Pick something short, unique, and intuitive. If I don't like it, I'll global edit in postprocessing. If I do like it, I'll add it to the list above.

Proofing/Formatting Internal Links (addition to DP guidelines)

DEFINITIONS:

  • Internal links: include any reference within a book to other parts of the book, or to parts of a closely related book.
  • Simple links: consist of a single word ("keyword") specifying what kind of thing is being referenced, followed by an identifier, usually a word or number, indicating which thing it is.
  • Complex links: do not have a distinct keyword, or have a single keyword that specifies several links. Bible references are simple if the book is specified exactly once for each chapter or verse combination.

MARKUP GUIDELINES:

Enclose all links in [] brackets. For simple links, include both the keyword and the number within the brackets. For complex links, include only the number within the brackets, but add a letter indicating what kind of link: f for figure, p for page, c for chapter, g for glossary, x for anything else.

For complex Bible references, include only the chapter-verse combination within the brackets; create a keyword by prefixing the capitalized book name with a lower-case "i": iGen. , iExod. , ... iMatt. , iMark , ... iRev.

DO NOT REMOVE existing punctuation or special characters (e.g., parentheses). All you are doing is ADDING brackets. Make sure that the added brackets are included inside any other punctuation.

Examples of Simple Links
Proofed Text With Internal Reference | Text With Marked Links
Page 17 (or p. 17) | [Page 17] (or [p. 17])
See figure 12 or (Fig. 12) | See [figure 12] or ([Fig. 12])
Chapter 10 | [Chapter 10]
(ANC 112) | ([ANC 112]) (a reference to a page in a frequently referenced book)
Matt. 17:5 or Ps. 121 | [Matt. 17:5] or [Ps. 121] (bible references)

Examples of Complex Links
Proofed Text With Internal Reference | Text With Marked Links
pp. 12, 13 | pp. [p 12], [p 13] (two separate links)
pages 12-15 | pages [p 12]-15 (should link to page 12, but the string doesn't exactly match)
see the definition of Anticline | see the definition of [g Anticline] (reference to an in-book glossary or index)
chapter 12 of Hebrews | chapter [iHebrews 12] of Hebrews (bible reference)
covered in more detail below. | covered in more detail [x below].
figures IV through VI | figures [f IV] through [f VI]
Chapters 4 and 5 | Chapters [c 4] and [c 5]
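
A rough sketch of how marked links might be picked out and classified. The function names and the returned tuple layout are assumptions made for illustration. The sketch does not tell links apart from other bracket codes such as [fr 10 23] or [Illustration: ...], which a real tool would need to exclude.

```python
import re

# Single-letter codes for complex links, as listed in the markup guidelines.
KINDS = {"f": "figure", "p": "page", "c": "chapter", "g": "glossary", "x": "other"}

# Bracketed text without nested brackets or asterisks ([*symbol] and [**note] are skipped).
LINK = re.compile(r"\[([^\[\]*]+)\]")


def classify_link(body: str):
    """Return (kind, target, uncertain) for the text between the brackets."""
    key, _, target = body.partition(" ")
    uncertain = key.endswith("?")  # "x?" marks a link the proofer was unsure about
    key = key.rstrip("?")
    if key in KINDS:
        return KINDS[key], target, uncertain
    if len(key) > 1 and key[0] == "i" and key[1].isupper():
        # Complex Bible reference: "i" followed by the capitalized book name.
        return "bible:" + key[1:], target, uncertain
    # Otherwise treat it as a simple link whose first word is the keyword.
    return key, target, uncertain


def find_links(text: str):
    return [classify_link(match.group(1)) for match in LINK.finditer(text)]
```

For example, find_links("pp. [p 12], [p 13]") yields two separate page links, matching the "two separate links" row in the table above.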

EXCEPTION: INDEXES AND TABLES OF CONTENTS: Do NOT mark page number links in tables of contents, lists of illustrations, indexes, or species keys--that is, cases where a lot of links are present in a very regular form: they can be handled almost (or fully) automatically in postprocessing.

EXTENDING THE CONCEPT: The "macro" [name parameters] syntax, not used much by DP, is useful for a lot of things. Wherever in a project you see lots of little bits of text that need the same special bit of HTML treatment, it's time for a new macro. It's easy to define macros for each project: a computer manual might have a special "kw" macro for words that should be boldfaced-monospaced-sans-serif font because they are keywords in the computer language. A math text might have a special "fr" (fraction) macro for proper fractions printed vertically with a separating line.

SPECIAL CASES: In a situation that you think is rare (so a general rule isn't likely to be needed), where you aren't sure whether it's a link or not, there's no need to ask in the forums; simply mark the link with an x and a question mark: is [x? THIS] a link? The postprocessor will either tweak it to be correct, or decide it's not a link and delete it.

COMMENTARY: Based on feedback from proofers, I'm now requesting that links be marked in the proofing passes. Formatters are asked to be alert to the possibility that links may have been overlooked earlier; please add missing links.

Formatting Headings (change to DP guidelines)

This section is significantly different from the standard DP guidelines, and is especially important.

  • Single-space multi-line headings.
  • Dewrap heading lines, no matter how long the lines are.
  • Treat a level-n heading followed by a level-n+1 heading as if it were two consecutive level-n headings.
  • Remember that nothing you do directly affects the format of a text file; you are merely indicating which lines will receive HTML heading markup.

Examples

  • As Proofed:
    • ...
    • _
    • Chapter IV
    • _
    • In which the protagonists face challenges from the antagonists
    • and from the vicissitudes of ironic fate
    • _
    • "We're safe now," said....
  • As Formatted:
    • ...
    • _
    • _
    • _
    • _
    • Chapter IV
    • In which the protagonists face challenges from the antagonists and from the vicissitudes of ironic fate
    • _
    • _
    • "We're safe now," said....

In contrast, standard DP rules would have left a blank line after "Chapter IV" and would not have unwrapped the long chapter title line.

  • As Proofed:
    • ...end of chapter.
    • _
    • Chapter IV.
    • _
    • The Sixteenth Century.
    • _
    • Section 1.
    • _
    • Germany.
    • _
    • Subsection 1.1. Stylistic Trends.
    • _
    • Basketweaving fell out of favor...
  • As Formatted
    • ...end of chapter.
    • _
    • _
    • _
    • _
    • Chapter IV.
    • The Sixteenth Century.
    • _
    • _
    • _
    • _
    • Section 1.
    • Germany.
    • _
    • _
    • Subsection 1.1. Stylistic Trends.
    • _
    • Basketweaving fell out of favor...

In contrast, DP rules only provide for two levels of formatting, and cannot handle a chapter beginning with a section heading; also, the postprocessor generally must make all distinctions between sections, subsections, subsubsections, etc.
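
A minimal sketch of how blank-line counts could be read back out of formatted text, assuming the usual DP spacing (four blank lines before a chapter-level heading, two before a section-level heading). The function names and labels are hypothetical. Note that a block with two blank lines before it, directly after a major heading, may be either a subsection heading or the first paragraph, so the sketch only flags it for review.

```python
def split_blocks(lines):
    """Return (blank_lines_before, block_lines) for each run of non-blank lines."""
    blocks, blanks, block = [], 0, []
    for line in lines:
        if line.strip():
            block.append(line)
            continue
        if block:
            # A blank line closes the current block.
            blocks.append((blanks, block))
            blanks, block = 0, []
        blanks += 1
    if block:
        blocks.append((blanks, block))
    return blocks


def classify_blocks(lines):
    """Label each block by the number of blank lines in front of it."""
    labelled, previous = [], "text"
    for blanks, block in split_blocks(lines):
        if blanks >= 4:
            kind = "major heading"   # every line in the block is a heading line
        elif blanks >= 2 and previous == "major heading":
            kind = "ambiguous"       # subsection heading or first paragraph
        elif blanks >= 2:
            kind = "minor heading"
        else:
            kind = "text"
        labelled.append((kind, block))
        previous = kind
    return labelled
```

Run on the second formatted example above, this labels "Chapter IV." with "The Sixteenth Century." and "Section 1." with "Germany." as two consecutive major heading blocks.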

Inline Headings

Many of my projects contain bolded/italicized/smallcaps phrases that act like sub-sub-subsection headings. They may appear at the beginning of a paragraph, or even in the middle of a paragraph. They are often followed by punctuation.

  • Include the punctuation within the markup. This may appear to conflict with DP rules, but it is actually a valid DP exception to the general DP markup rules.
  • There is a special rule for smallcapped embedded headings in National Park Service booklets: put them on a separate line, and space like a DP section heading.

Formatting Tables (change from DP guidelines)

In my projects, everything (including tables) is first converted to HTML, then expanded/formatted as text. Because of this, tables do not need to be vertically-aligned; table cells need very badly NOT to be wrapped. These rules should save time for everyone.

  • Single-space tables: no blank lines between cells.
  • Do not draw boxes with dashes, plus signs, etc.
  • Do not try to vertically-align cells.
  • Especially, do NOT add leading spaces to a line to right-justify the first cell.
  • Unwrap and do not rewrap table lines. Each line of a table should be contained in one line of text, no matter how long it is.
  • Within a line, separate cells by either a "|" character or two-or-more spaces. (spaces are optional before or after a "|")
  • Indicate blank cells by consecutive "|" characters: "||" indicates one blank cell, "|||" indicates two blank cells, etc.

So long as these rules are observed, it will not cause actual harm to use more than two spaces to approximate vertical alignment for convenient proofing. But your vertical alignment will have no effect on the text versions. The text versions will be formatted based solely on HTML and stylesheets.
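
A minimal sketch of the cell-separation rule (a "|" with optional spaces around it, or two or more spaces). The function name is an assumption, and this is not the actual table tool. It shows why extra spaces used for alignment are harmless.

```python
import re

# A cell break is either a vertical bar with optional spaces, or a run of 2+ spaces.
CELL_BREAK = re.compile(r"\s*\|\s*|\s{2,}")


def split_row(line: str) -> list[str]:
    """Split one unwrapped table line into its cells."""
    return CELL_BREAK.split(line.strip())
```

Consecutive bars produce empty strings, so "a | b||c" splits into four cells with the third one blank, while a single space inside a cell (as in "The Beginning") never splits it.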

PGDP-CANADA TABLES WITH PPGEN

Although I do not use PPGEN, feel free to follow these rules, with the additional flexibility I allow:

  • Before each table, include a blank line, then a line with only the HTML tag <table>
  • Single-space tables (as I always request)
  • After the table, include a line with an ending table tag, </table>, then another blank line.
  • It will not cause harm to follow the rigid "singlespace-vertical bar-singlespace" cell separation rules, but it is not necessary.

OUTLINE TABLES

In some tables, the first column is indented like an outline. Use leading spaces as if formatting an outline (see below).


IMPLEMENTATION

My automatic tools for processing tables are described in User:Hutcheson/Postprocessing Tools/Table Support

Formatting the Table of Contents (change from DP guidelines)

DEFINITIONS: a "Compact" Table of Contents includes, conceptually, one line for each section. There may be some lines that need to wrap; there may also be some lines that are differently-justified (say, centered). Compact TOC's may be numbered, and may be in outline form.

An "Antique" Table of Contents typically comprises (or includes) a whole "summary paragraph" for each section.

FORMATTING A "COMPACT" TOC: Treat this mostly like a generic table: two or three columns separated by three-or-more spaces. Note that indentation is required for outline-format tables, and may not be used for any other purpose: in particular, NEVER NEVER indent to right-justify section numerals!

Even numbers of leading spaces are reserved for levels of indentation; ODD numbers of leading spaces are reserved for other purposes. For instance:

  • NEVER NEVER NEVER use dots or dashes to fill in the space between chapter heading and page number; if the printed copy has fill characters, delete them.
  • A CENTERED line (without a page number) should have THREE leading spaces.
  • A RIGHT-JUSTIFIED line (or a line that is otherwise formatted uniquely) should have ONE leading space.
  • Did I mention how extremely important it was to NOT use leading spaces to right-justify section numbers?
 CHAPTER    PAGE
   PART THE FIRST (CENTERED LINE)
I.    The Beginning     1
II.   The First Continuation     23
III.    Continued     37
  IIIa.   An Interlude    44
  IIIb.   Reversion to the Main Theme    45
   PART THE SECOND (ALSO TO BE CENTERED)
IV.   The Conclusion    55
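
A minimal sketch of how the leading-space convention could be decoded: an even count is an outline level, one space is a right-justified line, three spaces is a centered line. The function name and the returned labels are assumptions for illustration.

```python
def classify_toc_line(line: str):
    """Return (role, outline_level) based only on the number of leading spaces."""
    indent = len(line) - len(line.lstrip(" "))
    if indent % 2 == 0:
        return ("entry", indent // 2)   # even: two spaces per outline level
    if indent == 1:
        return ("right-justified", 0)
    if indent == 3:
        return ("centered", 0)
    return ("unknown", 0)               # other odd counts have no assigned meaning here
```

Applied to the example above, "   PART THE FIRST (CENTERED LINE)" is read as centered and "  IIIa.   An Interlude    44" as a level-1 entry. This is why leading spaces used to right-justify numerals would be misread as outline levels.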

If all chapter titles are the same format (often italics or small caps), do NOT use markup to indicate that. But if individual words or phrases within some titles are different font than the rest of the title, DO mark those words. (This will make all our lives easier.)

FORMATTING AN ANTIQUE TOC

In an antique table, each separately-formatted entity should be a separate paragraph. Use the usual three-or-more-spaces rule to mark off page or paragraph numbers. If the first line of a chapter entry looks like a single line of a compact TOC, format it like a one-line compact TOC.

I.   CHAPTER THE FIRST

The background; the participants; the planning, scheduling, and concomitant
problems; gathering, departing, and routine travel to the beginning of unknown regions;
thoughts of the author; etc.     4

II.   CHAPTER THE SECOND

The base camp; altercations with the natives; construction difficulties; weather challenges;
Initial survey results; flora and fauna of the river valley; a few observations upon
politicians and other poisonous reptiles; meteorological baselines    87

Formatting an Index (change from DP guidelines)

  • Dewrap, as in the DP guidelines.
  • Use an even number of leading spaces to indicate subentries, as in the DP guidelines.
  • Single-space, except between initial letters of the alphabet (so an index will look like 26 verses of poetry). Do not double-space between index entries.
  • For a left-justified index, put a comma and space between the entry and the page number (which probably matches the scan).
  • For a left-and-right-justified index, leave two or more spaces between the entry and the page number.
    • Do not add a comma to convert left-and-right-justified index to the DP form.
    • Remove filler characters--lines of dots or dashes included in the print version to fill out the line. In HTML, you cannot know how long the line is.

Formatting Species Keys (addition to DP guidelines)

DEFINITION A species key is a decision tree in outline form. Each line in the outline is a list of characteristics (a "rule"), which either defines a particular "type" (genus or species), or is subdivided further. Keys look complicated, but can be formatted easily, and postprocessed completely automatically, using these simple rules.

  • Include the entire key in poetry markup.
  • Single-space the key--no blank lines between entries anywhere.
  • Unwrap each line, no matter how long. The rule is, one line one rule.
  • Indent exactly 2 spaces for each level of the outline.
  • Put species identification on a separate line, indented exactly one space. (These are usually right justified in the printed edition.)
  • Mark internal links to pages or illustrations (see above).
  • Within the species identification line, put about three spaces (exact # irrelevant) between species name and page reference.
  • Don't get fancy with line numbering. Especially especially NEVER right-justify the line number. It's much more important to get the indentation right than to incorrectly second-guess the way the postprocessor will treat line numbers. (Those are the only two possibilities!)
  • Apply inline markup, bold, italic, etc., but ONLY where it doesn't invariably follow from the layout. For example, don't put italics markup on species name or page reference.
  • Beyond this, DP doesn't have formatting tools to describe anything more about the layout of the key; don't try to invent them. Postprocessors have good tools to globally handle layout; don't go to a lot of work to thwart them.

A simple example follows, and keys are _always_ this simple!

I. Lots of legs, hard skeleton, small creatures that squish messily when trod upon........
  A. <i>Six</i> Legs
 Bug  see [page 7]
  B. <i>Eight</i> Legs
    1. Long tail
 Scorpion  see [Color Plate X]
    2. No tail
 Spider   see [Chapter 7]

Indentation is always either one space (for species name) or an even number of spaces. The species name and plate reference have no markup, because it's safe to assume they'll always be marked up the same way. The number of legs IS marked italic, because there's no pattern that can be applied automatically within a rule.
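
A minimal sketch of how a key formatted this way could be read automatically, assuming only the rules above (one leading space for a species line, an even number for a rule). The function name and tuple layout are hypothetical.

```python
import re


def parse_key(lines):
    """Return ("rule", level, text) and ("species", name, reference) tuples in order."""
    items = []
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        text = line.strip()
        if indent == 1:
            # Species line: the name and the reference are separated by 2+ spaces.
            parts = re.split(r"\s{2,}", text, maxsplit=1)
            reference = parts[1] if len(parts) > 1 else ""
            items.append(("species", parts[0], reference))
        elif indent % 2 == 0:
            items.append(("rule", indent // 2, text))
        else:
            raise ValueError(f"unexpected indentation of {indent} spaces: {text!r}")
    return items
```

Because the species lines are recognized by indentation alone, their italics and page-reference styling can be applied in one pass, which is why they carry no inline markup.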

Formatting Trees and Outlines (addition to DP guidelines)

Nearly always, the useful representation of a Tree structure is as an outline:

Trunk
  Main branch 1
    Side branch 1
      Twig 1
      Twig 2
    Side branch 2
...
  • Single space the lines.
  • Include the entire tree in poetry markup.
  • Unwrap each entity, no matter how long the line becomes (this will seldom be a problem in practice.)
  • Always use exactly two additional spaces of indentation per level. DO NOT adjust indentation spaces to adjust the position of branch numbers.
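
A minimal sketch of why exact two-space indentation matters: with it, the nesting can be rebuilt mechanically. The function name and the (text, children) node layout are assumptions. This strict version rejects lines that skip a level, so it suits pure outlines only.

```python
def parse_outline(lines):
    """Build nested (text, children) nodes from two-space indentation."""
    root = ("", [])
    stack = [root]  # stack[k + 1] is the most recent node at outline level k
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent % 2:
            raise ValueError(f"odd indentation: {line!r}")
        level = indent // 2
        if level > len(stack) - 1:
            raise ValueError(f"line skips an outline level: {line!r}")
        node = (line.strip(), [])
        del stack[level + 1:]          # close any deeper or same-level open nodes
        stack[level][1].append(node)   # attach to the parent one level up
        stack.append(node)
    return root[1]
```

On the Trunk example, "Twig 1" and "Twig 2" end up as children of "Side branch 1", and "Side branch 2" as a sibling of "Side branch 1" under "Main branch 1".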

Some tables (such as geologic columns) present a combination of table and outline information in a compact way. These usually end up as graphics in the HTML version, but it's nice to have something in the text version.

Think of the table as an interleaved outline. Typically geologic eras/epochs form one pure outline; and geologic formations/members form another pure outline; the challenge is that the two outlines don't sync perfectly. No matter.

Define indentation based on columns of the table--in most columns, most cells are empty.

Each vertical column represents a level of the "combined outline". Each table cell that might span multiple lines of the table goes on a separate line of the outline. After the "outline" columns come columns of actual data (member name, description, location, depth), which can be treated as usual tables: unwrap each cell, making lines as long as necessary; leave a vertical bar or at least 3 spaces between cells; etc.

Quaternary
  Holocene
  Pleistocene
      Redwall (formation, overlaps tertiary era)
    Philistocene
        Pinkwall Member           Lightly stained sandstone    Moab, Utah (part of Redwall)
    Pelethocene
        Maroon Cliffs Member      Dark sandstone        Flagstaff, Arizona
Tertiary
  Incendiary
    Cherocene
        Rust Scarp Member     Sandstone with Siltstone layers    Pittsburgh, Pennsylvania (still part of Redwall)
    Anthrocene 
      Dover Chalk              White limestone           Mineral Wells, Texas (finally, out of the Redwall!)
 ...


Unwrapping lines is critical. Get line order correct: put "Quaternary" above "Holocene" because it is a higher level of outline: do NOT put "Quaternary" on the line between "Holocene" and "Pleistocene" even if the word is centered vertically in its cell in the table. And don't obsess over the indentation level; get the order and unwrapping right, and all else is easy to fix.

Formatting Mostly-Outline Tables: an algorithmic approach (addition to DP guidelines)

This is a simple series of steps for converting a table with lots of row-spanning cells into a text outline. It is not a contradiction of the preceding section, but a series of simple steps for getting the desired effect. Also: it's easy to convert back to table format if the postprocessor prefers that.

  • Place the caption (heading) for each column on a separate line, dewrapped, each one indented two or four spaces further.
  • Dewrap each cell on its own separate line.
  • Arrange the cells in order from top to bottom and, within that, from left to right. The top-to-bottom order should be based on where the top of the cell is, not where the words in the cell are. At each horizontal line in the table, order the cells from left to right, keeping each cell on its own separate line.
  • Indent each cell according to its column.

At first glance, the result may look a little odd, almost but not quite like an outline. With very little practice, it's easy to reconstruct the original table almost exactly. And it scrolls much more gracefully. However, the HTML version usually needs an image of the original table, because some detail (like proportionate size of cells) can't be represented in the text outline.
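
The steps above can be sketched as a small sort-and-indent routine. This is an illustration under stated assumptions, not an existing tool: the Cell record, its field names, and the fixed two-space step are all hypothetical, and a formatter would normally do this by eye.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    top: int     # index of the horizontal line at the top edge of the cell
    column: int  # zero-based column index
    text: str    # cell text, possibly wrapped over several lines in the scan


def table_to_outline(captions, cells, step=2):
    """Return outline lines: captions first, then cells by top edge, left to right."""
    lines = []
    for column, caption in enumerate(captions):
        # Each caption is dewrapped and indented one step further than the last.
        lines.append(" " * (step * column) + " ".join(caption.split()))
    for cell in sorted(cells, key=lambda c: (c.top, c.column)):
        # Dewrap the cell and indent it according to its column.
        lines.append(" " * (step * cell.column) + " ".join(cell.text.split()))
    return lines
```

Sorting on the top edge rather than on where the words sit is what puts a tall, vertically centered cell above the cells it spans.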

Formatting Text-heavy Tables (addition to DP guidelines)

This approach is useful for tables whose cells contain long paragraphs or lists.

  • Format the table like a chapter, with a two-line break between each row.
  • Format the individual cells like paragraphs, lists, whatever; with a one-line break between each paragraph.
  • If there are paragraph breaks within a cell, separate them with a [** par] note by itself on a line. No need to be verbose.
  • Put the whole table in /# ... #/ markup to indicate that special processing needs to be done.

Proofing and Formatting Illustrations (change from DP guidelines)

Proofing Charts and Diagrams With Internal Text

Please transcribe any text within the diagram or illustration if it might be meaningful to someone who doesn't have the diagram. This text would be included in the text version. Text may be scattered around the diagram--just type each bit of text on a separate line, and let the formatters or postprocessor sort it out.

Formatting Illustration Internal Text

Internal text in a diagram is NOT part of the caption; it should follow the caption, in "block quote" markup.

  • Format it like a paragraph, list, or outline if it almost fits; otherwise leave as an unformatted singlespaced list.
  • Mark italics, bold, small caps, as usual.

Formatting Illustration Captions

  • Flag "location portions" of captions with [**omit] notes. Since the illustrations and captions cannot be kept in the same two-dimensional grid in a flowed/rewrapped layout, these locations will be misleading and need to be reviewed in postprocessing.
  • Use separate [Illustration] markers for each picture, even if the captions are combined into one paragraph in the printed copy. Use [**?] markup for anything that doesn't go with a particular picture.

Example caption:

Examples of mythical monsters. (top right) Unicorn. (top left) dragon. (bottom) honest banker.

Formatted caption:

[**?]Examples of mythical monsters.

[Illustration: [**omit](top right) Unicorn.]

[Illustration: [**omit](top left) dragon.]

[Illustration: [**omit](bottom) honest banker.]

Formatting Right-Justified Snippets (addition to DP guidelines)

Attributions, signatures, and other short snippets are frequently right-justified. These can be authors of poetry or citations, sources of photographs, signatures in letters, etc.

Right-justified snippets should be:

  • ON A SEPARATE LINE
  • PRECEDED BY A SINGLE SPACE
  • DEWRAPPED

Caveats:

  • This convention works inside or outside of poetic blocks; don't start or terminate a poetic section simply because there is a right-justified line.
  • This convention does not apply within section headings. All headings of a particular level need the same treatment (center, right or left-justification); that is handled by stylesheets.

Dewrapping is rarely needed, but occasionally a long attribution will need to be both re-wrapped and right-justified. HTML will rewrap as necessary, and the text version will be re-wrapped automatically--but only IF everything is on a single line.

Formatting Hymnals

Hymnals have their own idiosyncrasies.

There is a page describing hymn meters (see Hymn Meters), but each hymnal will probably need specific instructions: see the project comments.

Formatting National Park Service "Uniformat" Booklets

Recent NPS booklets are created with a standardized stylesheet. They look complicated, but I've found easy ways to handle many common complex-looking forms: see Formatting NPS Uniform Booklets.

Formatting Threadcraft Booklets

The various needle crafts each have their own unique abbreviations and layout issues; these are "differently easy". See Formatting Threadcraft Pattern Books.

Formatting Cookbooks

For proofreading and formatting guidance, see /Cookbooks.