PPTools/Ppgen/Markup Overview

From DPWiki

A Simple Markup for Postprocessing

Post-processing has been difficult for many people because of the HTML required for books produced at DP. With new considerations added for handheld formats, the task is even more daunting. I set out to see how simple a markup I could create that would let the PPer post-process a book without even looking at the HTML. A single source file would generate both the Text and HTML versions, and the HTML would be PPV-approved to DP standards.

At all stages, I kept the needs of a beginning post-processor in mind. The first choice was whether the markup should be “presentational” or “semantic.” A presentational markup describes what something looks like: how it is presented. A semantic markup describes what something is and lets the generator decide how it should look.

Presentational markup uses a simple vocabulary to describe how text looks: margin changes, indents, floated blocks. There is no keyword for “poetry,” for example, because that's what something “is.” Since it does not need a semantic name for every construction, the markup gives the user much more flexibility in deciding how a document will look.

A piece of software takes a book marked up in this simple format and produces Text and HTML versions. The HTML is compatible with ebookmaker for DP/PG use. The text meets the usual standards for line length, wrapping, chapter heading spacing, etc. The program is called “ppgen.” It can be downloaded and installed on a user’s computer, or it can be used online.

Basic Markup

Borrowing ideas from a presentational markup system of long ago, ppgen uses tags that, for the most part, start with a period and sit on a line of their own. Here's a simple example:

This was the invitation we sent. It’s kind of
crazy, but what did we care, because in my patrol
we’re all crazy anyway. We ought to be called
the Squirrels instead of the Silver Foxes, because
we’re all nutty.
.ce
Scouts, Attention!
Shoulder your trusty appetites and march to Bridgeboro on
Saturday next, April 17th, to reënforce your brother scouts
of the 1st Bridgeboro troop in a daring enterprise. Come
hungry! Don’t eat on the way! Rally in Downing’s lot near
Bridgeboro Station at 10 A.M.

That “.ce” tag tells ppgen to center the next line. (Note: to center a block of more than one line, you could instead use the “.nf c” directive, a variant of the “.nf” no-fill block described later.) The result looks like this:

[Image: ppgen output with the line centered (Rfrank-06220320.png)]
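To make the “.ce” behaviour concrete, here is a minimal Python sketch of how a text generator might center a single line, and how a multi-line centered block (the “.nf c” case) simply centers each line on its own. It is not ppgen's code: the 72-character line length and the names `center_line` and `center_lines` are assumptions for illustration.

```python
# A minimal sketch, not ppgen's implementation.
# ASSUMPTIONS: WIDTH = 72 is the text line length; center_line and
# center_lines are hypothetical names chosen for this example.

WIDTH = 72


def center_line(line: str, width: int = WIDTH) -> str:
    """Center one line (the ".ce" case); no trailing spaces are added."""
    text = line.strip()
    pad = max((width - len(text)) // 2, 0)  # lines wider than width get no pad
    return " " * pad + text


def center_lines(lines, width: int = WIDTH):
    """Center every line independently (the ".nf c" case)."""
    return [center_line(line, width) for line in lines]


if __name__ == "__main__":
    print(center_line("Scouts, Attention!"))
```

Because each line is centered independently, lines of different lengths end up with different left edges, which is what you want for a title but not for a poem.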

That's a good start, but in the book that centered line is in small caps. Use inline markup to indicate small caps, like this:

.ce
<sc>Scouts, Attention!</sc>

There are many inline markup tags available. Some may be familiar, like <i> and <b>, but others may be new, like <g> for gesperrt (letter-spaced) text or <s> for a smaller font size. With the small caps added, we now have this:

[Image: ppgen output with the centered line in small caps (Rfrank-06220325.png)]
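The same inline tag can drive both outputs from a single source. The Python sketch below is hypothetical: the plain-text markers (underscores for italics, equals signs for bold, uppercase for small caps) and the `sc` class name in the HTML are assumed conventions, not a description of what ppgen actually emits. Only `font-variant: small-caps`, mentioned in a comment, is standard CSS.

```python
import re

# Hypothetical sketch: one set of inline tags rendered two ways.
# ASSUMPTIONS: the plain-text markers and the "sc" class name are
# illustrative conventions, not taken from ppgen's documentation.

TEXT_RULES = [
    (re.compile(r"<i>(.*?)</i>", re.S), lambda m: "_" + m.group(1) + "_"),
    (re.compile(r"<b>(.*?)</b>", re.S), lambda m: "=" + m.group(1) + "="),
    (re.compile(r"<sc>(.*?)</sc>", re.S), lambda m: m.group(1).upper()),
]

HTML_RULES = [
    # <i> and <b> are already valid HTML, so they pass through unchanged.
    # A stylesheet rule such as .sc { font-variant: small-caps; } would
    # give the span its small-caps appearance.
    (re.compile(r"<sc>(.*?)</sc>", re.S),
     lambda m: '<span class="sc">' + m.group(1) + "</span>"),
]


def apply_rules(line: str, rules) -> str:
    """Apply each (pattern, replacement-function) pair in order."""
    for pattern, repl in rules:
        line = pattern.sub(repl, line)
    return line


def render_text(line: str) -> str:
    return apply_rules(line, TEXT_RULES)


def render_html(line: str) -> str:
    return apply_rules(line, HTML_RULES)


if __name__ == "__main__":
    source = "<sc>Scouts, Attention!</sc>"
    print(render_text(source))
    print(render_html(source))
```

The point of the sketch is the design, not the exact markers: the PPer writes `<sc>` once, and each output format decides how to show it.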

In the original book, the paragraph following the centered line is indented from both the left and right margins. We change the left margin indent with the “.in” directive and the line length with the “.ll” directive. Let's move the left margin in four characters and shorten the line by four characters. We'll also add the paragraph that follows, to see how it all lines up. Here's the source:

This was the invitation we sent. It’s kind of
crazy, but what did we care, because in my patrol
we’re all crazy anyway. We ought to be called
the Squirrels instead of the Silver Foxes, because
we’re all nutty.
.ce
<sc>Scouts, Attention!</sc>
.in +4
.ll -4
Shoulder your trusty appetites and march to Bridgeboro
on Saturday next, April 17th, to reënforce your brother
scouts of the 1st Bridgeboro troop in a daring enterprise.
Come hungry! Don’t eat on the way! Rally in Downing’s
lot near Bridgeboro Station at 10 A.M.
.ll
.in
I guess when they got these invitations they
thought we were all maniacs from Maine, hey?
What did we care? Not in the least, quoth we.

Notice that we used a bare “.ll” tag to return to the previous line length, and a bare “.in” tag to do the same for the left margin. An “.in -4” would have restored the left margin just as well. Here's the result now:

[Image: ppgen output with the indented, shortened paragraph (Rfrank-06220327.png)]
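One way to picture how “.in” and “.ll” interact is as two stacks of margin values: a signed argument pushes a new value relative to the current one, and a bare directive pops back to the previous one. The Python sketch below models that with the standard `textwrap` module. The `Margins` class, the 72-column default, and counting the indent as part of the line length are assumptions, not ppgen internals.

```python
from __future__ import annotations

import textwrap

# A minimal sketch of tracking ".in" and ".ll" for text output; not ppgen's code.
# ASSUMPTIONS: a 72-column default line length; the indent counts toward the
# line length; a signed argument ("+4", "-4") pushes a value relative to the
# current one; an unsigned number sets an absolute value; a bare directive
# pops back one step.


class Margins:
    def __init__(self, line_length: int = 72) -> None:
        self.indent = [0]            # stack of left-margin settings
        self.length = [line_length]  # stack of line-length settings

    @staticmethod
    def _update(stack: list[int], arg: str) -> None:
        if not arg:                  # bare ".in" or ".ll": previous value
            if len(stack) > 1:
                stack.pop()
        elif arg[0] in "+-":         # relative change, e.g. "+4" or "-4"
            stack.append(stack[-1] + int(arg))
        else:                        # absolute value, e.g. "8"
            stack.append(int(arg))

    def directive(self, name: str, arg: str = "") -> None:
        if name == "in":
            self._update(self.indent, arg)
        elif name == "ll":
            self._update(self.length, arg)
        else:
            raise ValueError(f"not handled in this sketch: .{name}")

    def fill(self, text: str) -> list[str]:
        """Wrap a paragraph inside the current margins."""
        pad = " " * self.indent[-1]
        return textwrap.wrap(
            text,
            width=self.length[-1],   # textwrap counts the indent in the width
            initial_indent=pad,
            subsequent_indent=pad,
        )


if __name__ == "__main__":
    m = Margins()
    m.directive("in", "+4")
    m.directive("ll", "-4")
    for line in m.fill("Shoulder your trusty appetites and march to "
                       "Bridgeboro on Saturday next, April 17th."):
        print(line)
    m.directive("ll")
    m.directive("in")
```

With the settings from the example, every wrapped line starts four columns in and is at most 68 characters long; the two bare directives then restore the original margins.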

There was one more thing at this place in the original book: a small poem came after the indented block. The text in that indented block was wrapped within the margins provided, but poetry must not be wrapped. What we want is a block of lines that is not wrapped (poetry) and that is centered as a block. We use “.nf b” to start that block (“no fill, block”) and “..” to end it. Here's what it looks like in source:

This was the invitation we sent. It’s kind of
crazy, but what did we care, because in my patrol
we’re all crazy anyway. We ought to be called
the Squirrels instead of the Silver Foxes, because
we’re all nutty.
.ce
<sc>Scouts, Attention!</sc>
.in +4
.ll -4
Shoulder your trusty appetites and march to Bridgeboro
on Saturday next, April 17th, to reënforce your brother
scouts of the 1st Bridgeboro troop in a daring enterprise.
Come hungry! Don’t eat on the way! Rally in Downing’s
lot near Bridgeboro Station at 10 A.M.
.ll
.in
.nf b
Ask not the reason why
Here’s but to do or die.
Hark to the battle-cry
Failure or apple pie!
Come, valiant comrades!
..
I guess when they got these invitations they
thought we were all maniacs from Maine, hey?
What did we care? Not in the least, quoth we.

The result is shown here:

[Image: ppgen output with the poem added as a centered block (Rfrank-06220335.png)]

Now we can compare it to the original as printed in the book:

[Image: the same passage as printed in the original book (Rfrank-06220336.png)]
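The difference between “.nf b” and plain centering is that a block keeps its lines aligned with each other and moves them as a unit. Here is a minimal Python sketch of that idea, assuming a 72-column text line and a hypothetical `center_block` function: the longest line determines one shared left padding, and no line is ever rewrapped.

```python
from __future__ import annotations

# A minimal sketch of a ".nf b" (no fill, block) layout for text output;
# not ppgen's code. ASSUMPTIONS: a 72-column line; center_block is a
# hypothetical name.


def center_block(lines: list[str], width: int = 72) -> list[str]:
    """Shift unwrapped lines right by one shared amount so the block is centered."""
    body = [line.rstrip() for line in lines]
    longest = max((len(line) for line in body), default=0)
    pad = " " * max((width - longest) // 2, 0)
    # Lines are never rewrapped; blank lines stay empty rather than all spaces.
    return [pad + line if line else "" for line in body]


if __name__ == "__main__":
    poem = [
        "Ask not the reason why",
        "Here’s but to do or die.",
        "Hark to the battle-cry",
        "Failure or apple pie!",
        "Come, valiant comrades!",
    ]
    for line in center_block(poem):
        print(line)
```

Centering each poem line separately would give the verses a ragged left edge; the shared padding keeps their left edges aligned while still placing the block in the middle of the page.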

Presentational Markup

The original book indented the first line of each paragraph. That isn't the DP style, so it was not duplicated here, though a simple “.ti +2” (temporary indent) would have produced the same visual effect. There also appears to be extra space above and below the “invitation”; that could be duplicated with the “.sp” dot directive (for vertical space).

The point is that presentational markup gives the post-processor much more flexibility than semantic markup. What would you call that structure if you had to give it a semantic name? I'd rather describe what the typesetter essentially had to do to print the original book. That's the point of the “dot directives.” No line of book text should ever start with a “.” followed by a letter, so ppgen can safely use those dot directives to adjust what it is doing, just as the original typesetter did. The result is a smaller and more natural vocabulary for the PPer to learn, which makes post-processing easier, less error-prone, and more fun.
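Because ordinary text never begins with a period followed by a letter, telling directives apart from book text can come down to a single pattern match. The Python sketch below shows one way to do it; the regular expression, the `classify` function, and its tuple format are illustrative assumptions rather than ppgen's actual parser. A line such as “...and so on” stays text, because its second character is not a letter.

```python
from __future__ import annotations

import re

# A minimal sketch of separating dot directives from book text; not ppgen's
# parser. ASSUMPTIONS: the regex and the (kind, name, argument) tuple format
# are illustrative. ".." (end of a block) is treated as its own case.

DIRECTIVE = re.compile(r"^\.([a-zA-Z][a-zA-Z0-9]*)(?:\s+(.*))?$")


def classify(line: str) -> tuple[str, str, str]:
    """Return ("end", "", ""), ("directive", name, arg), or ("text", "", line)."""
    if line.strip() == "..":
        return ("end", "", "")
    m = DIRECTIVE.match(line.rstrip())
    if m:
        return ("directive", m.group(1), m.group(2) or "")
    return ("text", "", line)


if __name__ == "__main__":
    for src in [".ce", ".nf b", ".in +4", "..", "...and so on", "Come hungry!"]:
        print(classify(src))
```

A generator built this way would pass each "directive" tuple to code like the margin or centering sketches, and send everything else to the current fill or no-fill mode.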