Ppgen Frequently Asked Questions
I get Tidy Errors or my HTML doesn't validate. What can cause this?
Ppgen attempts to generate HTML that validates as XHTML 1.0 Strict. If it doesn't, either there's a bug, there's an error in the PPer's testing, or the PPer has escaped out of ppgen-generated code and injected something that doesn't validate. Bugs are fewer and fewer, but if you suspect one, PM or email Walt Farrell (wfarrell) or report it in either of the ppgen forum discussions. Some PPers check their HTML with Tidy, which I highly recommend, but remember ppgen can work in UTF-8. If it has generated HTML with UTF-8 encoding based on your source file encoding, be sure to use the correct form for Tidy:
for UTF-8: tidy -e -utf8 filename.html
for Latin-1: tidy -e filename.html
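As a rough illustration of matching the Tidy form to the file, the following Python sketch reads the charset declared in the generated HTML and builds the corresponding command. The function name `tidy_args` and the 4096-byte search window are assumptions made for this example; only the two command forms come from the text above.

```python
import re

def tidy_args(html_bytes: bytes, filename: str) -> list:
    """Build the tidy command that matches the charset declared in the HTML."""
    # Look for charset=... near the top of the file (assumed to be in the head).
    match = re.search(rb"charset=\W?([\w-]+)", html_bytes[:4096], re.IGNORECASE)
    charset = match.group(1).decode("ascii").lower() if match else ""
    if charset in ("utf-8", "utf8"):
        return ["tidy", "-e", "-utf8", filename]
    # Anything else is treated as Latin-1, the other case ppgen produces.
    return ["tidy", "-e", filename]

# Usage sketch: pass the result to subprocess.run() if tidy is installed.
# with open("filename.html", "rb") as f:
#     print(" ".join(tidy_args(f.read(), "filename.html")))
```

The check trusts the declaration in the file rather than guessing from the bytes, which is enough here because ppgen writes the charset itself.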
How can I check links in my generated HTML file?
To use the W3C Link Checker, you need to have the HTML file and the images folder available as a URL. Ppgen does provide a basic level of link checking as part of its processing, but if you want to use the official checker you'll need access to a web site where you can upload your project, or you'll need to install the official checker locally on your machine.
You can find local install instructions [here]. It will require that you have Perl installed.
If you have DU you might consider using the "preview" function of the PG upload site to do your link checking.
Are there any debugging switches I can use with the ppgen generator?
You can find the complete list of command line options to ppgen by running the command:
python3 ppgen.py --help
If you run that, you will find there are four debugging options. One of them can be very useful. Here is the complete list:
"d" print names of selected generator routines as they are invoked.
"s" keep generated styles, do not run class composer
"a" show all lines as processed
"p" display detected processor type
The third one, invoked with ppgen.py -d a -i filename-src.txt, is very helpful when the program crashes with a confusing error message. By looking at how far it got, the PPer can see within a line or two the source file line that caused the crash.
I use guiguts to create my source file. Do I need to rewrap the text?
You do not. The text file or files that will be generated are wrapped to DP/PG standards. The wrapping algorithm is sufficient but not elegant. You may find some paragraphs do not wrap at optimum boundaries. If you care about optimizing the paragraph wrapping in the text versions, feel free to edit the generated files.
Ppgen provides another mechanism if you don't want to edit the generated text file to take care of an extremely short line. You can wrap the text manually if it's important to you. This particular source causes a very unattractive short line in the generated text:
“Not only Dad, but the whole shooting match on the ranch. Tell you what, Aunt Belle and Uncle Fent said we could stay as long as we like, and they meant it, even if we are boys. Let’s organize a secret—s-e-c-r-e-t—mind you, detecting bureau, or what ever it is, and stay until we solve the three mysteries!” Bob proposed.
Since you send the text and HTML files forward to PPV or to PG, you can edit the text file to improve the text wrapping on that one paragraph. But if you do, you may end up doing it over and over. One solution available to you is to manually wrap that paragraph and include it as literal text:
.if t
.li
"Not only Dad, but the whole shooting match on the ranch. Tell you what,
Aunt Belle and Uncle Fent said we could stay as long as we like, and
they meant it, even if we are boys. Let's organize a
secret—s-e-c-r-e-t—mind you, detecting bureau, or what ever it is, and
stay until we solve the three mysteries!" Bob proposed.
.li-
.if-
.if h
“Not only Dad, but the whole shooting match on the ranch. Tell you what, Aunt Belle and Uncle Fent said we could stay as long as we like, and they meant it, even if we are boys. Let’s organize a secret—s-e-c-r-e-t—mind you, detecting bureau, or what ever it is, and stay until we solve the three mysteries!” Bob proposed.
.if-
The manually formatted first form, inside the ".if t" block, will be copied exactly into the text version of the generated file.
How can I reference a specific place in the text from my TN?
Transcriber's notes often refer to page numbers in the original text. It would look something like this:
page 41: "Afirca" replaced with "Africa"
Your book may be using page numbers, but you want the link to go to a specific place on the page, so using "#41#" isn't sufficient. Use this:
page #41:tn41#: "Afirca" replaced with "Africa"
and then at the specific place where the correction was made, use this:
“He did not go to <target id='tn41'>Africa,” he said.
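If a book has many such notes, it is easy to leave a link without its target. The following Python sketch is one possible check, not part of ppgen: it scans a source text for links of the "#41:tn41#" form and reports any id that has no matching <target id='...'> tag. The function name `missing_targets` and the regular expressions are assumptions based only on the two examples above.

```python
import re

# "#41:tn41#" style links: visible text, a colon, then the target id.
LINK = re.compile(r"#[^#:\s]+:([\w-]+)#")
# "<target id='tn41'>" style anchors, with or without a quote before the id.
TARGET = re.compile(r"<target id=\W?([\w-]+)")

def missing_targets(source: str) -> list:
    """Return the ids that are linked to but never defined, sorted."""
    defined = set(TARGET.findall(source))
    return sorted({name for name in LINK.findall(source) if name not in defined})

# Usage sketch:
# with open("filename-src.txt", encoding="utf-8") as f:
#     print(missing_targets(f.read()))
```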
Should I use page numbers in my generated text?
You can, but you really need a good reason to do it. I rarely use page numbers in the generated text. There are many reasons.
Page numbers can only appear in the generated HTML version. Plain text and all the tablet variations strip visible page numbers. Tablets replace them with their own numbers, and those page numbers can change depending on user preferences. One might argue that a book with references like “see page 181” has to have the page number anchors. The trap is to use a reference to a page number, like "#181#", which will indeed point to where page 181 was in the original. But be careful: even if you include page numbers and then hide them, the place the link lands, where the page number marker was in the original, may actually be on a different page in the reader. The user may spend five minutes looking for a reference that is actually on a different reader page. All of this is avoidable if you simply don't attempt to use page numbers in the generated texts.
Can I make an unusual table with different weight rules using ppgen?
Yes, you can, but not using just ppgen native markup. This is intentional. Ppgen can make simple tables effectively while allowing you to use arbitrarily complex tables if you need them.
Let's say you want to make a table with a thick dark line after the first row and a thin grey vertical line to the right of the first column. To do that, code the table in ppgen markup without the horizontal or vertical lines. Generate the HTML file. Then grab the generated HTML that ppgen made for the table and edit it to add borders, cell colors, whatever.
Next comes the magic part. Remove the original table code from your ppgen source file and put in a conditional block that has your customized HTML code.
.if h
.li
<your HTML code for the table goes here>
.li-
.if-
.if t
.li
<the plain-text version for the table goes here>
.li-
.if-
Can ppgen reproduce a letter with a greeting and a salutation?
Ppgen is based on presentational markup. A letter as described is well within the standard markup. Since you control the presentation, you may want to mark it differently depending on if you are in paragraph indent mode or not. In .pi mode, there is no blank space between paragraphs. If you don't add one above and below the letter, it won't look right and likely won't match the original book. Here is a suggested template for a letter in .pi mode:
There was a note on the dresser, and Lawrence read:
.sp 1
.in +4
.ll -4
.ti 0
“Dear Lawrence:

“There’s no place so safe for a lad of your tendencies as the same cot you are snoring on at this second. I leave you to your dreams and hope they are sweet. As for me, I am pulling down the blinds and disconnecting the telephone, and then I am makin’ off: for I have a pretty idea all of my own. I will see you later.
.rj
“<sc>O’Brien.</sc>”
.ll +4
.in -4
.sp 1
After he had had a meal which was neither breakfast nor luncheon, but combined all the most agreeable
If your book is in standard mode, where paragraphs are separated by a blank line and are not indented, you would omit the temporary indent override (.ti 0) and the .sp 1 commands.
How do I control font-size changes?
There are two ways described in the ppgen manual. But which one when? If you want to change a font size inline, such as for one line on a title page, use the inline form. In the example code that follows, all markup above the page break (.pb) uses the inline form.
The other way to control font size is to use a dot command. The ".fs 80%" in the following code says "change the font size to 80% until you are told otherwise." Here is an example that uses both inline and command options:
.nf c
<xl>The Sky Trail</xl>
<i>By</i>
<l><sc>Graham M. Dean</sc></l>
Author of
<i>Daring Wings
Circle 4 Patrol</i>
<sc><l>The Goldsmith Publishing Co.</l>
Chicago</sc>
.nf-
.pb
.fs 80%
.nf c
<sc>Copyright 1932</sc>
<sc>The Goldsmith Publishing Company</sc>
Made in U. S. A.
.nf-
.fs-
.pb
How do I incorporate the cover image in my book?
Every book should have a cover image provided, even if it's one you've made yourself. What you do with it in ppgen source depends on if it is the original cover or one you've created. PPers get to decide if the cover image shows in the HTML version. If you want it to show in HTML, simply add these lines at the start of the ppgen source file after the metadata:
.if h
.il fn=cover.jpg w=325px
.pb
.if-
That says "in HTML, put the cover image here and follow it with a page break." Use that for original covers included with the project.
If the cover is not original, you usually do not want it to appear in the HTML but it will show up in the tablet versions. As long as you name it cover.jpg it will automatically be used for the cover image in the tablet versions. Current Project Gutenberg requirements are that if you've made the cover image, you have to put in a disclaimer that the cover image was made by you, placing it in the public domain. You only want that image to show along with the accompanying disclaimer in the tablet version. There are two ways to do this.
The easiest is to include the disclaimer as part of the cover image. An example of that is here. Ask in the Ppgen team thread for Photoshop PSD files to create your own cover.
The other way is to include a very specific transcriber's note at the top of the file. This is the recommended ebookmaker-compatible source code to use for a cover image you've created:
.if h
.de div.tnotes { padding-left:1em;padding-right:1em;background-color:#E3E4FA;border:1px solid silver; margin:2em 10% 0 10%; }
.de .covernote { visibility: hidden; display: none; }
.de div.tnotes p { text-align:left; }
.de @media handheld { .covernote { visibility: visible; display: block;} }
.li
<div class="tnotes covernote">
<p><b>Transcriber's Note:</b></p>
<p>The cover image was created by the transcriber and is placed in the public domain.</p>
</div>
.li-
.if-
Though the goal of ppgen is to have the PPer never have to even look at HTML, this PG requirement makes this boilerplate necessary. If you don't understand it, don't worry. Just use it to keep the whitewashers happy.
ASCII, Latin-1, UTF-8? What encoding should I use?
Ppgen can process source files with any of these encodings. Some of the tools PPers use to check the HTML and text files cannot handle some of these encodings. Let's take a closer look.
If your book contains no special characters, then it is probably ASCII. The first 128 characters (code points 0 through 127) are common to ASCII, Latin-1 and UTF-8. ASCII source files generate a single text file and an HTML file. The text file will be ASCII but it will be named book-lat1.txt to comply with Project Gutenberg guidelines. Since ASCII is a subset of Latin-1, this is not a conflict. The HTML will be encoded as "charset=ISO-8859-1".
If your book contains characters not in ASCII, then you may be able to choose between Latin-1 or UTF-8, depending on the character. I use this resource to remind myself what is in Latin-1. A special character like the ae ligature is in this list. Please note that files coming from DP are encoded in Latin-1 if there are any characters in the Latin-1 range in the book. A Latin-1 source file generates a single text file and an HTML file. The text file will be Latin-1 encoded and will be named book-lat1.txt and the HTML will be encoded as "charset=ISO-8859-1".
UTF-8 encoding can encode anything, including all the characters in ASCII and Latin-1 as well as characters not available in those two character sets. A good example is a curly quote. If you want to use curly or “smart” quotes, as in the original book, then you need to encode your document in UTF-8. Ppgen is fully UTF-8 compliant and most modern text editors can work with UTF-8 characters. A UTF-8 source file generates two text files and an HTML file. One text file will be Latin-1 encoded using the standard PG mappings from UTF-8 to Latin-1 and the other will be a UTF-8 text file. They will be named book-lat1.txt and book-utf8.txt. The HTML will be encoded as "charset=UTF-8".
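The three cases above can be summarized in a short Python sketch. It is only an illustration, not ppgen's own logic: `narrowest_encoding` reports the smallest of the three character sets that can hold a piece of text, and `expected_outputs` restates the output files and HTML charset described above for a given source file encoding. Both function names are made up for this example.

```python
def narrowest_encoding(text: str) -> str:
    """Return 'ascii', 'latin-1' or 'utf-8', whichever is the smallest fit."""
    for name in ("ascii", "latin-1"):
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    return "utf-8"

def expected_outputs(source_encoding: str) -> dict:
    """Restate the files described above for a source saved in this encoding."""
    if source_encoding == "utf-8":
        return {"text_files": ["book-lat1.txt", "book-utf8.txt"],
                "html_charset": "UTF-8"}
    # ASCII and Latin-1 sources both produce a single Latin-1 named text file.
    return {"text_files": ["book-lat1.txt"], "html_charset": "ISO-8859-1"}

print(narrowest_encoding("plain text"))      # ascii
print(narrowest_encoding("C\u00e6sar"))      # latin-1 (ae ligature)
print(narrowest_encoding("\u201cHello\u201d"))  # utf-8 (curly quotes)
```

Note that the outputs follow the encoding the source file was saved in, while `narrowest_encoding` only tells you what the characters in the book actually require.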
What character set should you use? My advice is always to work in UTF-8. If you set your editor to work with UTF-8 (without a byte order mark, or BOM), then you won't get in trouble. To do this, when you first start a project, download the file from DP (which will be either in ASCII or Latin-1) and immediately save it as UTF-8. You have to do this if you anticipate UTF-8 characters, like smart quotes, and it doesn't hurt to do so even if you don't.
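Most editors can do this conversion from a "Save as" dialog. For anyone who prefers to script it, here is a minimal Python sketch under two assumptions: the downloaded file is ASCII or Latin-1 (as stated above) unless it already decodes cleanly as UTF-8, and the function name `to_utf8_no_bom` is invented for this example.

```python
def to_utf8_no_bom(raw: bytes) -> bytes:
    """Re-encode ASCII, Latin-1 or UTF-8 bytes as UTF-8 without a BOM."""
    try:
        # Already UTF-8 (or plain ASCII); "utf-8-sig" also drops a leading BOM.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as Latin-1, which accepts any byte.
        text = raw.decode("latin-1")
    # Python's "utf-8" codec never writes a BOM.
    return text.encode("utf-8")

# Usage sketch (overwrites the file in place, so keep a backup):
# from pathlib import Path
# src = Path("filename-src.txt")
# src.write_bytes(to_utf8_no_bom(src.read_bytes()))
```

Trying UTF-8 first is a heuristic: Latin-1 text containing accented characters is very unlikely to be valid UTF-8 by accident, but the sketch does not prove which encoding a file was saved in.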
What character set should you send to PPV or PG if you direct upload? You have some choices.
1. If your book has UTF-8 characters other than curly quotes, such as an asterism, you have to send the UTF-8 text forward. As a courtesy, send the Latin-1 text forward as well. This keeps the PPV or WWer from having to generate it. Of course send the HTML and images as well. The HTML will be encoded in UTF-8.
2. If your book has UTF-8 characters just for curly quotes, you could send the UTF-8 text and the Latin-1 text forward, or you could send just the Latin-1. If you are trying to match the look and feel of the original printed text, the curly quotes are most important in the HTML and later derived tablet versions. It's your choice what to send forward for the text version.
3. If you have encoded your book in Latin-1, then send the Latin-1 text forward with the HTML, which will also be Latin-1 encoded. Less than 10% of the Internet is Latin-1 encoded with over 90% UTF-8 encoded. DP has a lot of momentum using Latin-1 so ppgen allows it as the source file encoding.
The short recommendation is this: encode in UTF-8 from the start. If you have characters that are only in UTF-8, like smart quotes, then send both the UTF-8 and Latin-1 text files forward. If you have characters that can be encoded using only the Latin-1 set, then send only the Latin-1 text forward and discard the UTF-8 text. It's easier on the system to do it this way. In a someday world, both DP and PG would use only UTF-8 but that's not happening anytime soon.
Should I use “curly quotes” in my source file?
There is no answer for “should I?” for curly quotes. Different PPers have different and often strong opinions. The important thing is that you can use them if you want to. I convert PG downloaded files to UTF-8 and immediately run the smart-quote conversion program on the source file. It takes about two minutes to resolve everything that the program wasn't sure about and almost always uncovers errors in the downloaded text. From that point on, the quotes match the way the book was printed. I can and I do use smart quotes especially for the HTML and tablet presentations they provide.
You can find a copy of one tool (ppsmq) that will convert your file to use curly quotes [here].
Should I use <i> or <em> markup?
DP formatters use <i> any time italics appear in the original printed text. You can leave those <i> tags alone, and your text will look like hundreds of others.
But with ppgen, you can be more selective. Italics serve many different purposes, and PPers can replace italics with tags that indicate why something was italicized:
- <em> for italics that show emphasis ("It was you, wasn't it?")
- <cite> for italics that are part of a citation or reference ("Dickens in Bleak House described...")
- <i> for italics that are purely decorative, such as in a chapter title or title page.
When italics are purely decorative, ppgen allows you to create italic markup only for formats that can display it natively, such as HTML. That means <em> tags will be represented with _underscores_ in text files, but <i> tags will be ignored; both will be set in italics in html and epub formats.
There are three ways to achieve this.
- Use the tag-i register. Each inline tag, including <i>, has a named register associated with it. If you set this register to "", ppgen will ignore <i> in preparing the plain text version, but will still mark anything marked with <em> or <cite>.
- Use the <I> tag. Each inline tag, including <i>, also has an uppercase equivalent. If you use capital <I> instead of lowercase <i>, ppgen will ignore the capitalized tags when preparing plain text versions, but still mark the lowercase tags. This is a good option if you have only one or two italics you want to be ignored (for example, on a title page).
- Use an .if statement. Of course, you can also achieve the same thing with .if t and .if h if you prefer.
Here's an example:
At the top of the file:
.nr tag-i ""
In the text:
a machine gun bullet. Since you’ve evidently forgotten your whole life between those dates, there’s no reason for treating you now as a dangerous criminal.”
// 107.png
.sp 4
.h2 id=chXV
XV||<i>RED GETS A SHOCK</i>
.sp 2
“Listen, Skipper!” pleaded Lieutenant Pennington, seizing Don Winslow’s arm. “Maybe this guy, Count Borg, isn’t nuts; but <em>I’m</em> gonna be if you keep on doin’ and sayin’ things that don’t make sense!
We don't want "RED GETS A SHOCK" to appear as "_RED GETS A SHOCK_" in text, as the italics are purely decorative, but we do want the underscores on the emphasised text ("Count Borg, isn’t nuts; but _I’m_ gonna be..."). This behavior is a user option.
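To make the effect concrete, the following Python sketch imitates the text-version behavior described in this answer. It is not ppgen's code, and the function `text_version` and its `tag_i` parameter are stand-ins invented here: `tag_i` plays the role of the tag-i register, uppercase <I> is always dropped, and <em> and <cite> are always marked with underscores.

```python
import re

def text_version(line: str, tag_i: str = "_") -> str:
    """Imitate how inline italic-style tags would appear in plain text."""
    line = re.sub(r"</?I>", "", line)             # uppercase tag: ignored in text
    line = re.sub(r"</?i>", tag_i, line)          # lowercase tag: follows tag_i
    line = re.sub(r"</?(?:em|cite)>", "_", line)  # emphasis and citations stay marked
    return line

# With tag_i set to "", decorative italics vanish but emphasis remains.
print(text_version("XV||<i>RED GETS A SHOCK</i>", tag_i=""))
print(text_version("but <em>I’m</em> gonna be", tag_i=""))
```

With `tag_i=""`, the first call prints the heading without underscores and the second prints "but _I’m_ gonna be", matching the outcome described above.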
How can I make a Transcriber's note only appear where it applies?
Ppgen allows conditional inclusion of text. A typical place this is used with TNs is to explain to the reader that the underscores used in the text version, like _this_, are used to indicate text that was in italics or emphasised in the original book. You want this message to show up only in the text versions where the underscores are used. Put this near the top of the source file:
.if t
Transcriber's Note: Underscores in the text, like _this_, are used to represent text that was emphasised in the original book.
.if-
Does ppgen support creation of .bin files, as Guiguts does? (Important for PPV)
Yes, it does, starting with version 3.42. When you remove the file separator lines, and turn them into either comments (// 001.jpg, etc.) or into .pn commands (or both), you can also insert .bn commands. E.g., .bn 001.png, ..., .bn 005.png, etc.
When it sees the .bn commands ppgen will create a .bin file for each output file it creates, and the PPVers can use those .bin files while they're examining your book, just as if you'd used Guiguts or PPQT to create the files.
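Converting the separator lines by hand is tedious in a long book, so here is a hedged Python sketch of one way to automate it. It assumes the separator lines look like "-----File: 001.png---\proofer\..." (check your own download, since that format is an assumption here), and it emits both a comment and a .bn command for each one. The function name `convert_separators` is invented for this example, and the sketch does not add .pn commands.

```python
import re

# Assumed separator shape: leading dashes, "File: ", then the page image name.
SEPARATOR = re.compile(r"^-+File: (\w+\.png)")

def convert_separators(lines: list) -> list:
    """Replace each page separator line with a comment and a .bn command."""
    out = []
    for line in lines:
        match = SEPARATOR.match(line)
        if match:
            name = match.group(1)
            out.append(f"// {name}")   # comment form, ignored in the output
            out.append(f".bn {name}")  # lets ppgen build the .bin file
        else:
            out.append(line)
    return out

# Usage sketch:
# with open("filename-src.txt", encoding="utf-8") as f:
#     converted = convert_separators(f.read().splitlines())
```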