HTML5draft

User - Post-Processing and Post-Processing Verification

Project Gutenberg now recommends HTML5 project uploads

Project Gutenberg recommends using HTML5 for new submissions. Files submitted in other HTML versions will be converted to HTML5. HTML5 benefits from better automated validation checks and converts more cleanly to the newer EPUB3 format.

HTML5 uploads must meet the following requirements:

  • Be UTF-8 encoded.
  • Pass W3C validation as HTML5.
  • Only use CSS3 that shows status=REC on this site.
  • Do not use CSS Custom Properties (also known as CSS Variables); these are not supported in some ebook readers.
  • Do not use the following new HTML5 tags:
    • <aside> - Defines some content loosely related to the page content.
    • <audio> - Embeds a sound, or an audio stream in an HTML document.
    • <bdi> - Represents text that is isolated from its surroundings for the purposes of bidirectional text formatting.
    • <canvas> - Defines a region in the document, which can be used to draw graphics on the fly via scripting (usually JavaScript).
    • <data> - Links a piece of content with a machine-readable translation.
    • <datalist> - Represents a set of pre-defined options for an <input> element.
    • <details> - Represents a widget from which the user can obtain additional information or controls on-demand.
    • <dialog> - Defines a dialog box or subwindow.
    • <embed> - Embeds an external application, typically multimedia content such as audio or video, into an HTML document.
    • <hgroup> - Defines a group of headings.
    • <keygen> - Represents a control for generating a public-private key pair.
    • <main> - Represents the main or dominant content of the document.
    • <mark> - Represents text highlighted for reference purposes.
    • <menuitem> - Defines a list (or menuitem) of commands that a user can perform.
    • <meter> - Represents a scalar measurement within a known range.
    • <nav> - Defines a section of navigation links.
    • <output> - Represents the result of a calculation.
    • <picture> - Defines a container for multiple image sources.
    • <progress> - Represents the completion progress of a task.
    • <rp> - Provides fall-back parentheses for browsers that don't support ruby annotations.
    • <rt> - Defines the pronunciation of characters presented in a ruby annotation.
    • <ruby> - Represents a ruby annotation.
    • <source> - Defines alternative media resources for the media elements like <audio> or <video>.
    • <summary> - Defines a summary for the <details> element.
    • <svg> - Embeds SVG (Scalable Vector Graphics) content in an HTML document.
    • <template> - Defines the fragments of HTML that should be hidden when the page is loaded, but can be cloned and inserted in the document by JavaScript.
    • <time> - Represents a time and/or date.
    • <track> - Defines text tracks for the media elements like <audio> or <video>.
    • <video> - Embeds video content in an HTML document.
    • <wbr> - Represents a Word Break Opportunity, where it would be ok to add a line-break. May be supported soon.
  • Do not use the following HTML4/5 tags:
    • <button> - Creates a clickable button.
    • <fieldset> - Specifies a set of related form fields.
    • <form> - Defines an HTML form for user input.
    • <iframe> - Displays a URL in an inline frame.
    • <input> - Defines an input control.
    • <legend> - Defines a caption for a <fieldset> element.
    • <map> - Defines a client-side image-map.
    • <menu> - Represents a list of commands.
    • <noscript> - Defines alternative content to display when the browser doesn't support scripting.
    • <object> - Defines an embedded object.
    • <script> - Places script in the document for client-side processing.
  • However, note that the following new HTML5 tags are supported by ebookmaker and can be used:
    • <article> - Defines an article.
    • <figcaption> - Defines a caption or legend for a figure.
    • <figure> - Represents a figure illustrated as part of the document.
    • <footer> - Represents the footer of a document or a section.
    • <header> - Represents the header of a document or a section.
    • <section> - Defines a section of a document, such as header, footer, etc.
    • HTML5 structured table elements (with <tfoot> placed after the <tr> elements, rather than before them as HTML4 required)
    • Property attribute of <meta> elements
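
The two "Do not use" lists above can be checked mechanically before upload. The following is a minimal sketch, not part of any official PG or DP tooling: it uses Python's standard html.parser module, and the names DISALLOWED_TAGS, DisallowedTagFinder and find_disallowed_tags are hypothetical names chosen for this illustration. It reports only the tags named in the lists above and is no substitute for W3C validation.

```python
from html.parser import HTMLParser

# Tag names copied from the two "Do not use" lists above.
DISALLOWED_TAGS = {
    # New HTML5 tags
    "aside", "audio", "bdi", "canvas", "data", "datalist", "details",
    "dialog", "embed", "hgroup", "keygen", "main", "mark", "menuitem",
    "meter", "nav", "output", "picture", "progress", "rp", "rt", "ruby",
    "source", "summary", "svg", "template", "time", "track", "video", "wbr",
    # HTML4/5 tags
    "button", "fieldset", "form", "iframe", "input", "legend", "map",
    "menu", "noscript", "object", "script",
}


class DisallowedTagFinder(HTMLParser):
    """Collect (line, tag) pairs for every disallowed start tag."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this method.
        if tag in DISALLOWED_TAGS:
            line, _column = self.getpos()
            self.found.append((line, tag))


def find_disallowed_tags(html_text):
    """Return a list of (line number, tag name) for disallowed tags."""
    finder = DisallowedTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found
```

To use it, read the HTML file as UTF-8 text, pass the text to find_disallowed_tags, and print each reported line number and tag name. An empty list means none of the listed tags were found.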

Using Preview in the PG upload form will help you to check some of the above requirements before you submit.

Can I still upload XHTML 1.0/1.1 (i.e. HTML4) as I used to before the move to HTML5?

Although Project Gutenberg recommends HTML5 uploads, you can still upload XHTML 1.0 Strict or XHTML 1.1. These uploads will be accepted and EPUB2 files (the older EPUB format) will be generated, as always. It is best (and relatively simple), however, to submit your project to Project Gutenberg in HTML5. Project Gutenberg's automatically generated HTML5 is not a good direct substitute for your HTML4 file: it contains anomalies that would need to be fixed by hand.

If you are not submitting an HTML5 file, note that the online ebookmaker runs the Nu validator on the HTML5 file that it generates from your submitted file. Please check for any errors here and adjust your submitted file accordingly.

Why HTML5?

Choosing HTML5 allows the post-processor to take immediate advantage of the Nu HTML Checker, which is significantly better than the validator that was available for HTML4. To the PPer, it is the same W3C HTML checker URL; when HTML5 is submitted, the validator automatically switches to the improved "Nu" checker. HTML5 also translates readily into EPUB3 (the newer EPUB format).

Longer term, moving to HTML5 will allow the Post-Processor to use HTML5 features that enhance the final product posted for download at PG. One such feature is potential support for assistive technology using HTML5 with ARIA.

What is XML serialization, and why are we no longer using it?

For a short period, the recommendation was that submitted files should be HTML5 with XML serialization, meaning that all elements, including void elements, needed to be closed. However, W3C no longer recommends or maintains the specification for this type of file. In addition, the HTML validator will now give an info message for each closed void element when files of this type are checked.

Elements that were commonly closed, but should no longer be closed

Void elements are elements that cannot have content. They were previously closed with a slash to satisfy XML requirements, but in HTML5 submissions they should no longer contain the slash. The validator will issue an info message for each slash that is included. Commonly used examples are:

<hr/> -- use <hr>
<br/> -- use <br>
<img ... list of attributes ... /> -- use <img ... list of attributes ...>
<meta ... list of attributes ... /> -- use <meta ... list of attributes ...>
<link ... list of attributes ... /> -- use <link ... list of attributes ...>
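
The replacements listed above can be applied in bulk. The following is a rough sketch, assuming a regular-expression pass is acceptable for your file; strip_void_slashes and VOID_ELEMENTS are hypothetical names chosen for this illustration, and the element list is the set of void elements defined by the HTML standard. A regular expression is not a full HTML parser: a tag whose attribute values contain ">" or "/>" would not be handled correctly, so review the result and re-validate afterwards.

```python
import re

# Void elements defined by the HTML standard (no content, no end tag).
VOID_ELEMENTS = (
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
)

# Matches e.g. <br/>, <hr />, <img src="x.jpg" alt="" /> and captures the
# tag name and its attributes so that only the trailing slash is dropped.
_SELF_CLOSED_VOID = re.compile(
    r"<(" + "|".join(VOID_ELEMENTS) + r")\b([^<>]*?)\s*/>",
    re.IGNORECASE,
)


def strip_void_slashes(html_text):
    """Return html_text with the closing slash removed from void elements."""
    return _SELF_CLOSED_VOID.sub(r"<\1\2>", html_text)
```

The function leaves everything else untouched, so it can be run over a whole file read as UTF-8 text and the result written back out for validation.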

Preparing HTML5 for Project Gutenberg

Overview

Submitting HTML5 brings improved long-term support and access to HTML5 features and tags. In addition, your file will not need to be converted to HTML5 by ebookmaker, and an EPUB3 will be posted to PG.

For the purposes of this document, these names will be used:

  • HTML5 is often called just "HTML"; however, here we will use "HTML5" to differentiate it from other variations.
  • HTML4 describes the traditional DP markup, which is often XHTML 1.0 Strict or 1.1, though it could be some other pre-HTML5 variation.

Whichever variant you are submitting, your HTML should be well-formed, i.e. all elements with content, such as <p>...</p>, should be closed. Although closing some of these elements is technically optional in HTML5, doing so is recommended good practice and makes the file much easier to maintain.

The Process

Creating your HTML5 Version manually or in Guiguts

Post-Processors already skilled in HTML5 are, of course, welcome to produce their HTML5 directly, according to the standards below.

It is now also possible to create an HTML5 file directly from Guiguts. Guiguts 1.5.0 will generate HTML5 without XML serialization, and includes the necessary updated HTML and CSS checkers. For information about downloads, please visit this forum page. Please also check that your output follows the standards below.

Starting with HTML4

It is possible to start with a file prepared using traditional DP processes and tools and convert it to HTML5. The original file will typically be an XHTML 1.0/1.1 file, which is HTML4. If that file is correct, it will have XML serialization, which is not wanted in the HTML5 version. Quite a lot of edits may therefore be needed to remove closing slashes from elements such as those described in the "Elements that were commonly closed, but should no longer be closed" section.

As we progress with HTML5, we plan to update our ppgen tool so it can generate HTML5 output.

Header for HTML5

Here is a skeleton of the HTML that should be in the HTML5 header:

1: <!DOCTYPE html>
2: <html lang="en">
3: <head>
4:    <meta charset="UTF-8">
5:    <title>Loco or Love, by W. C. Tuttle—A Project Gutenberg eBook</title>
6:    <link rel="icon" href="images/cover.jpg" type="image/x-cover">
7:    <style>
8:    ... CSS ...
9:    </style>
10: </head>
11: <body>

Comments, by line number, follow.

(1) This is the standard doctype declaration, where "html" is understood to mean HTML5.
(2) This line specifies the language code: please use only ISO 639-1 language codes. Choose one from the list here.
(4) This line specifies that the file is encoded in UTF-8. EPUB3 for PG requires UTF-8.
(6) The name of the cover image goes in the href. Please use this format.
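
The header items commented on above can be spot-checked with a short script. This is a minimal sketch using Python's standard html.parser module; HeaderChecker and check_header are hypothetical names chosen for this illustration, and the checks cover only the doctype, the lang attribute, the charset declaration and the cover link from the skeleton. It does not verify that the language code is a valid ISO 639-1 code or that the cover image exists.

```python
from html.parser import HTMLParser


class HeaderChecker(HTMLParser):
    """Record the header items shown in the skeleton above."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.lang = None
        self.charset = None
        self.cover_href = None

    def handle_decl(self, decl):
        # For <!DOCTYPE html> the parser passes the text "DOCTYPE html".
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.lang = attributes.get("lang")
        elif tag == "meta" and "charset" in attributes:
            self.charset = attributes["charset"]
        elif tag == "link" and attributes.get("rel") == "icon":
            self.cover_href = attributes.get("href")


def check_header(html_text):
    """Return a list of problems found; an empty list means none."""
    checker = HeaderChecker()
    checker.feed(html_text)
    checker.close()
    problems = []
    if (checker.doctype or "").strip().lower() != "doctype html":
        problems.append("missing <!DOCTYPE html>")
    if not checker.lang:
        problems.append("missing lang attribute on <html>")
    if (checker.charset or "").lower() != "utf-8":
        problems.append('missing <meta charset="UTF-8">')
    if not checker.cover_href:
        problems.append('missing <link rel="icon"> for the cover image')
    return problems
```

Calling check_header on the text of a file that follows the skeleton returns an empty list; each missing item adds one message to the list.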

Non-Breaking Spaces

Use &nbsp; or &#160; for non-breaking spaces rather than the literal non-breaking space character (U+00A0). The literal character looks identical to a regular space, and invisible characters can be difficult to troubleshoot.
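
Literal non-breaking space characters can be located and converted with a few lines of code. The sketch below is an illustration only; NBSP, lines_with_literal_nbsp and replace_literal_nbsp are hypothetical names, and the script assumes the file has already been read as UTF-8 text.

```python
# The literal non-breaking space character, U+00A0.
NBSP = "\u00a0"


def lines_with_literal_nbsp(html_text):
    """Return the 1-based line numbers that contain a literal U+00A0."""
    return [
        number
        for number, line in enumerate(html_text.split("\n"), start=1)
        if NBSP in line
    ]


def replace_literal_nbsp(html_text, entity="&nbsp;"):
    """Replace every literal U+00A0 with the given entity."""
    return html_text.replace(NBSP, entity)
```

Pass entity="&#160;" to replace_literal_nbsp if the numeric form is preferred.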

Other HTML5 changes from HTML4

Most of your HTML4 code will work as HTML5, with a few common exceptions. For example, HTML4 allows a summary attribute on a table, while HTML5 treats it as obsolete and the validator reports it as an error. When you validate your file, the error messages or warnings you receive are usually sufficient to show what needs to change.
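
As one example of finding such an exception before validation, the sketch below lists the lines on which a table still carries a summary attribute. It uses Python's standard html.parser module; TableSummaryFinder and find_table_summaries are hypothetical names chosen for this illustration, and the script only reports locations, leaving the edit itself to the post-processor.

```python
from html.parser import HTMLParser


class TableSummaryFinder(HTMLParser):
    """Record the line number of each <table> with a summary attribute."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Attribute names are lower-cased by HTMLParser.
        if tag == "table" and any(name == "summary" for name, _value in attrs):
            self.lines.append(self.getpos()[0])


def find_table_summaries(html_text):
    """Return 1-based line numbers of tables that have a summary attribute."""
    finder = TableSummaryFinder()
    finder.feed(html_text)
    finder.close()
    return finder.lines
```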

ppgen and HTML5

Currently, our ppgen tool does not produce HTML5. We hope to change that sometime this year. In the meantime, several search-and-replace operations can be used to convert ppgen-generated HTML to HTML5. For more information, please read the "Converting ppgen files to HTML5 with XML Serialization" section of the ppgen manual, which will be updated in due course to remove the XML serialization aspects.

To comment or request edits to this page, please contact jjz or windymilla.
