Flavours of (X)HTML

From DPWiki
DP Official Documentation - Post-Processing and Post-Processing Verification

Prefatory notes: Many flavours of (X)HTML

Web pages come in many different flavours. Like any language, HTML has evolved over time, resulting in a number of dialects. These dialects are related, but are all slightly different. And each one has its own standard for how to code the web pages that are displayed in our browsers.

For our purposes, the most common of these standards are: HTML 4.01 Strict, HTML 4.01 Transitional, XHTML 1.0 Strict, XHTML 1.0 Transitional and XHTML 1.1. This document explains the differences between these versions—and why XHTML is a better choice than HTML for most books.

Why conform to a standard?

Even if a page doesn’t conform to one of these standards, browsers usually do their best to display it anyway. The result, however, is not always what the author intended.

Coding to a standard makes it more likely that all browsers will interpret and display your web page in a predictable fashion. It also makes it easier to anticipate how the page will convert into “mobile” formats with ebookmaker, the conversion tool used by Project Gutenberg (PG).

PG accepts more than one flavour of (X)HTML for e-books, but only accepts code that validates to a standard.

To find out whether your (X)HTML file validates, upload it to the W3C Markup Validation Service. For a file to validate, it must be coded according to the rules set out in the standard you have chosen. If you have broken a rule or two along the way, the validator will provide a list of errors. Sometimes fixing the first error or two clears most of the rest, because a single early mistake (such as an unclosed element) can trigger many follow-on errors. At other times the same error occurs in many places, and you will need to fix every instance and re-upload your file to the validator.

What are the differences?

The document type declaration (DOCTYPE), which has to be the first thing in your document, tells the browser which flavour of (X)HTML you are using, and tells the validator which standard to check the file against. Aside from this, the differences among the standards fall into three categories:

  • XHTML vs. HTML
  • Transitional vs. Strict
  • XHTML 1.0 vs. XHTML 1.1

XHTML vs. HTML

HTML is an application of SGML, while XHTML is an application of XML (eXtensible Markup Language), which is itself a simplified subset of SGML. XHTML 1.0 was designed as a reformulation of HTML 4.01, so the two are very similar: XHTML 1.0 contains all the same elements and attributes, and comes in the same flavours. The two have a few important differences, however:

  • Element and attribute names must all be in lowercase in XHTML. In HTML, uppercase is allowed. So, in XHTML, you have to use <p>, while in HTML <P> would validate.
  • All elements must be properly closed in XHTML:
    • Every element that can hold displayable content (non-empty elements) must have an opening and a closing tag in XHTML. In XHTML, you have to use <p></p> for a paragraph element; but in HTML, <p>, alone, is acceptable.
    • Empty elements must also be closed in XHTML: for example, <br/> for a line break. This is also known as a self-closing tag. In HTML, a line break is written as <br>.
  • Attributes must have explicitly assigned, quoted values in XHTML; in HTML, quotes are not mandatory for all attribute values, and some attributes may be written without a value. In HTML, rowspan=3 or a bare checked may be used, but in XHTML, rowspan="3" (or rowspan='3') and checked="checked" are required. Quotes are required in both flavours for attribute values containing spaces, like alt="this is alt text".
  • Elements must be properly nested in XHTML. If the opening tag for an element is inside another element, then the closing tag must be inside that same element (e.g., <b><i></i></b> is properly nested; <b><i></b></i> is not).
  • If you want to declare a language for a particular section of text, or for the whole document, note that:
    • In HTML, the lang attribute is used;
    • In XHTML 1.0, you must use xml:lang in addition to lang, e.g., <i xml:lang="de" lang="de">Guten Tag!</i>;
    • In XHTML 1.1, only xml:lang is used.
  • In XHTML 1.0, the name attribute of the <a> element is deprecated, i.e., due to be discontinued; in XHTML 1.1 it has been removed. Use the id attribute instead. In HTML, both may be used. See the section “XHTML 1.0 vs. XHTML 1.1” for more information.
  • The <html> opening tag must contain xmlns="http://www.w3.org/1999/xhtml" in all versions of XHTML.
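Several of the rules above (closing every element, quoting attribute values, nesting properly) are what XML calls well-formedness, so any XML parser will reject markup that breaks them. The following is a minimal Python sketch, not part of the DP or PG tooling, that uses the standard-library xml.etree.ElementTree parser; the helper name is_well_formed is made up for illustration. It checks well-formedness only, not validity against a standard: <P>text</P>, for example, is well-formed XML and would pass here, yet would still fail XHTML validation because of the uppercase element name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Fragments based on the rules above, paired with the rule they illustrate.
samples = [
    ('<p>one<br/>two</p>', "empty element closed"),
    ('<p>one<br>two</p>', "empty element left open (HTML style)"),
    ('<td rowspan="3">x</td>', "attribute value quoted"),
    ('<td rowspan=3>x</td>', "attribute value unquoted (HTML style)"),
    ('<p><b><i>x</i></b></p>', "properly nested"),
    ('<p><b><i>x</b></i></p>', "improperly nested"),
    ('<i xml:lang="de" lang="de">Guten Tag!</i>', "language declared"),
]

for fragment, rule in samples:
    verdict = "well-formed" if is_well_formed(fragment) else "NOT well-formed"
    print(f"{verdict:15}  {rule}: {fragment}")
```

A check like this is only a quick local test; the W3C validator remains the authority on whether a file conforms to its declared standard.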

Transitional vs. Strict

For our purposes, both HTML 4.01 and XHTML 1.0 come in two flavours, Transitional and Strict. Transitional allows use of some elements and attributes that are deprecated, which Strict does not allow. Every element and attribute that is contained in Strict is also contained in Transitional. Everything that is missing from Strict is better handled by CSS. You can check which elements are contained in Strict and which only in Transitional in the list of elements at W3Schools.
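As a rough illustration of what “only in Transitional” means in practice, the Python sketch below scans a well-formed fragment for the elements that HTML 4.01 marks as deprecated. It is an assumption-laden example, not a PG or DP tool: the function name find_deprecated_elements is hypothetical, the input must already be well-formed XML, and deprecated attributes (such as bgcolor on <body>) are not checked at all.

```python
import xml.etree.ElementTree as ET

# Elements marked as deprecated in HTML 4.01; they are allowed in
# Transitional but not in Strict.
DEPRECATED_ELEMENTS = {
    "applet", "basefont", "center", "dir", "font",
    "isindex", "menu", "s", "strike", "u",
}


def find_deprecated_elements(fragment: str) -> list:
    """List deprecated element names in document order, without repeats."""
    root = ET.fromstring(fragment)
    found = []
    for element in root.iter():
        # Drop a namespace prefix such as {http://www.w3.org/1999/xhtml}.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in DEPRECATED_ELEMENTS and local_name not in found:
            found.append(local_name)
    return found


sample = (
    '<div><center><font size="4">Title</font></center>'
    '<p>Text with <u>underline</u>.</p></div>'
)
print(find_deprecated_elements(sample))
```

Each element reported this way has a CSS replacement, for example text-align: center in place of <center>.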

XHTML 1.0 vs. XHTML 1.1

XHTML 1.1 differs from version 1.0 mainly in that it does not contain the elements and attributes that are considered deprecated in XHTML 1.0 and HTML 4.01. Since XHTML 1.0 Strict already excludes almost all of these, XHTML 1.1 hardly differs from XHTML 1.0 Strict. For our purposes, there are only two relevant differences between them:

  • The lang attribute was removed in favour of xml:lang. So, instead of, e.g., lang="en", you have to use xml:lang="en".
  • The name attribute for anchors was removed; instead, the id attribute has to be used. This makes creating links more flexible, since you are not restricted to <a> elements any more, but can add an id to any element.
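The two changes above are mechanical enough to sketch in code. The following Python example is an illustration only, under the assumption that the input is already well-formed; the function name to_xhtml11_attributes is hypothetical and the sketch is not how ebookmaker or any DP tool performs the conversion. It moves lang to xml:lang on every element and replaces name with id on anchors.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_xhtml11_attributes(root: ET.Element) -> ET.Element:
    """Rewrite lang and anchor name attributes in place (hypothetical helper)."""
    for element in root.iter():
        # lang is removed; its value is kept as xml:lang if none is set yet.
        lang = element.attrib.pop("lang", None)
        if lang is not None and XML_LANG not in element.attrib:
            element.set(XML_LANG, lang)

        # Only anchors lose name; an existing id takes precedence.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "a" and "name" in element.attrib:
            name = element.attrib.pop("name")
            element.attrib.setdefault("id", name)
    return root


fragment = (
    '<p><a name="chapter1"></a>'
    '<i xml:lang="de" lang="de">Guten Tag!</i></p>'
)
root = to_xhtml11_attributes(ET.fromstring(fragment))
print(ET.tostring(root, encoding="unicode"))
```

Because an id may be placed on any element, the empty anchor in the sample could also be dropped and its id moved to the heading or paragraph it marks.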

What is the advantage of XHTML?

If the e-book as viewed in a browser were the only concern, all standards would be about equal, although XHTML, being a bit more rigid about document structure, leaves less room for ambiguous behaviour in different browsers.

Regardless of the input file’s document type, PG’s ebookmaker uses a subset of XHTML 1.1 to generate the files for “mobile” devices—which means that it must convert anything that is not already XHTML 1.1. The closer the e-book is to XHTML 1.1, the less will be changed when the “mobile” formats are generated. For example, everything that is only allowed in (X)HTML Transitional must be changed in some way. These things are changed by ebookmaker, according to its own set of rules. The more that has to be changed, the higher the risk of breaking something. Because HTML has to be converted to XHTML and Transitional has to be converted to Strict, the best choice is either XHTML 1.0 Strict or XHTML 1.1.

Appendix: Document Type Declarations

The document type declaration has to be the very first thing in the file. The available document type declarations (also listed at W3Schools) are:

XHTML 1.1:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

XHTML 1.0 Strict:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

XHTML 1.0 Transitional:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

HTML 4.01 Strict:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

HTML 4.01 Transitional:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

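To see which of these declarations a file carries, the public identifier in the DOCTYPE can be looked up in a small table. The Python sketch below is a hedged example rather than an official tool: the names FLAVOURS and identify_flavour are made up here, only the five declarations listed above are recognised, and the sketch assumes the DOCTYPE really is the first thing in the file.

```python
import re
from typing import Optional

# Public identifiers copied from the declarations listed above.
FLAVOURS = {
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
}

# Matches the start of a DOCTYPE and captures the public identifier.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def identify_flavour(document: str) -> Optional[str]:
    """Return the flavour named by the DOCTYPE, or None if unrecognised."""
    match = DOCTYPE_RE.match(document.lstrip())
    if match is None:
        return None
    return FLAVOURS.get(match.group(1))


example = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
)
print(identify_flavour(example))
```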

To comment or request edits to this page, please contact lhamilton or hdmtrad.
