HTML and CSS for PPers

From DPWiki
DP Official Documentation - Post-Processing and Post-Processing Verification


Prefatory notes

If you’ve never looked at raw HTML code before, it can be overwhelming. DOCTYPE? CSS? Elements and entities?… While there are a lot of resources on the web to help you untangle it all, knowing where to start can be difficult if you don’t really know what you are looking at, or what you should be looking for.

This document introduces you to the basic structure and components of a web page, and familiarises you with some of the code that you will encounter in our e-books. However, it is not meant to be a tutorial on how to actually code a document, nor is it meant to be a guide to Post-Processing.

So, let’s take a peek at a short example e-book, which we will be referring to throughout this introduction.

The document type declaration

The first thing you should see at the top of one of our e-books is the document type declaration, or DOCTYPE:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

This declaration may look complex and confusing, but it is just an instruction that tells the web browser what flavour of (X)HTML the page is written in. The example here is XHTML 1.0 Strict. The declaration will differ slightly depending on which flavour you use.

The document type declaration is the only part of your e-book that does not fall within the <html> element. Elements are the basic components of web pages, acting as boxes that hold the content of a page. Like boxes, they come in different shapes and sizes, and can often be nested inside each other in various ways.

You will probably recognise some of the more common elements, like <i></i> and <b></b>, from formatting at DP. Most elements are made up of these paired opening and closing tags, although there are a few that are self-closing. The latter are single tags and can be identified by the / just before the >, e.g., <br/> and <img/>. Unlike elements with paired tags, self-closing elements cannot contain text—they are empty boxes.

N.B.: You may see the self-closing tags with or without a space preceding the closing />, e.g., <br /> or <br/>. The space is not needed, but some very old browsers fail to implement the code properly if the space is not included.
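
As a small illustration of the two kinds of element, the fragment below shows a paired element holding text and two self-closing elements. It is made up for this page rather than taken from the sample e-book, and the file name and alt text are placeholders.

```html
<!-- Paired tags: the text sits between the opening and closing tag. -->
<p>This sentence has one <i>italic</i> word.</p>

<!-- Self-closing tag: it marks a line break and holds no text. -->
<p>First line of an address<br/>
Second line of an address</p>

<!-- images/example.jpg is a hypothetical file name. -->
<div><img src="images/example.jpg" alt="Description of the picture"/></div>
```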

The <html> element and language attributes

In our sample e-book, immediately following the document type declaration is the opening tag of the <html> element. This element is the “biggest” box for every web page: everything except for the document type declaration goes between the opening <html> and the closing </html> at the end of the file.

The opening <html> tag will change slightly, depending on your chosen flavour of (X)HTML; for XHTML 1.0, it should look something like this:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

In addition to the element name, the opening tag here contains three attributes. Attributes provide additional information about HTML elements. They are always specified in the opening tag, and come in name/value pairs: e.g., lang="en".

The first attribute, xmlns="http://www.w3.org/1999/xhtml", is required for all XHTML documents. The last two are both language-related attributes. If you want to declare a default language for your document, you need both “xml:lang” and “lang” for XHTML 1.0. Other flavours of (X)HTML require only one or the other.

Although not required, it is good practice to declare a default language for your file, as it helps browsers, search engines and screen-reader software. The example above is English; if your book is in a different language, you need to change the language code. For example, if your book is written in French, you need to change both instances of “en” to “fr”:

xml:lang="fr" lang="fr"

N.B.: If your book has two or more major languages, only one can be declared as default; you can specify the other(s) as needed, either for individual words or for blocks of text, using the “xml:lang” and “lang” attributes on any HTML element.
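
A minimal sketch of marking up a second language, assuming an English default set on the <html> element. The sentences are invented for illustration and do not come from the sample e-book.

```html
<!-- A single foreign word inside an English paragraph. -->
<p>She greeted him with a cheerful
<span xml:lang="fr" lang="fr">bonjour</span> and went on her way.</p>

<!-- A whole block of text in another language. -->
<div xml:lang="de" lang="de">
<p>Guten Morgen!</p>
</div>
```

Text outside the marked elements keeps the default language declared on the <html> element.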

The <head> element, character encoding and entities

The <head> element is the box holding the elements related to style sheets, metadata, the title of the document, and more. The content of the <title> element is shown only in the title bar of a browser window or tab; none of the other content inside the <head> element is displayed.

In our example, the next line after the opening <head> tag is one of the pieces of meta information mentioned above:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>

The first attribute here tells the browser what kind of information the second one provides. The second attribute specifies the content type—“text/html”—and the character set or encoding—“utf-8”. The other commonly used character set value is “iso-8859-1”, more familiarly known as Latin-1, which you may recognise from proofreading at DP.

With the encoding set to “utf-8”, many non-Latin-1 characters can be entered directly—e.g. the “curly quotes” used in our example, the oe ligature (œ), and Greek letters—making the source code easier to read. If you set the encoding to “iso-8859-1”, such characters must be entered as entities, e.g., &oelig; for œ.
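
To make the difference concrete, here is the same phrase written both ways. The phrase is an invented example.

```html
<!-- With charset=utf-8 the ligature can be typed directly. -->
<p>a cunning manœuvre</p>

<!-- With charset=iso-8859-1 the same word needs an entity. -->
<p>a cunning man&oelig;uvre</p>
```

Both lines should display identically in a browser; only the source code differs. The numeric form &#339; refers to the same character.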

N.B.: No matter which encoding you specify, there are a few reserved characters that must be entered as entities, or you will find that your HTML file does not validate. If you want an ampersand (&) character to display in your e-book, you must use &amp; in your code; otherwise, the browser will attempt to interpret what follows the & as a character entity. Similarly, use &lt; for the “less than” sign (<), as it may be misinterpreted as the beginning of an HTML tag, and &gt; for the “greater than” sign (>).
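
The reserved characters might appear in your code like this (the sentences are invented for illustration):

```html
<!-- Displays as: Messrs. Smith & Jones -->
<p>Messrs. Smith &amp; Jones</p>

<!-- Displays as: x < y and y > z -->
<p>x &lt; y and y &gt; z</p>
```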

The <title> element

As its name suggests, the <title> element contains the text that serves as the title of your document and will appear in the title bar or tab of your browser window. Make sure that the book title and author are spelt and capitalised correctly:

<title>The Project Gutenberg eBook of Alice's Adventures in
  Wonderland, by Lewis Carroll</title>

You can also reverse the order of the information:

<title>Alice's Adventures in Wonderland,
  by Lewis Carroll&mdash;A Project Gutenberg eBook</title>

or

<title>Alice's Adventures in Wonderland,
  by Lewis Carroll--A Project Gutenberg eBook</title>

Some PPers prefer the reversed order, as it means that the title shows up first in your browser’s title bar or tab. If you have several books open in your browser, the titles will then be easily distinguishable, rather than all beginning with “The Project Gutenberg eBook …”.

N.B.: You are responsible for the <title> element, but should not add the PG boilerplate or the “produced by” information to your file. That will be done by PG when your book is uploaded.

Book covers which only display in e-reader versions

Because many e-readers use covers to allow users to distinguish among e-books in their library, it is important to always include a cover image for your e-book. Sometimes, however, you might not want to include the cover in the HTML version, e.g., in the case of a custom cover.

The next line in our example enables you to include a cover image that will not display in the HTML version of your e-book, but will provide a cover for the epub and mobi (Kindle) versions:

<link rel="coverpage" href="images/cover.jpg"/>

See the example in the Case Study on Images.

Cascading Style Sheets (CSS)

The next piece of our e-book, between the <style type="text/css"> and </style> tags, is the style sheet. This block contains a list of rules that define how the browser will display the document, such as the spacing between paragraphs and whether your text is justified or left-aligned. Default styles for elements are browser-dependent; to make sure that an element is styled predictably in your e-book, you need to specify the style in the style sheet.
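
Putting together the pieces described so far, the <head> of the sample e-book would look roughly like the sketch below. It is assembled from the fragments quoted above, and the CSS comment stands in for the actual rules.

```html
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Alice's Adventures in Wonderland,
  by Lewis Carroll--A Project Gutenberg eBook</title>
<link rel="coverpage" href="images/cover.jpg"/>
<style type="text/css">
/* The CSS rules for the whole book go here. */
</style>
</head>
```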

Each CSS rule is made up of a selector and one or more declarations:

p
{
  margin-top: 0.75em;
  text-align: justify;
  text-indent: 1.5em;
}

For this rule, “p” is the selector, and the three lines between the curly braces are declarations. Each declaration is made up of a property/value pair, and must end with a semicolon. For the example above, “margin-top”, “text-align”, and “text-indent” are the properties, and “0.75em”, “justify”, and “1.5em” are the values. These three declarations will be applied to all <p> elements in your e-book.

If you later decide that you prefer more vertical spacing between paragraphs, you can change the value of “margin-top” from “0.75em” to “1.25em”, and that will increase the spacing between paragraphs throughout your book.

Classes

So, what happens if you have a paragraph that you don’t want styled that way? For example, most of the paragraphs in your book are indented, so you assign the <p> element a “text-indent” value of “1.5em”. However, what if the first paragraph of each chapter is not indented? In that situation, you can create a class in your CSS:

.no-indent
{
  text-indent: 0;
}

Note the period in front of the selector. Classes are user-defined and allow you to apply styles to selected elements, and often to different kinds of elements, where you do not wish that style to be applied to all elements of a given type. Thus, you can apply the “no-indent” class to specific paragraphs in your book, leaving the other <p> elements unaffected.
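
The sample e-book also uses classes named “center” and “small-caps”, whose rules are not quoted on this page. The definitions below are assumptions about what such classes might contain, shown only to illustrate that one class can be applied to different kinds of elements.

```html
<style type="text/css">
/* Assumed definitions; the real sample file may differ. */
.center
{
  text-align: center;
  text-indent: 0;
}

.small-caps
{
  font-variant: small-caps;
}
</style>

<!-- In the body, the same class can be used on a heading and on a paragraph. -->
<h2 class="center">CHAPTER II.</h2>
<p class="center">THE END.</p>
```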

Ids

The third selector type, after element selectors and classes, is the id, which is also user-defined. Unlike a class, an id can be assigned to only one element in an HTML document, i.e., something that appears only once, like a Transcriber’s Note at the start of your book:

#tnote
{
  border: 1px dotted black;
  max-width: 32em;
  margin: auto;
}

In the style sheet, an id selector is preceded by a hash mark (#).

N.B.: For a synopsis of the differences between classes and ids, see W3Schools.

Nested CSS

Using CSS, you can also define styles for elements, classes or ids that are nested inside others by writing the selectors one after the other, separated by a space:

#tnote p
{
  margin: 1em;
}

Note the “p” after the “#tnote”. This rule will affect only those <p> elements that are nested inside the element assigned the “tnote” id. See how the CSS is applied in the example below in the section on “class” and “id” attributes.

The <body> element

The next section of our e-book, following the <head>, is the <body>. Everything that you want to display—in this case the entire content of your book—goes between the <body> and </body> tags:

<body>

<div id="tnote">
<p class="center">Transcriber's Note:</p>

<p>This is a sample file and bears only superficial resemblance to the
original it was taken from.</p>
</div>

<h2>CHAPTER I.<br/>
Down the Rabbit-Hole</h2>

<p class="no-indent"><span class="small-caps">Alice</span> was beginning to
get very tired of sitting by her sister on the bank, and of having nothing
to do: once or twice she had peeped into the book her sister was reading, 
but it had no pictures or conversations in it, “and what is the use of a 
book,” thought Alice, “without pictures or conversations?”</p>

<p>... Besides, <em>she's</em> she, and <em>I'm</em> I, and ...</p>

<p>... Lastly, she pictured to herself how this same little sister of hers
would, in the after-time, be herself a grown woman; and how she would keep,
through all her riper years, the simple and loving heart of her childhood:
and how she would gather about her other little children, and make
<em>their</em> eyes bright and eager with many a strange tale, perhaps
even with the dream of Wonderland of long-ago: and how she would feel with
all their simple sorrows, and find a pleasure in all their simple joys,
remembering her own child-life, and the happy summer days.</p>

<p class="center" style="margin-top: 1.5em;">THE END.</p>

</body>

Elements and nesting

You’ll notice that, inside the <body> element, every bit of text is contained within at least one block-level element: <div>, <h2>, or <p> in our example. All text and inline elements must be contained within a block-level element or your code will not validate. For instance, if you have a paragraph with an emphasised word or phrase in it, you must nest the <em> element within the <p> element:

<p>... Besides, <em>she's</em> she, and <em>I'm</em> I, and ...</p>

You can also nest self-closing tags within paired tags:

<h2>CHAPTER I.<br/>
Down the Rabbit-Hole</h2>

Make sure that nested tags are properly nested, though, or your file will not validate and may display incorrectly. If you have a bold italic word, for example, you can put the opening <i> and <b> tags in any order, but whichever element you open first, you must close last.
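
The nesting rule can be illustrated with an invented phrase:

```html
<!-- Correct: <b> was opened last, so it is closed first. -->
<p>a <i><b>bold italic</b></i> word</p>

<!-- Also correct: the order is reversed, but still properly nested. -->
<p>a <b><i>bold italic</i></b> word</p>

<!-- Incorrect: the tags overlap, so this will not validate. -->
<p>a <i><b>bold italic</i></b> word</p>
```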

The “class” and “id” attributes

To use the classes or ids that you defined in the CSS above, you need to specify where to apply those styles:

<p class="no-indent"><span class="small-caps">Alice</span> was beginning to
get very tired of sitting by her sister on the bank, and of having nothing
to do: once or twice she had peeped into the book her sister was reading, 
but it had no pictures or conversations in it, “and what is the use of a 
book,” thought Alice, “without pictures or conversations?”</p>

The two classes here, “no-indent” and “small-caps”, are applied using the “class” attribute, which is written in the format class="classname".

The way of specifying a style using an id is the same, except that you use the “id” attribute—id="idname":

<div id="tnote">
<p class="center">Transcriber's Note:</p>

<p>This is a sample file and bears only superficial resemblance to the
original it was taken from.</p>
</div>

The <div> element here is assigned the id “tnote”, which is styled in the CSS with three properties—“border”, “max-width”, and “margin”. The Transcriber’s Note is stored within a <div> element because it consists of more than one paragraph, and the paragraphs should be grouped together. The styling defined for “tnote” applies only to the container <div>, not the nested <p> elements.

These nested paragraphs are styled by the nested CSS, which sets a margin on all sides of each paragraph nested within the “tnote” <div>. This extra spacing is needed to separate the paragraphs from the border surrounding the Transcriber’s Note, because the CSS for generic <p> elements only has a top margin.

The “style” attribute

In addition to styling specific elements by applying classes and ids, you can also specify an inline style for an element, directly, by using the “style” attribute:

<p class="center" style="margin-top: 1.5em;">THE END.</p>

This “style” attribute will override, for that one instance of the element, any global setting for the “margin-top” property. When there are places in your code where a style or combination of styles is used only once, it is your choice whether to use the “style” attribute or whether to create a class.
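
If you prefer a class, the inline style above could be replaced by something like the sketch below. The class name “the-end” is hypothetical and chosen only for this example.

```html
<style type="text/css">
/* "the-end" is a made-up class name for this sketch. */
.the-end
{
  margin-top: 1.5em;
}
</style>

<!-- Two classes on one element are separated by a space. -->
<p class="center the-end">THE END.</p>
```

Either way the result should look the same; a class becomes the more convenient choice if the same styling is later needed in a second place.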

</html>

Don’t forget to close the door on your way out! After the </body> tag, you have just one final tag to close:

</html>

Of course, your book will likely have other items not discussed above, such as illustrations, poetry, or tables. Check out the other Introductory Topics and the Case Studies for additional tips and tricks.

Also, remember to save early and often, and make separate backups periodically, especially before making major changes to your code. View your HTML file in multiple browsers, because different browsers behave differently. Use the W3C Markup Validation Service to check your (X)HTML code, and the W3C CSS Validation Service to check your CSS; PG does not accept HTML files that don’t validate.

The important thing is, don’t be scared of trying things. If you make changes to your file and it breaks, go back to the most recent backup. Ask questions. And most of all, have fun!

Happy PPing!

To comment or request edits to this page, please contact lhamilton.
