Accessible HTML eBooks

From DPWiki

This large document uses the W3C's Web Content Accessibility Guidelines as a framework for discussion of how we can create more accessible eBooks. For more detailed and specific help on various features, try the Accessibility Recipes.

Introduction

Project Gutenberg ought to be a flagship of accessibility. Often people ask why we do all this proofreading, when surely we could preserve far more books if we just put up page images. Accessibility is one of the best answers. Even a simple plain text file is more accessible than an image, and a little bit of care can ensure even broader accessibility. Simple adjustments in postprocessing could make a world of difference to people who are color-blind, who use screen readers to read our eBooks, or who are unable to use a mouse, among others.

There are a lot of useful guidelines for accessibility on the web, including the W3C Web Content Accessibility Guidelines (WCAG), some of which are referenced here. But remember that we are dealing with a special case: eBooks differ in some ways from web pages, which are the focus of much of the online advice available. For example, sometimes these sites recommend changing the text of a page to support accessibility, which is not an option for us. And while web page guidelines often recommend the latest technology, we need to create texts that may remain unchanged for decades and support the widest possible array of reading devices.

In the sections below, where applicable, a box shows the level of conformance each success criterion is assigned to, according to WCAG 2.1: A indicates the lowest level of conformance, AA the middle level, and AAA the highest. The conformance level is followed by the success criterion number. More detail is given here on how the levels of conformance were determined.

General Principles

When creating accessible etexts of public domain sources, there are a few basic principles we must follow:

  • Consider accessibility from the beginning. Don't add accessibility as an afterthought; ask yourself at every stage if a decision will affect accessibility. This is especially important if your project relies on technologies that can break accessibility: MathML, LaTeX, table column alignment, etc.
  • Don't change the original text. The goal of Project Gutenberg and Distributed Proofreaders is to preserve texts, not to improve them. This means we can't follow certain accessibility advice, such as "front-loading" (re-drafting a paragraph with the conclusion first so that people using a screen reader can choose to skip it). Also, since we're not the original authors, it's not always obvious whether to tag italics in a text as <i> or <em>; we can expect careful editing from our volunteers, but not mind reading.
  • Follow PG guidelines. Similarly, PG guidelines make some web accessibility advice impractical. For example: every PG text must include all CSS and HTML in a single file, making recommendations for separate "longdesc" file links impractical.
  • Add new content sparingly and thoughtfully. Accessibility sometimes requires adding new content like alt attributes and table summaries. Keep this new content objective and avoid editorializing.
  • Document your work. If you do add original content, say so in your Transcriber's Notes and clearly state that you are placing any original content in the public domain.
  • Don't cut corners. If you're thinking of solving a visual problem with a creative use of an HTML element like a table, list, or heading, don't. Find a solution that follows good HTML practice and document semantics. Many issues arise when people use HTML elements to change how a text looks, without thinking about their semantic meaning.

Specific Elements and Items

One of the most important things you can do to support accessibility is to follow the existing Project Gutenberg and Distributed Proofreaders guidelines. Use valid markup and CSS, and avoid things like scripting and frames, which can break accessibility. Remember that while we use CSS to style our books, all DP HTML files should be fully legible without any CSS. Following this rule will also help with accessibility in most cases.

Some items, like images and tables, also require special treatment to be accessible to everyone. The list below covers some items where you just need to follow the rules, and some where you need to take extra steps.

Abbreviations

Semantically tagging abbreviations with the <abbr> tag helps screen readers properly identify and pronounce abbreviations, initialisms, and other shortened versions of words. It is good practice to mark up abbreviations in your text with the <abbr> tag. Although it would be preferable to include the expansion as part of the text, we cannot do that for our books, so using the <abbr> tag is the next best option. Examples here. AAA 3.1.4

Abbreviation Titles

The "title" attribute of an <abbr> tag is used to give the full version of an abbreviation or acronym. This can be useful in some cases, but is not recommended for every abbreviation in a text. It is only necessary to add a title attribute to the first occurrence of an abbreviation, unless the same abbreviation is used for different purposes, in which case the correct title should be added to each. For example, "St." could be used for "Saint" and, in the same book, for "Street".

If your text contains an abbreviation with a commonly understood meaning, like "laser," "UK," or "Mrs.", it should be marked with an <abbr> tag, but adding a title attribute is not necessary; it just creates audio or visual clutter.

If there is an abbreviation that was once well understood but is not likely to be familiar to a modern reader, you may consider adding a title attribute. In a visual browser, the abbreviation will normally be shown with a dotted underline, and the reader can hover over it to see the expanded version. Screen readers may also make this available to the end user, although support is not universal.

For example, the Aerated Bread Company was a popular chain of teashops in London in the late 19th and early 20th centuries, and their locations were commonly called "A.B.C. stores." The phrase here was coded as:

<abbr title="Aerated Bread Company">A.B.C.</abbr> stores.

In a visual browser, hovering over the "A.B.C." in the paragraph above should show you the expanded version.

Even where an abbreviation is spelled out in the text on its first usage, the <abbr> tag may help some readers; because navigation is easier in an ebook, readers are more likely to skip to the section of a book that is relevant to them. This is especially true of those using screen readers, as it's not possible to "skim" in the same way a visual reader would.
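A minimal sketch of these rules in markup, using invented sentences for illustration. The unfamiliar abbreviation gets a title on its first occurrence only. The familiar one gets none. The ambiguous "St." gets a different title for each meaning.

```html
<!-- First occurrence of an unfamiliar abbreviation: give the expansion once -->
<p>They met for tea at an <abbr title="Aerated Bread Company">A.B.C.</abbr> shop.</p>

<!-- Later occurrences: keep the <abbr> tag but drop the title -->
<p>The <abbr>A.B.C.</abbr> was crowded that afternoon.</p>

<!-- Commonly understood abbreviation: tag it, but no title is needed -->
<p>She had never left the <abbr>UK</abbr>.</p>

<!-- Same abbreviation, different meanings: give each its own title -->
<p>They walked from <abbr title="Saint">St.</abbr> Paul's to Baker <abbr title="Street">St.</abbr></p>
```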

Types of Abbreviations

There are multiple types of abbreviations, including acronyms like "laser" which are pronounced as a word, and initialisms like "E.U." in which each letter is pronounced separately.

In previous versions of HTML there were separate <abbr> and <acronym> tags. However, <acronym> has since been deprecated, and it is now recommended to use <abbr> for all abbreviations.

An alternative way of indicating the type of abbreviation is to use a standard vocabulary like the Structural Semantics Vocabulary. This uses the epub:type attribute to identify an <abbr> as "z3998:acronym", "z3998:initialism", or "z3998:truncation". While this is a standard, it may not be universally supported and is not required.
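If you do use the vocabulary, a sketch might look like this. It assumes an XHTML document whose root element declares the epub namespace, which any epub:type attribute needs. "Prof." is an invented example of a truncation.

```html
<!-- Assumes the root element declares the epub namespace:
     <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"> -->
<abbr epub:type="z3998:acronym">laser</abbr>
<abbr epub:type="z3998:initialism">E.U.</abbr>
<abbr epub:type="z3998:truncation" title="Professor">Prof.</abbr>
```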

Color

Accessibility for users with a color vision deficiency (colorblindness) is important in general web design, but less so for our work at DP, because most of our output is plain text, and where color images do exist they are generally pre-existing images that we can't change. However, there are a few areas where you may run into issues:

  • Cover images
  • Transcriber's notes
  • Links

General Principles

  • Make sure no information is conveyed only through color. A 1.4.1
  • Use sufficient contrast as well as color. AA 1.4.3
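For reference, WCAG 2.1 measures contrast as a ratio of the relative luminances of the two colors. AA 1.4.3 asks for at least 4.5:1 for normal text and 3:1 for large text. The formulas below restate the WCAG definitions. Each sRGB channel value c is the 8-bit value divided by 255, and it is linearized before the weighting is applied.

```latex
\text{contrast ratio} = \frac{L_{\text{lighter}} + 0.05}{L_{\text{darker}} + 0.05},
\qquad
L = 0.2126\,R + 0.7152\,G + 0.0722\,B,
\qquad
R, G, B = c_{\text{lin}} =
\begin{cases}
  c / 12.92, & c \le 0.03928,\\[4pt]
  \left(\dfrac{c + 0.055}{1.055}\right)^{2.4}, & \text{otherwise.}
\end{cases}
```

Worked example: black on white gives (1 + 0.05)/(0 + 0.05) = 21:1, which is the maximum. Mid-grey #777777 has c = 119/255 ≈ 0.467 in every channel. This linearizes to about 0.1845, so L ≈ 0.1845. Its contrast against white is therefore 1.05/0.2345 ≈ 4.48:1, just short of the 4.5:1 needed for normal text.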

Cover Images

When creating new cover images for ebooks, ensure that there is enough contrast between the background and any text for the text to remain easily legible, even if you change the image to grayscale. Consider using a high contrast block for the title text (e.g., black text on a white rectangle) rather than placing the text directly on a colored background. AA 1.4.3

Transcriber's Notes

The other place color sometimes appears in our ebooks is in the Transcriber's Note. It is okay to use a light colored background to set off your transcriber's notes, but remember that a colored background may be less visible to a person with a color vision deficiency. Consider also using a border, changing to a sans serif font, or setting the transcriber's note off from the rest of the book in another way as well. A 1.4.1
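Here is one way to do this, sketched with a hypothetical class name "transnote". The note gets a light background, a border, and a change of font, so it stays distinct in grayscale or with colors turned off. As PG requires, the CSS belongs in the file's own <style> element.

```html
<style>
  /* Light background plus a border and a font change, so the note
     does not depend on color alone to stand out */
  .transnote {
    background-color: #f0f0f0;
    border: 1px dashed #000;
    padding: 0.5em;
    font-family: sans-serif;
  }
</style>

<div class="transnote">
  <p>Transcriber's Note: ...</p>
</div>
```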

Links

Avoid changing the default colors of links, or removing underlines from links through CSS. A user with a color vision deficiency may rely on a customized browser or ereader to identify links and to distinguish visited from unvisited ones. Too much tinkering with CSS can cause problems for these users.

Other Issues

  • Do we mind if people can't see our visible pagenums, in the interests of making them unobtrusive for those with good vision?

Emphasis

  • The recommendation here is to use <em> and <strong> instead of <i> and <b>. Or, where these occur in section headings or around foreign words or whatever, to mark up those semantically and style them using CSS.
    • One could argue that we are trying to represent a visual medium, namely a dead-tree book, and that we don't know what the author's italics meant.
    • Or that moving the italics and bold to the CSS is bad for non-CSS users.
    • Speech browsers use a different voice inflection for emphasised and strong text. (Perhaps not for italic and bold?)
    • Using <em> and <strong>, if they are styled in the CSS to be italic and bold, can't hurt.
  • Other forms of emphasis can be achieved using CSS to style the tags differently. For example, to show <strong> as underlining rather than bold:
strong.u { text-decoration:underline; font-weight:normal; }
  • Don't fake gesperrt text by sticking in &nbsp;s, as this will cause screen readers to spell out the word letter by letter. Try this CSS instead:
em.gesperrt { letter-spacing:0.3em; font-style:normal; }
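In the body of the book, these rules might be used as follows. The sentences and the "u" and "gesperrt" class names are only illustrations. Because the elements are still <strong> and <em>, readers without CSS get the browser's default bold and italics instead.

```html
<style>
  strong.u { text-decoration: underline; font-weight: normal; }
  em.gesperrt { letter-spacing: 0.3em; font-style: normal; }
</style>

<!-- Underlined emphasis in the original -->
<p>This is <strong class="u">very important</strong>.</p>

<!-- Spaced-out (gesperrt) emphasis: real letters, no &nbsp; padding -->
<p lang="de">Er sagte: <em class="gesperrt">niemals</em>.</p>
```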

Headings

HTML headings should not be used for anything except structural headings. If you follow the guidelines in the Post-Processing FAQ and the Headings section of the Easy Epubs documentation, it will produce better ePub conversion and more accessible output. Note that headings should be hierarchical (e.g. h3 inside h2), and levels should not be skipped (e.g. h4 directly inside h2). AAA 2.4.10

Users who can't visually browse through a text rely on headings to navigate through it. Screen readers will read and list these headings for them. DO make sure to use heading tags for every heading, rather than just representing it with font size changes, and DON'T use headings for anything else. Don't, for example, use heading tags to represent the various sizes of text on a title page.
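Here is a sketch of a heading outline that nests correctly. Title-page text is sized by CSS rather than by heading tags. The titles and the class name are hypothetical.

```html
<h1>The Title of the Book</h1>
<!-- Other title-page lines are not headings: size them with CSS -->
<p class="tp-author">By A. N. Author</p>

<h2>Chapter I</h2>
<h3>The First Part</h3>
<h3>The Second Part</h3>

<h2>Chapter II</h2>
<!-- Wrong: <h4> directly under <h2> would skip a level -->
<h3>A Subsection</h3>
```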

Images

Text equivalents should be provided for every non-text element. A 1.1.1

  • "Equivalent" means it should serve the same purpose as the image, rather than describing it. If you were reading over the phone, what would you say?
  • Ideally use normal prose, and include punctuation because this gives voice pauses. Beware of homonyms—i or eye?
  • Attributes are allowed to contain entities, so there is no need to strip accents from letters etc.

Alt text

PG has added alt text to Winnie the Pooh, ebook #67098, with a particular eye for accessibility. If you are considering what alt text might be appropriate for your book, it is recommended that you look at the examples in the source of the HTML version, and consider how the alt text enhances the experience for a reader who is unable to see the illustrations. Consider the purpose of the image. For example, an image may advance the narrative. The alt text should attempt to accomplish the same goal.

WebAIM also has examples, some of which are relevant to our books.


  • Although every image must have an alt attribute, sometimes it is appropriate for it to be empty, i.e., alt="". Assistive devices may present the alt text, so, as with improper headings and lists, we don't want to waste the user's time with unnecessary information. However, we do want to give the user as close an experience as possible to visually seeing the image. Some examples of where an empty alt is appropriate:
    • Images whose caption describes the image.
      • Don't re-use the caption in the alt text: nobody wants to hear it twice. (Unfortunately old versions of Guiguts 1 used to encourage this bad habit.)
      • Many captions comment on the image and don't describe it. In this case, alt text is needed.
    • Images where an adequate description is in the body text, adjacent to the image. As above, commentary on the image in the body text may not describe the image well.
    • Purely decorative or spacer images.
      • All accessibility guidelines are in agreement on this.
      • Add data-role="presentation" to indicate that the absence of alt text is intentional.
      • You may feel that despite being decorative rather than conveying information, an image is an essential aspect of the book we are preserving and so a description is warranted—are readers interested in the content of the book, or in the book itself and how the original looked?
  • "Do no harm." Using something like alt="[image]" or alt="figure 2" is worse than alt="".

Where alt text is appropriate, bear in mind:

  • Keep the alt text relevant, so that someone using a screen reader isn't subjected to unnecessary information. Historically, it was recommended to stay under 150 characters if possible, but that is no longer required. If the image conveys important information, use enough text to convey that information.
  • Assistive devices usually indicate to the reader that the alt text is associated with an image, so it's not necessary to say that.
  • Use the words, language and spelling of the text for the alt text.
  • Special situations:
    • Images that are links.
      • The alt needs to say where the link goes.
      • Generally don't say "Link to...."; the browser tells the user that.
      • For us it usually only links to a larger image. Perhaps alt="The book's cover, linked to larger image." is ok?
    • Decorative horizontal rules.
      • Use: alt="" data-role="presentation".
      • Don't insert gratuitous <hr> elements. Many assistive technologies announce them.
    • Sliced images.
      • Put the alt only in one slice (no repeating, no "sharing out" the alt text amongst the images) and empty alt in the others.
      • If a sliced image is linked, make it just one hyperlink around the whole lot.
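Here is a sketch of the common cases, with hypothetical file names and invented captions and descriptions. <figure> is used here for brevity. The same alt choices apply to whatever figure markup you use.

```html
<!-- Caption already describes the image: empty alt, so it isn't heard twice -->
<figure>
  <img src="images/mill.jpg" alt="">
  <figcaption>The old mill at sunset.</figcaption>
</figure>

<!-- Caption only comments on the image: alt text is needed -->
<figure>
  <img src="images/gate.jpg" alt="A woman holding a lantern stands at a garden gate at night.">
  <figcaption>"Who goes there?"</figcaption>
</figure>

<!-- Purely decorative image -->
<img src="images/flourish.png" alt="" data-role="presentation">

<!-- Linked image: one link, and the alt says where it goes -->
<a href="images/cover_large.jpg"><img src="images/cover.jpg" alt="The book's cover, linked to larger image."></a>
```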

Images of Text

Avoid this if possible, as the image will be useless to those who can't see it. A 1.1.1

  • Text that flows around images, or has a fancy image border, can be done without resorting to an image of the text: ask for help in the forums or browse the Illustrations section of PP examples on PG.

But if it can't be avoided (for example, a complicated equation or lyrics in music notation), try one of the following:

  • Use good alt text (LaTeX code for an equation?).
    • But be aware that if the image is small and you specify its height and width (which is usually good practice), then the alt may be truncated on the screen.
    • What if you need to explain the notation you will use in the alt? See JKorpela's advice.
  • Reproduce the text below the image. (You may want to add a Transcriber's Note explaining that you've done this.)
  • Link to a much bigger, high-contrast version.
  • If the image plays the role of, say, a level-3 heading, you should still enclose it in <h3> tags.
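Here is a sketch of two of these approaches, with hypothetical file names. An ornamental chapter title supplied as an image still sits inside its heading tag. An equation image carries LaTeX in its alt, and the equation is also reproduced as text below it.

```html
<!-- An image that plays the role of a level-3 heading is still a heading -->
<h3><img src="images/chap3_title.png" alt="Chapter III. The Journey"></h3>

<!-- Equation image: LaTeX in the alt, plus a plain-text version below -->
<img src="images/eq_quadratic.png"
     alt="x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}">
<p class="eqtext">x = (&minus;b &plusmn; &radic;(b&sup2; &minus; 4ac)) / 2a</p>
```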

Other Image issues

  • Background images are ignored by non-visual media (and often by browser settings, in epub versions, or in print), so don't rely on them for important content.
    • This affects some methods of handling decorative drop-caps, horizontal rules, and title pages with borders.
    • Linking to the image separately, and providing its text equivalent in the title attribute, helps somewhat.
  • Symbols also need a text equivalent.
    • For example, does a screen reader recognize the "Planet Neptune" symbol ♆? Using the title attribute will help sighted readers also.
    • The vast majority of characters and symbols can be expressed using Unicode. Combining characters enable accents and other marks to be added above/below letters. Images should not be used in place of characters or symbols.
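Here is a small sketch. The Neptune symbol is written as a Unicode character reference, with a title for sighted readers. A combining macron (U+0304) is placed after a letter that has no precomposed accented form.

```html
<!-- U+2646 NEPTUNE, with a title so sighted readers can hover for its meaning -->
<span title="Neptune">&#x2646;</span>

<!-- "m" followed by U+0304 COMBINING MACRON renders as m with a macron above -->
m&#x304;
```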

Language

Screen readers need to know the language in order to pronounce the words correctly.

  • Identify the document's base language. A 3.1.1
    • Just stick lang="en" (or fr or whatever) into your <html> tag. If you're using XHTML rather than HTML, use both lang="en" and xml:lang="en".
    • Problem with the above: PG's boilerplate is always in English. Alternative: wrap the entire text in one big <div> containing nothing but a lang attribute. The boilerplate will automatically go outside this <div>.
  • Changes in language. AA 3.1.2
    • Just put the lang onto any other tag that surrounds the passage, sentence or word that's in a different language, such as a <blockquote>, <q> or <span>. See this example.
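Here is a sketch of the wrapper-<div> alternative for a French book, with the English boilerplate left outside the <div>. It also shows phrases in other languages marked where they occur. The title and sentences are invented for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>The Title of the Book</title>
</head>
<body>
  <!-- Book text in French; the English PG boilerplate ends up outside this div -->
  <div lang="fr">
    <p>Il faisait beau ce jour-là.</p>
    <!-- A phrase in another language: mark it on the surrounding tag -->
    <p>Il murmura <span lang="la">requiescat in pace</span> et partit.</p>
    <blockquote lang="en">
      <p>To be, or not to be, that is the question.</p>
    </blockquote>
  </div>
</body>
</html>
```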

Links and Navigation

  • Clearly identify the target of each link. A 2.4.4 Many browsers and most assistive technologies allow users to call up a list of page links, so descriptive links are a must.
    • By putting the link around a suggestive bit of text.
    • And/or using the title attribute. (Currently, screenreader users and sighted mouse users can access this text, but sighted keyboard-only users cannot.)
    • Someone may have tabbed to the link, and not heard what went before it.
    • Sighted or not, non-mouse-users hate the phrase "click here," as do people who skim web pages for the information they want.
  • Make sure links that look/sound the same all go to the same place. 2: 13.4
    • Again, assistive technologies may read the list of all links on a page.
    • Should we really have [2] linking to a footnote, and another [2] to return?
  • Allow skipping over things that a user may not want to listen through.
    • Such as ASCII art ... or the PG boilerplate? ;-)
    • Tables of Contents are very tedious to listen through. Many websites offer a "skip navigation" link, and a link that skips over the ToC would be our equivalent.
    • An alphabetic jump table for the Index is a good idea.
  • Make it easy to move around the document.
    • Discreet links back to the ToC?
  • Provide logical tab order for links, and access keys for important ones. A 2.4.3
    • Tab order is probably ok, unless your book is unusual.
  • Consider making it more obvious which link has keyboard focus, so that you can see the cursor when tabbing through the links.
    • Most browsers already do something like this by default.
    • This is achieved by a:focus, a:active { outline:yellow solid 2px; background-color:yellow; }. (Outline rather than border, because borders take up space and we don't want the text to reflow while tabbing from link to link.)
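Here is a sketch that combines these ideas. The ids, link texts, and class names are hypothetical. A skip link lets a listener jump past the ToC, and each chapter ends with a discreet link back. The index opens with an alphabetic jump table, and the focus rule is the one given above.

```html
<style>
  /* Outline rather than border, so text doesn't reflow while tabbing */
  a:focus, a:active { outline: yellow solid 2px; background-color: yellow; }
</style>

<p><a href="#chapter_1">Skip the Table of Contents</a></p>
<nav>
  <h2 id="toc">Contents</h2>
  <p><a href="#chapter_1">Chapter I. The Beginning</a></p>
  <p><a href="#chapter_2">Chapter II. The Journey</a></p>
</nav>

<h2 id="chapter_1">Chapter I. The Beginning</h2>
<p>...</p>
<p class="backlink"><a href="#toc">Back to Contents</a></p>

<!-- Alphabetic jump table at the top of the index -->
<p class="jumptable"><a href="#index_a">A</a> | <a href="#index_b">B</a> | <a href="#index_c">C</a></p>
```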

Lists

DON'T use list markup merely to get indentation or spacing effects. A screen reader will give the number of items in the list before reading the items, which will be confusing if it's anything but a semantic list. The definition list (<dl>) seems to tempt this the most.

DO mark up anything that is semantically a list as an HTML list, so that a screen reader will recognize it as such.

    • Don't instead use a bunch of <p>s and <br />s, or fake numbering/bulleting by typing in the characters.
    • The CSS display:inline property can be used for lists that look like this:
ul.inline li { display:inline; list-style-type:none; }
Introduction; Process of repair; Healing by primary union; Granulation tissue; Cicatricial tissue; Modifications of process of repair; Repair in individual tissues.
 <ul class="inline">
 <li>Introduction;</li>
 <li>Process of repair;</li>
 <li>Healing by primary union;</li>
 <li>Granulation tissue;</li>
 <li>Cicatricial tissue;</li>
 <li>Modifications of process of repair;</li>
 <li>Repair in individual tissues.</li>
 </ul>

Metadata

Fill in the metadata on author, etc.

Now that our books submitted to PG are processed by ebookmaker to provide the HTML and Epub versions visible to the reader, it does not seem necessary for this to be added during PPing.

The information below may be outdated.

I (Jeroen Hellingman) typically include Dublin Core metadata in meta tags.

 <link rel="schema.DC" href="http://dublincore.org/documents/1998/09/dces/">
 <meta name="author" content="...">
 <meta name="DC.Creator" content="...">
 <meta name="DC.Title" content="...">
 <meta name="DC.Date" content="...">
 <meta name="DC.Language" content="...">

The author and creator fields should contain the full name of the book's author, the title contains the title of the book, the date should be the date you upload the file, and the language should be the same two- or three-letter code you used in the lang attribute (e.g. en for English, fr for French, enm for Middle English).

Page Regions

  • Surround the whole book with <main>...</main> tags. (Guiguts 2 will add this automatically)
  • Surround the Table of Contents, List of Illustrations and Index with <nav>...</nav> tags, to indicate they contain links for navigation through the book. (Guiguts 2 will add this to the automatically generated ToC, and to a marked up index, but it is your responsibility to add it to any other navigation sections)
  • Ebookmaker will add <header> & <footer> tags around the PG boilerplate.
  • Use <section>...</section> to surround sections, such as a block of advertisements, or the front matter
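Here is a skeleton showing how these regions might nest. The section class names are hypothetical. The <header> and <footer> around the boilerplate are left for ebookmaker to add.

```html
<body>
  <main>
    <section class="frontmatter">
      <h1>The Title of the Book</h1>
      <p>...</p>
    </section>

    <nav>
      <h2>Contents</h2>
      <p>...</p>
    </nav>

    <h2>Chapter I</h2>
    <p>...</p>

    <nav>
      <h2>Index</h2>
      <p>...</p>
    </nav>

    <section class="advertisements">
      <p>...</p>
    </section>
  </main>
</body>
```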

Quotes and blockquotes

  • Don't use <blockquote> for things that are not quotations.
    • In DP we use our /# ... #/ markup for anything that looks "special". But in PP we should probably change to <div> unless the contents really are a quotation from someone (or something) else.
    • But unfortunately, non-CSS users will lose out if we do this. See below.
  • The <q> element exists for short inline quotations.
    • Beware! It generates its own quote marks.
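Here are some examples. The sentences are invented, and "blockquot" is a hypothetical class name for an indented passage that is not a quotation.

```html
<!-- The browser supplies the quotation marks for <q>, so don't type them too -->
<p>As the proverb says, <q>more haste, less speed</q>.</p>

<!-- A genuine quotation from another source -->
<blockquote>
  <p>To be, or not to be, that is the question.</p>
</blockquote>

<!-- Set off from the text but not a quotation: a div styled with CSS -->
<div class="blockquot">
  <p>...</p>
</div>
```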

Tables

HTML tables should be used only to represent tabular data, and not for visual layout.

Tables for tabular data

Screen readers handle tables in a special way. It's important to ensure that someone who can't see the table layout will still be able to understand the data. A 1.3.1 & A 1.3.2

Refer to these examples of table markup techniques.

  • Use table column and/or row header code, <th>, so that a screen reader recognizes the headers as such.
  • Use <caption> as described here or, if the original table has no caption, put a description in the title attribute of the <table> tag.
  • If the table is complex (e.g. different levels of heading, sub-columns...):
    • Use <thead>, <tbody>, <tfoot>. W3Schools.
    • <col> and <colgroup>. [1]
    • scope and/or headers and/or axis. [2] Basically:
      • scope says what cells a header cell refers to, e.g. <th scope="col">.
      • headers says which header cells apply to a data cell, where you must have given each header cell an id.
  • If table headers are long, use the abbr attribute. AAA 3.1.4

[Example table: The number of calories per 100g of food]

Speech browsers may read the header before every cell it applies to(!) but may switch to the abbreviated version after the first row.
  • Don't allow borders only to convey something.
    • For example, if the total of a column of figures is shown by a horizontal border separating the column from its total, consider using the title attribute on that cell.
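Here is a small sketch of these techniques on a simple table. The contents are illustrative only. It uses <caption>, <thead> and <tbody>, and scope on the header cells. It also uses abbr to give a speech browser a shorter version of a long header.

```html
<table>
  <caption>Common fruits and their usual colors when ripe</caption>
  <thead>
    <tr>
      <th scope="col">Fruit</th>
      <!-- abbr offers a short form a speech browser may repeat for each cell -->
      <th scope="col" abbr="Color">Usual color when ripe</th>
    </tr>
  </thead>
  <tbody>
    <tr><th scope="row">Banana</th><td>Yellow</td></tr>
    <tr><th scope="row">Lemon</th><td>Yellow</td></tr>
    <tr><th scope="row">Cherry</th><td>Red</td></tr>
  </tbody>
</table>
```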

Tables for layout

Avoid if possible! A 1.3.1

  • Sometimes we just can't help but use tables.
    • There will always be unusual things (family trees, parallel translations, diagrams, etc.) where we need to use tables.
    • In this case, the recommendation is not to use <th> etc., unless the cell really does refer to a whole row or column.
  • The Table of Contents/List of Illustrations debate: some people feel these are tables, others that they are lists. Lists with floated-right page numbers do not work so well in the wider range of formats we now produce, so the majority of PPers now use tables for these.
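If you do use a table for a ToC, a sketch might look like this. The ids, chapter titles, page numbers, and class name are hypothetical. The column headers use <th> only because this imagined original prints "CHAPTER" and "PAGE" above those columns. Without such a row, plain <td> cells are enough.

```html
<nav>
  <h2>Contents</h2>
  <table class="toc">
    <tr>
      <th scope="col">CHAPTER</th>
      <th scope="col">PAGE</th>
    </tr>
    <tr>
      <td><a href="#chapter_1">I. The Beginning</a></td>
      <td class="tdpg">1</td>
    </tr>
    <tr>
      <td><a href="#chapter_2">II. The Journey</a></td>
      <td class="tdpg">15</td>
    </tr>
  </table>
</nav>
```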

aria-hidden

Assistive technologies often announce elements such as horizontal rules, <hr>. These can be really annoying to a blind person. You can hide them from screen readers by adding an aria-hidden="true" attribute. (Guiguts 2 uses this attribute to hide horizontal rules added between chapters by HTML auto-generation code)
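Here is an example of a decorative rule between chapters; the class name is hypothetical. Use this only on elements that carry no content, since anything marked aria-hidden="true" is removed from what screen readers present.

```html
<!-- Purely visual separator: hidden from assistive technologies -->
<hr class="chap" aria-hidden="true">
```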

Miscellaneous

  • Anything for which you need the <pre> tag is likely to be an accessibility disaster.
  • If you find yourself using <span> a lot, think about whether one of <dfn>, <cite>, <ins> or <del> might be what you need. [3]
  • Parts of the title page that are all one phrase, like "By Professor V. Famous, Ab.C., D.E., Author of 'A Really Interesting Book' and 'A Rather Expensive Doorstop'", shouldn't be broken up into separate <p> or <h> elements.

Without CSS

The document should still be usable without the CSS. This can conflict with "proper use of markup".

  • The <blockquote> tag should be used for quotations, not for indentation ... so how do I distinguish other special passages without CSS?
  • Remember that CSS classes you create don't have to be used only with <span>s; styling an element that actually does something will ensure non-CSS users see a difference, even if not quite what's wanted. For example, if the book emphasised the words "Lord" and "God" using small-caps, you might use <em style="font-style:normal; font-variant:small-caps"> so that CSS browsers display what you want, while those without display the browser's default emphasis (usually italics).
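Here is the same idea with a reusable class instead of an inline style; the class name "sc" is hypothetical.

```html
<style>
  /* Small caps instead of italics for readers with CSS */
  em.sc { font-style: normal; font-variant: small-caps; }
</style>

<!-- Without CSS, this still shows the browser's default emphasis (usually italics) -->
<p>And the <em class="sc">Lord</em> said...</p>
```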

Tools and References

Things to try, using browser/other tools

  • User stylesheet:
    • Choosing user stylesheet when you haven't created one (i.e. an empty stylesheet) is an easy way to see the document with all CSS removed, not just the header.
  • Outline view:
    • Some browsers and PPing tools can show you the structure of a document's headings.
  • Turn off images.
  • Drag your window so that it is narrow, to see how your book would look on a mobile device
  • Use ebookmaker to create the Epub version, and check that it looks OK
  • Enlarge your font a lot.
  • Speech:
    • Your computer may have some built-in speech capabilities. (But a proper screen reader is —I hope!—likely to do a better job.)
    • The most widely used screen reader seems to be JAWS, which is Windows-only and very expensive.
  • Unplug your mouse. Are you wishing there was a key for ___ ?
  • Switch to white-on-black (or whatever high-contrast, no-colour option is provided). Can you tell which are links / transnotes / etc?
  • Turn on screen zoom. Are there things you'll never see unless you know where to look?

Automatic accessibility checkers

These should be used with common sense. Not all accessibility issues can be measured by machine, and many tools err on the side of producing copious warnings which may turn out to be false alarms. The point is not to ensure your page "passes" these tests, but to consider the issues raised by them.

  • The WAVE Accessibility Tool is recommended by PG. It points out accessibility issues and displays your page with icons showing various tags and possible problems.
    • The WAVE homepage can be given the URL of an HTML file (e.g. a cached HTML output file generated by ebookmaker) to generate a report.
    • The WAVE tool is also available as a browser extension for Chrome, Firefox and Edge. Once an HTML file is being viewed, clicking the extension's "W" button generates the report.
    • Guiguts 2 has a "WAVE Report" button in the Ebookmaker API dialog, which displays a report for the cached HTML file generated by ebookmaker.

Other checkers

  • Colour:
    • Vischeck lets you view an image as it would appear to a colourblind person.
    • Colour Contrast Analyser tells you if two colours are sufficiently contrasting or not.

References

References about images

References about tables