PPTools/EBookMaker

From DPWiki

Note: Although much information about ebookmaker is gathered in one place here, this is not an official documentation page, so may be out of date. Refer to DP_Official_Documentation:PP_and_PPV/DP_HTML_Best_Practices/Case_Studies/Media_Types for official information.

Introduction

Ebookmaker turns an HTML file and its associated images folder into two .mobi files (one of which is in the older mobi format, and one in the newer kf8 format) and two .epub files (one EPUB2 and one EPUB3 format). The mobi files are intended for use on Amazon's Kindle reading devices and apps; the epub files are used on most other e-book readers.

It is available from the HTML menu in Guiguts, which is convenient for quick checks. However, the copy of ebookmaker downloaded when Guiguts is installed does not update automatically, so it will become stale.

The official converter used by Project Gutenberg is here: https://ebookmaker.pglaf.org/ and should be used for final checks.

Making things look different in e-books and HTML

Some features that look good in HTML look bad in e-books. When we submit to PG, we only send over the HTML and not the e-books. PG generates the e-books so we don't have the luxury/burden of tuning what is submitted to PG.

Formatting for e-books can be modified in these ways:

  • Add an .x-ebookmaker alternate definition to the CSS section of the HTML
  • Add x-ebookmaker-drop to an element's class string
  • Add x-ebookmaker-important to an element's class string
  • Add rel="nofollow" to an <a href=...> element
  • Modify the @media handheld section of the CSS section of the HTML (deprecated)

Official information relevant here: DP_Official_Documentation:PP_and_PPV/DP_HTML_Best_Practices/Case_Studies/Media_Types

How @media works in relation to ebookmaker

The @media rule is used in media queries to apply different styles for different media types/devices. Media queries can be used to check many things, such as width and height of the device or orientation (is the tablet/phone in landscape or portrait mode?). Media queries are a popular technique for delivering a tailored style sheet (responsive web design) to desktops, laptops, tablets, and mobile phones. You can also use media queries to specify that certain styles are only for printed documents or for screen readers (mediatype: print, screen, or speech).

A mediatype that was supported in CSS2.1 was handheld, with the original intention that it would allow different CSS to be used on handheld devices. Due to a lack of support from handheld device makers and blurring of the lines between handheld and not, handheld and most other media types were deprecated in CSS3, leaving just all, screen, print and speech. It is expected that all media types will be deprecated and replaced with finer-grained media features such as width or resolution.

During the period when it was expected that handheld would signify a handheld device but many such devices did not self-identify as handheld, it made sense for the precursor to ebookmaker to replace "@media handheld" with "@media all" as it created the epub/Kindle versions. This ensured that even if a device did not consider itself handheld, the epub/Kindle version would use the CSS designated as being for handheld devices.

So, consider the following example (deliberately chosen to show the concept without being diverted into whether it is useful):

.bright {color:blue;}
@media handheld { .bright {color:red;} }

In a browser on a computer (which doesn't consider itself handheld), any element chosen by the .bright selector will have blue text, because the browser ignores the second rule-set, which is inside '@media handheld'.

However, in a file that ebookmaker has created (typically epub/Kindle), the code above will have been processed to look like this:

.bright {color:blue;}
@media all{ .bright {color:red;} }

On any device (whether it considers itself handheld or not), any element chosen by the .bright selector will have red text, because the second rule-set is read since it is inside '@media all'. This second rule-set overrides the first because, when two rule-sets have the same selector, the later one takes precedence.

How x-ebookmaker works

Selectors in CSS can be quite complex, although we typically just use simple classes. One slightly more advanced selector is to specify two different selectors (such as class names) with a space between them, e.g. '.name1 .name2' selects all elements with class name2 that are descendants of an element with class name1.

To see how ebookmaker uses this feature of selectors, first note that all of the elements that make up our books are descendants of a <body> element. When ebookmaker creates an epub/Kindle file, it adds the class x-ebookmaker to the <body> element. So, if we want some CSS to only take effect in a file created by ebookmaker, we can use a class "myclass" on an HTML element and then use ".x-ebookmaker .myclass" as a selector in the CSS. The CSS will only apply to elements with class myclass that are descendants of an element with class x-ebookmaker, which of course will normally only be true in files created by ebookmaker. Therefore, the CSS thus defined will only apply to epub/Kindle files, not to the normal HTML file viewed in a browser.

So, using the above example, we write:

.bright {color:blue;}
.x-ebookmaker .bright {color:red;}
...
<body>
<p class="bright">Some text</p>
</body>

In the HTML file, a paragraph with the bright class will have blue text, because it is not a descendant of an element with the x-ebookmaker class, so the selector in the second rule-set does not select the paragraph.

However, in a file that ebookmaker has created (typically epub/Kindle), the code above will have been processed to look like this:

.bright {color:blue;}
.x-ebookmaker .bright {color:red;}
...
<body class="x-ebookmaker">
<p class="bright">Some text</p>
</body>

Now, the bright paragraph is a descendant of the body element with the x-ebookmaker class, so the second rule-set will select the paragraph and the text will be red.

Rarely needed advanced use only: It is also possible to specify either .x-ebookmaker-2 or .x-ebookmaker-3 in a selector, which means the styling will only affect ebookmaker's EPUB2 or EPUB3 output respectively.
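As a minimal sketch of this advanced use, assuming these classes are added to the <body> element in the same way as x-ebookmaker (the bright class is carried over from the example above, and the colours are arbitrary):

```html
<style>
  .bright {color: blue;}                   /* HTML file viewed in a browser */
  .x-ebookmaker-2 .bright {color: red;}    /* ebookmaker's EPUB2 output only */
  .x-ebookmaker-3 .bright {color: green;}  /* ebookmaker's EPUB3 output only */
</style>
```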

How x-ebookmaker-drop works

The special class name x-ebookmaker-drop can in theory be added to any element that you want to be dropped from the epub/Kindle versions. However, it should not be used on any element where there may be an internal link to the element, such as an image that is linked from your List of Illustrations. Doing this may cause errors when kindlegen is used by ebookmaker to create the Kindle version. This rule also applies to headings such as h2 and h3, because they may be referred to in the Table of Contents that ebookmaker creates. This is not the book's actual ToC that you can see in a browser in the main body of the text, but the separate generated epub contents (or outline) that users can often easily access on an ebook device in order to jump quickly to any chapter or section of the book.

This class does not allow customisation of CSS in the same way as the previous methods. Its only purpose is to efficiently hide elements in the versions created by ebookmaker. It works by ebookmaker spotting that the element has the class x-ebookmaker-drop and consequently omitting that element from its output.
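For illustration, a paragraph that only makes sense in a browser could be marked like this. The wording of the paragraph is hypothetical, and nothing links to it, so dropping it should be safe:

```html
<!-- Shown in browsers; omitted from the epub/Kindle output -->
<p class="x-ebookmaker-drop">Click on any map to see a larger version.</p>
```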

How x-ebookmaker-important works

The special class name x-ebookmaker-important can be added to specific elements to tell ebookmaker not to modify or remove them.

An example of this is when an <h2> heading begins with "By ". By default, ebookmaker suspects this may be a misuse of <h2> to display the author's name in the front matter, e.g. "By Charles Dickens". It therefore does not include the <h2> heading in the epub ToC. If it is in fact a valid heading, e.g. a catalogue of books "By the same author", and you want the heading to appear in the epub ToC, add the class x-ebookmaker-important to the <h2> element.

A second use of x-ebookmaker-important is to tell ebookmaker to retain an illustration even in a "no-images" epub version.
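A short sketch of both uses follows. The image file name and alt text are hypothetical:

```html
<!-- Heading kept in the epub ToC even though it begins with "By " -->
<h2 class="x-ebookmaker-important">By the same author</h2>

<!-- Illustration retained even in a "no-images" epub version -->
<img class="x-ebookmaker-important" src="images/diagram.jpg" alt="Diagram of the apparatus" />
```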

How rel="nofollow" works

The special attribute rel="nofollow" can be added to an <a> element which links to an image, either directly or via a "Larger image" label, when the large image is required to be dropped from epub/Kindle versions. A typical use is where a smaller image of a map, for example, links to a larger, more detailed one in the HTML version, but the larger one is not wanted in the epub/Kindle version for some reason. Some PPers do not use this method, preferring to include the large image (but viewed at a smaller scale) within the main body of the text, and allowing the user to utilise the browser or epub reader's features to zoom in on the image.

In the following example, the text "Link to larger image" is dropped due to the x-ebookmaker-drop class, and the larger image, test-large.jpg, is dropped because the only link to it is flagged with rel="nofollow":

<div class="x-ebookmaker-drop">
<a rel="nofollow" href="images/test-large.jpg">Link to larger image</a>
</div>
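The direct form, where the smaller image itself is the link, might look like the sketch below. The file names and alt text are hypothetical. Here the smaller image is expected to remain in the epub/Kindle versions, while the larger one is dropped because its only link carries rel="nofollow":

```html
<!-- Small map stays in the e-book; the linked large map is not included -->
<a rel="nofollow" href="images/map-large.jpg">
  <img src="images/map-small.jpg" alt="Map of the region" />
</a>
```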

Example: Suppressing chapter division HRs in e-books

The problem is that an HR (Horizontal Rule) used to separate chapters in HTML can leave some chapters of the e-book followed by a page containing only an HR.

Example of how to suppress chapter HRs using the x-ebookmaker method

In the CSS section, the selector hr.chap selects HR elements that have the class chap. An override for e-books uses a descendant selector to select only chap HRs that are descendants of x-ebookmaker elements. In the HTML file viewed in a browser, neither the <body> nor any other element has the class x-ebookmaker, so nothing is a descendant of an x-ebookmaker element, and only the first rule-set applies to chap HRs. However, since ebookmaker adds the x-ebookmaker class to the <body> in epub and Kindle files, the second rule-set hides the chapter HR when the epub/Kindle file is viewed on any device.

hr.chap {width: 65%; margin-left: 17.5%; margin-right: 17.5%;}
.x-ebookmaker hr.chap { display: none; visibility: hidden; }

In the main body of the HTML the HR is injected like this.

<hr class="chap" />

Example of how to suppress chapter HRs using the x-ebookmaker-drop method

To use the specific class x-ebookmaker-drop, you leave the CSS section of your file as you want it for the HTML file in browsers:

hr.chap {width: 65%; margin-left: 17.5%; margin-right: 17.5%;}

In the main body of the HTML the HR is injected like this (note the added class name).

<hr class="chap x-ebookmaker-drop" />

Example of how to suppress chapter HRs using the @media handheld method

Using @media handheld is deprecated and will result in warnings when validating CSS. It should be avoided.

The following information is extracted from a forum discussion started here

In the CSS section the selector hr.chap selects HR elements that have the class chap. An override for handheld uses the same selector and hides the HR, but would normally only do this if a device considers itself handheld. This means that on a browser, the chapter HR will be shown. However, since ebookmaker will have changed the word 'handheld' to 'all' in epub and Kindle files, then the second rule-set will hide the chapter HR when viewing the epub/Kindle file on any device.

hr.chap {width: 65%; margin-left: 17.5%; margin-right: 17.5%;}

@media handheld
{
  hr.chap    { display: none; visibility: hidden; }
}

In the main body of the HTML the HR is injected like this.

<hr class="chap" />

Info on handling Drop Caps

DropcapsForEpub - this solution is unsatisfactory since it removes letters from words, causing them to be mis-read by screen reader software and not to be found by searches.

There is some official documentation on Drop Caps which matches the Best Practices document.

Info on handling Small Caps

SmallCapsForEpub

What ebookmaker does when converting your HTML file to mobile formats

The contents of this section were taken from part of the HTML Best Practices document on Mobile Formats and edited a little. See the original page for more links and examples of use.

It splits your file

Ebookmaker splits your HTML file into several smaller files, since many e-readers cannot cope with large files. Each of those files will start on a new page, which means that if ebookmaker decides upon a split in an unfortunate place, you might have a page break where you don’t want it.

How does ebookmaker decide where to split the file? It has a size limit and will split somewhere before it reaches that limit. It prefers to split before <h1> or <div> elements, then <h2>, <h3> and <p>.

You cannot prevent ebookmaker from splitting your file, but you can influence the positions of the splits. You can wrap parts that belong together in a <div>. For example, ebookmaker might decide to split your file between a chapter heading and the motto that comes right after it. If you wrap the two items in a <div>, they should be safe from being separated.

You can also force ebookmaker to split a file in a specific place: Just set a class of “chapter” or “section” on a <div> and ebookmaker will always split before that <div>.
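Both techniques can be combined, as in this sketch. The motto class and the text are hypothetical; only the chapter class has special meaning to ebookmaker:

```html
<!-- ebookmaker always splits before this div, so the chapter starts a new file -->
<div class="chapter">
  <!-- Wrapped together so a split should not fall between heading and motto -->
  <h2>Chapter II</h2>
  <p class="motto">A motto that must stay with its heading.</p>
</div>
<p>First paragraph of the chapter.</p>
```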

It adds some CSS

Also important to note is that ebookmaker adds some CSS to your e-book—which is applied after your own CSS, and therefore might override your choices in some cases. If you notice any unexpected formatting when testing your “mobile” versions, make sure you check whether this is due to the CSS that was added by ebookmaker, rather than your own.

The relevant parts of the added CSS—slightly reformatted to make them easier to read—are (as of ebookmaker version 0.9.1, August 2020):

body
{
  color: black;
  background-color: white;
  margin: 0.5em;
  width: auto;
  border: 0;
  padding: 0;
}

div, p, pre, h1, h2, h3, h4, h5, h6
{
  margin-left: 0;
  margin-right: 0;
}

h2
{
  page-break-before: always;
  padding-top: 1em;
}

div.figcenter span.caption
{
  display: block;
}

It converts elements with special CSS classes

Usually, you are free to choose any name for a CSS class. There are a few class names, however, which get special treatment by ebookmaker. Although this is often helpful—when you and ebookmaker agree on the meaning of the name—it can potentially wreak havoc on the resulting epub and mobi files. Therefore, it helps to know the special class names and what ebookmaker does to them—so that you will only use those classes where it is appropriate.

The class names with special meanings are:

  • pagenum, pageno, page, pb, folionum, foliono: Any element that uses one of these classes will be treated as a page number by ebookmaker. The entire element will be replaced by an anchor (<a>) that does not contain any text. While this is a reasonable way to treat real page numbers (preserving the ability to link to them while removing the displayed numbers), it will delete content from your book if used for something other than page numbers. Ebookmaker uses the class x-ebookmaker-pageno internally, but this should not be used explicitly by PPers.
  • versenum, verseno: These are treated nearly the same as page numbers; their content will be stripped.
  • chapter, section: When used on a <div>, these classes will lead to a page break in the “mobile” formats due to the file being split at these points.
  • x-ebookmaker-drop: Using this class on an element will cause the element to be omitted in the mobile formats, as described in "How x-ebookmaker-drop works" above.
  • x-ebookmaker: Do not use this class on elements in your HTML. You may use it as part of a descendant selector in your CSS to modify the CSS used in mobile formats. More details are in "How x-ebookmaker works" above.
  • x-ebookmaker-important: This class on an image element tells ebookmaker not to remove the image, even in no-images builds. On a chapter heading, it tells ebookmaker not to omit the heading from the epub ToC even if the heading begins with the word "By" (such headings are omitted by default for historical reasons). More details are in "How x-ebookmaker-important works" above.
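The difference between appropriate and risky use of the page-number classes can be sketched as follows. The id value and the text are hypothetical:

```html
<!-- Appropriate: a real page number; the whole span is replaced by an anchor with no text -->
<p><span class="pagenum" id="Page_12">[Pg 12]</span>The text of the page continues here.</p>

<!-- Risky: "page" is one of the special class names, so this content would be lost -->
<div class="page">Text that is part of the book.</div>
```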

It removes various things from some formats

In addition to inserting some CSS and converting special elements, ebookmaker also removes some things.

Firstly, it deletes the following properties from the CSS in the older EPUB2 and mobi formats. Unless specified, EPUB3 and kf8 formats retain these CSS properties:

  • “float” (and, on the same images, also “width” and “height”): This un-floats all floated elements, e.g. images with text flowing around them and, in particular, illustrated drop-caps. Note that if “float” is used inside @media handheld or using the x-ebookmaker class as described above it is not removed. This gives a mechanism for restoring illustrated drop-caps to the mobile formats.
  • (also removed in EPUB3/kf8 formats) “position”, “left”, “right”, “top” and “bottom”: ebookmaker removes any absolute or relative positioning. If you have moved elements to the margins (like line numbers in poetry) using “position”, they might end up in unexpected places in the “mobile” versions.
  • “background-image” and all related properties: Background images set using CSS will not be displayed in the “mobile” versions.
  • Finally, any properties whose values end in “px”: This often applies to “width” and “height”, but can also affect other properties, e.g., borders. Note that ebookmaker does not remove width and height attributes from <img/> elements, just pixel sizes specified in the CSS.
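The float and "px" rules above can be illustrated with a sketch like this one. The class names drop-cap, plate and plate-em are hypothetical, and the comments simply restate the behaviour described in the list:

```html
<style>
  /* Browser styling: "float" is deleted from the EPUB2 and mobi output */
  img.drop-cap {float: left; margin-right: 0.5em;}

  /* Inside an .x-ebookmaker selector, "float" is not removed */
  .x-ebookmaker img.drop-cap {float: left;}

  /* Deleted in EPUB2 and mobi: the value ends in "px" */
  .plate {width: 300px;}

  /* Not affected by the "px" rule: the value uses a relative unit */
  .plate-em {width: 20em;}
</style>
```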

Secondly, ebookmaker removes displayed cover images. This means that if you have included your cover image in the HTML, it will be removed from its original place—since it is already used as the cover image, and thus displayed at the very beginning of your book.

Finally, ebookmaker removes links to anything that will not be part of the final e-book, in particular any external links.