Post-Processing for Epub


Note: Some of the information on this page is out of date. RST and custom Epubs are no longer used at DP.

What Is Epub?

Epub is an e-book file format for e-book reading devices. Files have the extension .epub. Epubs can be read on a variety of devices, including the Sony Reader, Nook, iPad, iPhone, iPod Touch, and Android devices. (As of this writing in January 2011, epubs must be converted to Mobipocket format to be readable on Kindle and BlackBerry devices.)

An epub file is essentially a ZIP archive containing several XHTML files with the text of the e-book (divided into chapters), as well as special files containing the table of contents and other metadata.
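Because an epub is a ZIP archive, its contents can be inspected with any ZIP tool. The following is a minimal Python sketch, not a DP or PG tool: it builds a tiny epub-like archive in memory and groups the entries by rough role. The function names, the file names inside the sample archive, and the grouping rules are assumptions made for illustration, and the placeholder file contents do not make a valid epub.

```python
import io
import zipfile


def summarize_epub(source):
    """Group the entries of an epub (a ZIP archive) by rough role."""
    summary = {"content": [], "metadata": [], "other": []}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.endswith((".xhtml", ".html", ".htm")):
                # Text of the book, usually one file per chapter.
                summary["content"].append(name)
            elif (
                lower.endswith((".opf", ".ncx"))
                or lower == "mimetype"
                or lower.startswith("meta-inf/")
            ):
                # Package description, table of contents, container files.
                summary["metadata"].append(name)
            else:
                # Stylesheets, images, fonts, and anything else.
                summary["other"].append(name)
    return summary


def build_sample_epub():
    """Build a tiny, hypothetical epub-like archive in memory for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/toc.ncx", "<ncx/>")
        archive.writestr("OEBPS/chapter01.xhtml", "<html/>")
        archive.writestr("OEBPS/chapter02.xhtml", "<html/>")
        archive.writestr("OEBPS/style.css", "p { margin: 0; }")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for role, names in summarize_epub(build_sample_epub()).items():
        print(role, names)
```

To inspect a real file, pass its path instead of the in-memory sample, for example summarize_epub("mybook.epub"), where the file name is a placeholder.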

Because epub relies on HTML coding, it is possible to convert an existing HTML file into an epub with relative ease. However, the epub standard has a number of limitations, as it supports only part of the CSS standard. As a result, HTML features as typically coded in DP projects (e.g., page numbers, floated images, dropcaps, tables, external links) often don't work well in epubs unless the code is tweaked, and some features don't work at all.
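As a rough illustration of how a PPer might locate such features before converting, the sketch below scans an HTML string and reports tables, external links, and inline floats with their line numbers. It is a heuristic written for this page, not part of ebookmaker or any DP tool. The class name, function name, and sample markup are hypothetical, and it looks only at inline style attributes, not at stylesheet rules.

```python
from html.parser import HTMLParser


class EpubRiskScanner(HTMLParser):
    """Collect (line, message) pairs for markup that may need epub tweaks."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        line = self.getpos()[0]  # 1-based line number of the current tag
        if tag == "table":
            self.findings.append((line, "table"))
        href = attributes.get("href") or ""
        if tag == "a" and href.startswith(("http://", "https://")):
            self.findings.append((line, "external link"))
        # Only inline styles are checked; floats set in a stylesheet are missed.
        style = (attributes.get("style") or "").replace(" ", "").lower()
        if "float:" in style:
            self.findings.append((line, "inline float"))


def scan_html(text):
    """Return the findings for one HTML document given as a string."""
    scanner = EpubRiskScanner()
    scanner.feed(text)
    scanner.close()
    return scanner.findings


# Hypothetical sample markup for the demo.
SAMPLE = """<p><a href="http://www.gutenberg.org">PG</a></p>
<img src="cover.jpg" style="float: left" alt="">
<table><tr><td>1</td></tr></table>
"""

if __name__ == "__main__":
    for line, message in scan_html(SAMPLE):
        print(f"line {line}: {message}")
```

A finding only marks a spot worth checking by hand; whether the markup actually causes trouble depends on how it is coded and on the reading device.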

Epubs at PG and DP

For some time, PG has been automatically converting DP's HTML (and sometimes text) files to epubs, using a PG-created application, ebookmaker (formerly Epubmaker, not to be confused with the commercial application of the same name). Because of epub's limitations, the automatic conversion has been less than satisfactory in some cases, particularly for projects with complex formatting.

In response, an epub team, headed by rfrank, was set up to look at how DP can introduce some quality control into the generation of epubs from DP projects. The team has been looking at three approaches (RST, traditional, and custom), which are described below.

reStructuredText (RST)

reStructuredText, or RST, is an easy-to-read, plain-text markup syntax available to post-processors to produce e-books. It uses a single source to create text, HTML, and epub files through PG's ebookmaker.

The epub team leader for this approach is unassigned. Links to detailed information about using RST in post-processing can be found here: Post-Processing with RST: reStructuredText. See also the discussion in the Post-Processing with reStructuredText thread.

Traditional PP for Epub

With this method, the PPer creates text and HTML files via the usual post-processing methods, but codes the HTML to accommodate the limitations of epub. Ebookmaker is then used at PG to convert the HTML file into an epub.

The epub team leader for this approach is srjfoo. See the discussion in the Traditional Post-Processing, goal: ePub convertibility thread.

Custom Epubs

The custom epub approach allows the PPer to create hand-crafted epubs without having to compromise on the HTML version. Under this approach, the PPer creates the text and HTML files in the usual way, using whatever tools they prefer. The PPer then edits a copy of the HTML file to create a custom epub, using the epub-editing software of their choice, such as Sigil.

Many of the HTML tweaks used in the traditional approach will be applicable to custom epubs.

The epub team leader for this approach is LCantoni. Detailed information about creating custom epubs can be found here: Post-Processor's Guide to Custom Epubs. See also the discussion in the Custom Epubs thread.

Please note that at this time (January 2012) custom epubs are not yet officially accepted for upload to PG.

Other Information

These DP Wiki pages also provide helpful information about epubs: