PPTools/EPUBCheck

From DPWiki

Introduction

EPUBCheck is a conformance checker for EPUB publications. It is a command-line tool, and it is also incorporated into Guiguts releases. If you download it separately for use outside Guiguts, be aware that you may need to adjust the stack size of your Java Virtual Machine (JVM) as described here.
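When EPUBCheck is run on its own, it is started as a Java program, typically with `java -jar epubcheck.jar book.epub`. The standard JVM option `-Xss` sets the thread stack size, so one way to give EPUBCheck a larger stack is to add that option before `-jar`. The Python sketch below builds and runs such a command. The jar location, the `8m` stack size and the helper names `epubcheck_command` and `run_epubcheck` are assumptions for illustration only; use the stack size recommended in the instructions mentioned above.

```python
import subprocess


def epubcheck_command(epub_path, jar_path="epubcheck.jar", stack_size="8m"):
    """Build a command line that runs EPUBCheck with a larger JVM stack.

    jar_path and stack_size are illustrative defaults (assumptions), not
    values taken from the EPUBCheck or Guiguts documentation.
    """
    # -Xss sets the JVM thread stack size; JVM options must come before -jar.
    return ["java", f"-Xss{stack_size}", "-jar", jar_path, epub_path]


def run_epubcheck(epub_path, **kwargs):
    """Run EPUBCheck and return everything it printed."""
    result = subprocess.run(
        epubcheck_command(epub_path, **kwargs),
        capture_output=True,
        text=True,
    )
    # Messages may appear on either output stream, so keep both.
    return result.stdout + result.stderr
```

For example, `print(run_epubcheck("mybook.epub", stack_size="16m"))` would run the check with a 16 MB stack, assuming `java` is on your PATH and `epubcheck.jar` is in the current folder.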

Fixing errors found by EPUBCheck

An EPUB file is actually a zip file containing several folders and files, including CSS and HTML files. The HTML files are created by splitting your original HTML file into chunks (for example, at each chapter heading) and giving each chunk a header and other minor adjustments. Because EPUBCheck only sees these chunk files, it cannot tell you directly where to make the edit in your original HTML file. However, by using the procedure below, you can easily track down the source of each issue.
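Because an EPUB is a zip archive, any zip tool can show you these chunk files. As a minimal sketch, assuming Python is available, the standard `zipfile` module can list the HTML chunks straight from the `.epub` file without renaming it. The function name `list_html_chunks` is hypothetical.

```python
import zipfile


def list_html_chunks(epub_file):
    """Return the names of the HTML/XHTML files stored inside an EPUB.

    epub_file may be a path or an open binary file object.
    """
    # An EPUB is an ordinary zip archive, so zipfile can read it directly.
    with zipfile.ZipFile(epub_file) as epub:
        return [
            name for name in epub.namelist()
            if name.endswith((".html", ".xhtml"))
        ]
```

For example, `for name in list_html_chunks("mybook.epub"): print(name)` prints one chunk name per line.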

First, a quick overview of the error messages. Here is an example error message as displayed by Guiguts. (If Guiguts does not recognise the type of error, or if you run EPUBCheck outside Guiguts, the message will show the whole filename rather than the abbreviated form using "...".)

ERROR(RSC-005): ...-7.html.html(33,17): Error while parsing file: element "div" not allowed here, ... (more detail on error)

The first word indicates the type of issue: ERROR and WARNING messages should be corrected, and for any INFO messages you may want to check that you understand the reason, since these might indicate a subtle mistake. After the type of issue comes the error code (listed on this W3C page). If the error occurs within an HTML file in the EPUB, Guiguts replaces the long, complicated filename with "...-N.html.html" (or .xhtml for EPUB3 files) to make the error easier to read; how to use that filename is explained below. After the filename comes "(line,column)", giving the line and column number of the error in the file inside the EPUB (not in your original HTML file). Finally, there is text describing the nature of the error.
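The message layout described above is regular enough to split apart automatically, which gives you the "N", line and column needed in the steps below. The following Python sketch is a minimal parser written against the example message shown earlier. It assumes each message has the form `TYPE(CODE): FILENAME(LINE,COLUMN): TEXT`; the names `parse_message` and `EpubCheckMessage` are hypothetical. Messages that do not point at a line in a file simply return `None`.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of one message:  TYPE(CODE): FILENAME(LINE,COLUMN): TEXT
MESSAGE_RE = re.compile(
    r"^(?P<severity>[A-Z]+)\((?P<code>[A-Z]+-\d+[A-Za-z]?)\):\s*"
    r"(?P<file>.+?)\((?P<line>-?\d+),(?P<column>-?\d+)\):\s*"
    r"(?P<message>.*)$"
)

# The chunk number N is the digits after the last "-" in the filename,
# e.g. "...-7.html.html" gives 7.
CHUNK_RE = re.compile(r"-(\d+)\.[^-]*$")


class EpubCheckMessage(NamedTuple):
    severity: str
    code: str
    file: str
    chunk: Optional[int]  # None if the filename has no "-N." part
    line: int
    column: int
    message: str


def parse_message(text):
    """Split one EPUBCheck message into its parts, or return None."""
    m = MESSAGE_RE.match(text.strip())
    if m is None:
        return None
    chunk = CHUNK_RE.search(m["file"])
    return EpubCheckMessage(
        severity=m["severity"],
        code=m["code"],
        file=m["file"],
        chunk=int(chunk.group(1)) if chunk else None,
        line=int(m["line"]),
        column=int(m["column"]),
        message=m["message"],
    )
```

Applied to the example message above, the result has `chunk` 7, `line` 33 and `column` 17.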

How to find the location to fix in your HTML

  1. Note the number "N" from the "...-N.html.html" filename part of the message
  2. Note the line and column number from the "(line, column)" part of the message
  3. If you have the Calibre EPUB editor, then open the EPUB file with that editor
  4. If you do not have Calibre, rename the EPUB file to have a ".zip" extension rather than an ".epub" extension. On some Windows systems you may first need to make file extensions visible (usually through the View menu in a File Explorer window). Windows will ask you to confirm that you want to change the extension. Now unzip the zip file and open the folder that is created.
  5. Inside the epub/zip file is a folder named OEBPS, and inside that folder will be a few ".css" files and several ".html" or ".xhtml" files. They typically have long names, like "3556407428965394645_peggysmall-7.html.html". You are just interested in the last part of the name, particularly the number (in this case "7") that must match the number "N" you noted in step 1.
  6. Open the correct ".html"/".xhtml" file in any editor (Guiguts, Notepad, vim, etc.) and go to the line and column number you noted in step 2.
  7. Work out what the problem is (in the example error, a <div> element was placed inside an element that is not allowed to contain it, such as a <p>), and note the text around the error so that you will be able to find the same point in your original HTML file.
  8. Edit your original HTML file, find the same bit of text and HTML code, and correct the issue.
  9. Recreate the EPUB files (in Guiguts or online), then recheck the EPUB using EPUBCheck.
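Steps 3 to 6 can also be done without Calibre and without renaming the file. The Python sketch below assumes the chunk naming shown in step 5: it opens the EPUB as a zip file, picks the HTML chunk whose name contains "-N." followed only by its extensions, and returns the text around the reported line and column, with a "|" marking the column. The function names `find_chunk` and `show_error_context` are hypothetical, and the reported column may not point exactly at the start of the offending tag, so read the text on both sides of the marker.

```python
import re
import zipfile


def find_chunk(names, n):
    """Pick the HTML/XHTML name ending in "-N.<extensions>", e.g. "...-7.html.html"."""
    # "-7." must not match "-17.", so the hyphen is required right before N.
    pattern = re.compile(rf"-{n}\.[^-/]*$")
    for name in names:
        if name.endswith((".html", ".xhtml")) and pattern.search(name):
            return name
    raise LookupError(f"no HTML chunk numbered {n} found")


def show_error_context(epub_file, n, line, column, width=30):
    """Return (chunk name, text around the 1-based line and column)."""
    with zipfile.ZipFile(epub_file) as epub:
        name = find_chunk(epub.namelist(), n)
        text = epub.read(name).decode("utf-8", errors="replace")
    target = text.splitlines()[line - 1]
    col = column - 1  # convert the 1-based column to a 0-based index
    before = target[max(0, col - width):col]
    after = target[col:col + width]
    # "|" marks the reported column position.
    return name, f"{before}|{after}"
```

For the example error, `show_error_context("mybook.epub", 7, 33, 17)` would return the chunk name and the surrounding text to search for in your original HTML file.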