Benefits of DP-Prepared eBooks

From DPWiki

At Distributed Proofreaders, we take the page scans of a book and the Optical Character Recognition (OCR) text produced from them and create a professional-quality ebook complete with illustrations -- an ebook that can be easily read with whatever reading settings the reader wants and that contains text that can be searched, looked up, translated, or read aloud as the ereader permits. However, DP volunteers are sometimes asked what the benefit is of the work we do to prepare ebooks for Project Gutenberg, rather than just making the page scans or the related OCR text available for download.

After consulting our volunteers, we compiled this list of the benefits provided by a DP-prepared ebook:

  • If a reader tries to read an ebook consisting of page scans (usually in PDF format), there are several problems:
    • Readers are stuck with the size of the print on screen (zooming in enlarges the entire page beyond the borders of the reader).
    • Readers don't have access to the search, read-aloud, text lookup, or translation features that are available in many ereaders.
    • Readers can't control the font, line spacing, background, margins, etc.
    • Because the scans are large, the book has a large download and storage footprint.
    • Page scans can be difficult or impossible to read for those with visual impairments, dyslexia, or other disabilities.
  • Ebooks consisting of unedited, unformatted OCR text also have serious problems:
    • They generally contain numerous scanning errors.
    • The lack of formatting means that tables and other specially formatted text often become unreadable.
    • There are no illustrations.
  • DP ebooks are available from Project Gutenberg in HTML, several flavours of epub, Kindle, and plain text.
  • DP ebooks provide internal cross-links, enabling the reader to jump to a page referenced in the Contents or Index, or in the body of the text, or to jump to a footnote or endnote from its anchor, and back again. There can also be cross-links between different volumes of the same work, when they're all on Project Gutenberg.
  • DP ebooks make it possible for readers to adjust the text in their ereaders for size, font and background.
  • DP ebooks are single-edition: each one reproduces one specific edition of a work. We do not abridge or bowdlerize the text or include texts that were not in that edition.
  • DP ebooks are of a consistently high quality. The books are checked carefully so that there are no missing, illegible, faded, or discoloured illustrations or pages, and the texts are checked in multiple rounds by different volunteers for scanning errors and other issues. The Post-Processors also check the books and ensure that the work is internally consistent. That is often not the case with large archives of scanned texts.
  • Most ereaders work seamlessly with epub and/or Kindle ebooks such as those prepared at DP. Among other things, the ereaders allow readers to bookmark pages, jot notes, copy or share selections of text, search, etc.
  • Various ereaders now let readers listen to the epub/Kindle text read aloud by synthesized voices that have become very clear and natural-sounding. This capability is very limited with simple OCR text and impossible with page scans.