How to avoid missing pages in your project

From DPWiki

Flipping through books, especially old books, while you're scanning can easily lead to missing a pair of pages. Pages stick together, or you space out and lose your pace; many things can cause this very common problem.

And a problem it is. If missing pages aren't detected until Post-processing, you can be really stuck trying to get your hands on the book again.

Here are some tips to avoid having a project enter DP with missing pages:

Check while you scan

While scanning, take advantage of a feature in the OCR software that lets you temporarily customize the page numbers. Keep those temporary numbers in sync with the physical page numbers while you're scanning so you can frequently verify that no pages have been skipped.
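The check itself is simple arithmetic: consecutive scans should advance the printed page number by exactly one. As a rough illustration only (this is not a feature of any OCR package, and the function name `find_skipped_pages` is made up for this sketch), here is how the same check could be written in Python if you jotted down the printed page numbers in scan order:

```python
def find_skipped_pages(printed_numbers):
    """Return the printed page numbers missing from a run of scanned pages.

    printed_numbers: printed page numbers as integers, in scan order.
    Hypothetical helper for illustration; assumes the run is numbered
    continuously (no unnumbered illustration pages in between).
    """
    skipped = []
    for previous, current in zip(printed_numbers, printed_numbers[1:]):
        # Consecutive scans should differ by exactly one printed page.
        if current - previous > 1:
            skipped.extend(range(previous + 1, current))
    return skipped


# Two pages stuck together between 4 and 7.
print(find_skipped_pages([1, 2, 3, 4, 7, 8]))  # [5, 6]
```

A skipped pair shows up as two consecutive missing numbers, which is exactly the "pages stuck together" case.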

Specific example for ABBYY FineReader

  • Scan the initial, often unnumbered, pages. By default they will become pages 1, 2, etc. Be careful and double- or triple-check that you haven't missed any.
  • When you hit the first numbered page, even if it's a roman numeral, scan it and then go to the thumbnail section and renumber it. If, because of the title page and other front matter, ABBYY gives this page the default number 6 but the printed roman numeral is "i", double-click on the little page number box and make it "1001". Now, as you progress, you can check that "iv" is page "1004" and be confident that you've gotten everything right.
  • If you had a section of roman numerals and then the book starts with a page 1, you're still OK. Since you're confident that all the pages so far are good:
    • Select all of those thumbnails and then use the menu item "Batch>Renumber Pages...".
    • Start with page number 1 and only renumber selected pages. (This is a safer habit to develop, though not critical for this specific instance.)
    • Now scan page 1, double-click its page number box and make it "1001". Once again you can verify that the book's page numbers and your page numbers stay aligned.
  • Another fairly common wrinkle involves illustrations. Books sometimes do not "count" these pages in their page numbering. If you run into this, you can handle it the same way as starting again after the roman numeral page numbers.
    • Select all pages' thumbnails.
    • Batch>Renumber to start at number 1.
    • Scan your illustration (and, likely, an associated blank page).
    • Scan the next numbered page, double-click its page number box and make that page number "match" the physical page (like "1158" for page 158).
  • When you've finished the book, remember to renumber all of the pages to eliminate the gap(s)! This is particularly important if you have illustrations because you want the illustration file to have the same file-number as the associated page's file-number (i050-1.jpg goes with 050.png). Subsequent pre-processing tools may, in renaming png & txt files to be sequential, leave out your illustration jpg files and then you end up with a messy situation to resolve manually.
    • Select all pages' thumbnails.
    • Batch>Renumber to start at page number 1.
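
The two checks in this procedure can be sketched in Python. This is only an illustration under stated assumptions, not part of ABBYY FineReader or of any DP tool: the function names are hypothetical, the offset of 1000 is the one used in the examples above, and the file name pattern is generalized from the single example "i050-1.jpg goes with 050.png".

```python
import re

# Values of the roman numeral letters, for front-matter page numbers.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a roman numeral such as 'iv' to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (iv = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def expected_temporary_number(printed, offset=1000):
    """Temporary number a page should carry, given its printed number.

    The offset of 1000 follows the examples in the text (an assumption
    that it stays larger than the number of pages already scanned).
    """
    value = int(printed) if printed.isdigit() else roman_to_int(printed)
    return offset + value


def unmatched_illustrations(filenames):
    """List illustration files (iNNN-M.jpg) with no matching NNN.png page."""
    pages = {name[:-4] for name in filenames if name.endswith(".png")}
    unmatched = []
    for name in filenames:
        match = re.fullmatch(r"i(\d+)-\d+\.jpg", name)
        if match and match.group(1) not in pages:
            unmatched.append(name)
    return unmatched


print(expected_temporary_number("iv"))   # 1004
print(expected_temporary_number("158"))  # 1158
print(unmatched_illustrations(["049.png", "050.png", "i050-1.jpg", "i052-1.jpg"]))
# ['i052-1.jpg']
```

An empty list from `unmatched_illustrations` after the final renumbering suggests that every illustration file still lines up with its page file.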