Image cleanup

From DPWiki
Note

This page needs maintenance; some tools listed here may or may not still be available.

Also see the Guide to Image Processing. These instructions are for content providers (CPs), whereas the Guide to Image Processing is for post-processors (PPs). Much of the material overlaps, but what makes sense for the CP to do with the original in hand differs somewhat from what makes sense for the PP to do when preparing images for HTML versions.

Tools for Image Manipulation

  • ImageMagick (OS X, Windows, Linux) -- command-line tool for resizing, rotating, applying thresholds, converting formats, etc.
  • Unpaper (Linux) -- command-line tool. From the project's website: "unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies. The main purpose is to make scanned book pages better readable on screen after conversion to PDF. Additionally, unpaper might be useful to enhance the quality of scanned pages before performing optical character recognition (OCR)." It was released for Linux, but it also runs on other systems, for example Mac OS X and Cygwin.
  • Pngcrush (Mac OS X, Linux, Windows) -- command-line tool. Optimises PNG image compression to produce the smallest possible files for upload to DP. GUIprep can be configured to call Pngcrush as part of its pre-processing.
  • jpegoptim (Mac OS X, Linux, Windows) -- command-line tool. Losslessly optimises JPEG images to reduce file size without changing image contents.
  • Scan Tailor -- a very useful cross-platform tool with a graphical interface, for deskewing, splitting, resizing, and adding/removing margins.
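
As a rough illustration of how the two optimisers listed above might be run over a folder of page images, here is a minimal sketch. The function name optimise_images and the folder argument are hypothetical; the sketch assumes pngcrush and jpegoptim are on the PATH and simply skips files when they are not.

```shell
#!/bin/sh
# Sketch only: losslessly shrink page images before upload.
# optimise_images DIR   (DIR is a hypothetical folder of scans)
optimise_images() {
    dir=$1
    for f in "$dir"/*.png; do
        [ -e "$f" ] || continue                 # no PNG files matched
        if command -v pngcrush >/dev/null 2>&1; then
            # pngcrush writes a new file; replace the original only on success
            pngcrush -q "$f" "$f.tmp" && mv "$f.tmp" "$f"
        else
            echo "skip (pngcrush not installed): $f"
        fi
    done
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue                 # no JPEG files matched
        if command -v jpegoptim >/dev/null 2>&1; then
            jpegoptim --quiet "$f"              # rewrites the file in place
        else
            echo "skip (jpegoptim not installed): $f"
        fi
    done
}

# Example call (hypothetical path):
#   optimise_images ./pngs
```

Both tools change the files on disk, so it is probably wise to try this on a copy of the project folder first.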

Viewing

  • Geeqie (Linux) -- previously known as GQview; useful for flipping through a series of images to check that they are all in the right order, the right way up, etc.

Specific Cleanup Tasks

Descreening

Removal of the half-tone dots seen in, for example, reproductions of photographs.

  • If your scanner has a de-screening capability, use it! Settings for it can usually be found under options for color scanning, but every scanner interface is slightly different.
  • Using Photoshop
  • Using the GIMP

In the GIMP, zoom in until you can see the actual screen dots. Use the measuring tool to determine the separation between them. Then use the first of the two Gaussian Blur functions (IIR), setting the blur distances to a little over half of the distance between dots.
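
The "little over half" rule can be turned into a quick calculation. The sketch below is only an illustration: the helper name blur_radius and the default factor of 0.55 are assumptions chosen to stand in for "a little over half", not values taken from any tool's documentation.

```shell
#!/bin/sh
# blur_radius SPACING [FACTOR]
# SPACING: measured distance between halftone dots, in pixels.
# FACTOR:  assumed multiplier for "a little over half" (default 0.55).
blur_radius() {
    LC_ALL=C awk -v d="$1" -v k="${2:-0.55}" 'BEGIN { printf "%.1f\n", d * k }'
}

blur_radius 6    # dots 6 pixels apart: prints 3.3
```

The printed value is a starting point for the blur dialog; the best setting still has to be judged by eye on the actual image.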

Deskewing

Use the rotate tool in GIMP or Photoshop. Unpaper can also automatically determine and apply skew correction to a batch of images (not perfect, but a good approximation).

PhotoFiltre is a free Windows application that has good deskewing and rotating functions.

If rotating in GIMP, be sure to set the interpolation to Cubic (best).
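
For batch work without a GUI, one possible alternative to the tools named above is ImageMagick's -deskew option, which is swapped in here purely as an illustration. In this sketch the function name deskew_batch, the DRY_RUN switch, and the folder arguments are hypothetical; the 40% threshold is the value the ImageMagick documentation suggests as a general starting point. On ImageMagick 7 the command may be magick rather than convert.

```shell
#!/bin/sh
# deskew_batch SRC_DIR DST_DIR
# Writes deskewed copies of SRC_DIR/*.png into DST_DIR, leaving originals untouched.
# Set DRY_RUN=1 (hypothetical switch) to print the commands instead of running them.
deskew_batch() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    for f in "$src"/*.png; do
        [ -e "$f" ] || continue                     # no PNG files matched
        out="$dst/$(basename "$f")"
        if [ -n "${DRY_RUN:-}" ]; then
            printf '%s\n' "convert $f -deskew 40% $out"
        else
            convert "$f" -deskew 40% "$out"         # requires ImageMagick
        fi
    done
}

# Example call (hypothetical paths):
#   DRY_RUN=1 deskew_batch ./scans ./deskewed
```

As with unpaper, automatic deskewing is an approximation, so the results are worth checking page by page in an image viewer.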

Rescaling before OCR

When using ABBYY FineReader to produce OCR text from harvested images that were scanned at a resolution lower than 300 dpi, considerably better results can be obtained by rescaling the images to 300 dpi. For example, using the ImageMagick suite:

mogrify -resample 300x300 *.png

resamples all PNG files in the current directory to 300 dpi. Note that mogrify overwrites the original files, so work on copies.
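
To get a feel for what resampling does to image size, and to keep the originals intact, something like the following sketch may help. The helper names resampled_pixels and resample_copies and the output folder name resampled are hypothetical; the second function relies on mogrify's -path option to write copies into another directory and assumes the PNG files carry resolution metadata.

```shell
#!/bin/sh
# resampled_pixels PIXELS SRC_DPI [TARGET_DPI]
# Pixel dimension after resampling from SRC_DPI to TARGET_DPI (default 300).
resampled_pixels() {
    LC_ALL=C awk -v px="$1" -v src="$2" -v dst="${3:-300}" \
        'BEGIN { printf "%d\n", px * dst / src + 0.5 }'
}

resampled_pixels 1200 200    # a 1200-pixel-wide scan at 200 dpi: prints 1800

# resample_copies: write 300 dpi copies of ./*.png into ./resampled,
# leaving the originals untouched (requires ImageMagick; not run here).
resample_copies() {
    mkdir -p resampled && mogrify -path resampled -resample 300x300 ./*.png
}
```

Upscaling adds no real detail; it only gives the OCR engine glyphs at the size it expects.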