Content Providing on the Mac

From DPWiki
Note: Content on this page is outdated and needs maintenance.

Scanners

Many scanners, though not all, can be used with Macs. (Sadly, the Plustek OpticBook is one that can't.) It's probably a safe bet that the few sold by the Apple Store will work, and the reviews there are Mac-specific. But there are cheaper places to buy one, with a wider choice.

Software can increase your range of options. Any scanner supported by SANE should work with OS X, provided you are willing to install the right SANE driver. VueScan, which is shareware, also supports some scanners that have no Mac drivers of their own, and has a loyal following.

It is also increasingly possible to use digital cameras for image acquisition. These are generally platform independent, though the image quality is not as good as what a scanner can produce. See Tips & Tricks for Shooting Text with Digital Camera if you want to try this method. (If you do, you may also want to have the OCR done by a Windows user who has a copy of ABBYY FineReader 8, which is designed to get better OCR out of digital camera images.)

Scanners currently used by CPers with Macs, together with some Mac-specific comments, include:

  • EPSON Perfection 3490 Photo
    I'm fairly happy with this. Installing and setting up (on OS X 10.4) was easy. It scans as .jpg, PDF, TIFF or Apple PICT, but not .png, unfortunately. An irritating feature is that it doesn't remember the directory to save to across sessions: it always defaults to saving in ~/Pictures/. Also I find that if I leave the scanner software open but idle, it guzzles up CPU and makes my computer (iMac G5) get very noisy. It comes with a Mac version of ABBYY FineReader Sprint, but I suspect that was designed pre-Tiger: I can get it to acquire pages directly from the scanner, but not to open previously saved files. (Doesn't matter, cos Sprint was never going to be adequate anyway.) Laurawisewell

Harvesting

Gharvest, for harvesting from Google Print, can run on Mac OS X.

There are free programs that you can use to download multiple files from other sources, such as:

  • LinkSequenceDownloader: If the files you want (and/or other bits of their URLs) are named using consecutive numbers, you just specify this as a range, and the program downloads them all to a folder on the Desktop. It's a very simple, lightweight program. I found it worked lightning fast ... so fast that the website locked me out for a while. :-(
  • Platypus Downloader: For this you create a text file listing all the URLs you want, or paste them into the program manually, and it downloads them all to a directory you specify. You can interrupt and resume downloads. I found it fairly slow (perhaps because it downloads them sequentially?) but less effort than doing it by hand in my browser. And it was slow enough that the website didn't decide I was a robot :-D
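
For anyone who would rather script this than install a downloader, here is a minimal Python sketch of the same idea: expand a numbered URL pattern into a list, then fetch the files one at a time with a pause between requests, so the site is less likely to lock you out. It is not how either program above works internally. The names `build_urls` and `download_all`, the example.org URL pattern, the zero-padding width and the two-second delay are all assumptions made for illustration; check the real site's URL scheme and terms of use before running anything like this.

```python
import time
import urllib.request
from pathlib import Path


def build_urls(template, first, last, width=3):
    """Expand a URL template over an inclusive range of page numbers.

    The template must contain the placeholder {n}, which is replaced by
    the page number zero-padded to `width` digits.
    """
    return [template.format(n=str(n).zfill(width)) for n in range(first, last + 1)]


def download_all(urls, dest_dir, delay_seconds=2.0):
    """Fetch each URL in turn, pausing between requests to stay polite."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        # Name the local file after the last path component of the URL.
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # already fetched, so an interrupted run can be resumed
        urllib.request.urlretrieve(url, str(target))
        time.sleep(delay_seconds)


# Example use (hypothetical URL pattern, replace with the real one):
#   urls = build_urls("https://example.org/scans/page{n}.png", 1, 250)
#   download_all(urls, Path.home() / "Desktop" / "harvest")
```

Skipping files that already exist gives a crude form of the interrupt-and-resume behaviour described above, and raising `delay_seconds` trades speed for a lower chance of being mistaken for a robot.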

OCR

There are several decent OCR packages for Macs. Be aware that the cheap and free options are particularly error-prone.

This may be of no help whatsoever, but my feeling has been that every error not made by OCR is one less error that has to be found by a proofer. I haven't been satisfied with any of the OCR packages I've tried, and have chosen to use the OCR pool for all my projects. (Except for the one I typed in!) I've had very good results this way. DANewman


ABBYY FineReader Express for Mac (as of July 2011) gives accurate results but is not acceptable for DP projects because it outputs the text from all your input images into one long file.

OCRkit (for Mac) is error prone. wannado

Image Processing

GraphicConverter and GIMP are both powerful image manipulation programs. GraphicConverter can also batch process images. GIMP might too, but only if you're good at scripting. The Guide to Image Processing includes instructions on using GIMP.

Automator

Automator can do some useful things for you, although I find it is rather slow and rather limited. Its advantage is that you already have it, and it's easy to use. Here are some features that may be useful when CPing:

  • With images (under the Preview category in the workflow building pane)
    • Change format (to jpg, png, or various formats you're less likely to want)
    • Scale (although my guess is that the quality of the rescaling may not be good)
    • Crop (cropping always centred though)
    • Rotate (by multiple of 90 degrees)
    • Flip
  • With text (under TextEdit)
    • Combine text files
  • With PDF (under PDF)
    • Render PDF pages as images (gets you a separate file, png say, from each page of a multipage PDF. You have to choose a resolution. This may not be the best route for doing this.)
  • Finder
    • Rename files (in conjunction with Get selected, Copy, Sort). This can rename your page files sequentially, add or remove text from the filename, etc. But it's not as speedy as GraphicConverter, nor can it do fancier things like incrementing filenames by two.
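
If you need the fancier renaming that Automator lacks, such as numbering pages in steps of two, a short script is one alternative. The Python sketch below is only an illustration, not something Automator or GraphicConverter provides: the function names `plan_renames` and `apply_renames`, the `*.png` pattern and the three-digit numbering are assumptions. It orders files by their existing names, so check the planned pairs before applying them, and work on a copy of the folder, since a file that already has one of the target names but does not match the pattern could be overwritten.

```python
from pathlib import Path


def plan_renames(folder, first=1, step=1, width=3, pattern="*.png"):
    """Return (old, new) path pairs numbering files first, first+step, ...

    Files are taken in sorted name order and keep their extension.
    Nothing on disk is changed by this function.
    """
    files = sorted(Path(folder).glob(pattern))
    pairs = []
    for index, old in enumerate(files):
        number = first + index * step
        new = old.with_name(f"{number:0{width}d}{old.suffix}")
        pairs.append((old, new))
    return pairs


def apply_renames(pairs):
    """Rename in two passes via temporary names.

    Going through temporary names first means a file being renamed
    never lands on the name of another file still waiting its turn.
    """
    staged = []
    for old, new in pairs:
        temp = old.with_name(old.name + ".renaming")
        old.rename(temp)
        staged.append((temp, new))
    for temp, new in staged:
        temp.rename(new)


# Example use (hypothetical folder name): number the odd pages 001, 003, ...
#   pairs = plan_renames("odd_pages", first=1, step=2)
#   for old, new in pairs:
#       print(old.name, "->", new.name)
#   apply_renames(pairs)
```

Splitting the work into a planning step and an applying step lets you print and inspect the new names before any file is touched.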