Distributed Proofreaders 33,568 titles preserved for the world!
 

Scanning FAQ

See also Content Provider's FAQ.


Do I have to use Abbyy Finereader?
What kind of scanner should I get?
Should I get a scanner with an automatic document feeder?
What kind of scanner does charlz have?
How much does one of them cost anyway?
My scans are coming out very bad, any suggestions?
Can I use a digital camera to "scan" the images?
How long does it take to scan a book?
I have a scanner but no books that are qualified /
I have several books I'd like to get on the site but don't have access to a scanner /
I have an image of a book but no OCR software

I don't have a computer; How can I help?
I'm using Linux, is there an OCR package I can use?
Speaking of Linux, what scanners are supported?
Are there any free OCR packages available?


Do I have to use Abbyy Finereader?

No, of course not. The scanning guidelines are heavily skewed toward it simply because more of the people involved in the site use that package, so more people are familiar enough with it to answer questions. You don't need to buy the newest version; version 5.0 Pro is great for nearly everything most people will need to do. (The big three, charlz, aldorondo and JulietS, all use version 5.0 Pro.) It is still available from many software distributors for much less than the latest version, and you can often find it for sale used on eBay. Avoid the Home and Sprint versions if possible; they are missing a lot of the functionality that makes the job easy.
Two other packages that people have been successful with are OmniPage Pro 10 & 11 and Textbridge Millennium Pro. They both have good recognition rates and similar functionality for automating the scanning process. The details differ, but reading the help files should get you on the right track. OEM software that comes free with scanners CAN be used... just be aware that accuracy is typically much worse, AND be prepared to do a lot more saving and formatting manually.


What kind of scanner should I get?

Well, there are a lot of choices. Generally you should stick with a flatbed scanner. The typical scanner you find in a computer store is a bit bigger than letter size (or A4 if you live in Europe) and generally comes with one of three interfaces: SCSI, USB or parallel. SCSI is the fastest but may require additional hardware to interface to your computer. Most computers come with a USB port these days, and USB scanners are usually the easiest to set up. Parallel is the slowest interface but may be the only realistic choice for older computers. There are some scanners floating around with FireWire or USB2 interfaces, but these are usually more expensive and intended for specific purposes. You will probably want to avoid "handheld" scanners, where you run the scanning lens down the page of text. They require a smooth, steady motion, which can be difficult to manage once or twice, let alone the 300 or 400 times needed for a full length book. Some are also not wide enough to do a page in a single scan and need the images "stitched" back together, a process that can be painstaking and time consuming.


Should I get a scanner with an automatic document feeder?

This is mostly a matter of personal preference. ADFs CAN make scanning go much quicker, but be aware that in order to run a book through an ADF, the book NEEDS to be destroyed, so it is probably not realistic for rare or valuable books. The ADF is often available as an option on a standard scanner and can be installed and removed as needed, so just having one doesn't mean you have to use it. If you can justify the added cost, it can make things easier and faster, but it is not strictly necessary.


What kind of scanner does charlz have?

Fujitsu FI-4340C color and duplex Flatbed and ADF

super scanner

The process we (Charles Franks) use is we tear off the cover of the book (gasp), chop the edges on four sides of the book (double gasp!!), send it through the ADF, and then let the book run through the site.


How much does one of them cost anyway?

About $3500 US.

Wow...

Yep.


My scans are coming out very bad, any suggestions?

It depends on what is wrong with them. The default scanning software settings are usually pretty good. Make sure that you are using a "text" or OCR setting if one is available. Scan in black and white, not grayscale. 300 or 400 dpi is usually fine unless your text is extremely small. Higher resolution scans make much larger image files, and they can get pretty unwieldy in a hurry. Try adjusting the brightness up or down to clear up muddy or washed-out images. Experiment a little. It is a good idea to make several test scans and test the OCR on them before doing the whole book. If you are using Abbyy to scan the text, you can choose to let Abbyy control the brightness level rather than the TWAIN driver. This adjusts the brightness adaptively to ensure usable scans, but it tends to slow down scanning a lot, especially on slower computers. Make sure to press the spine of the book down to flatten the pages against the scanner bed. Too much "tenting" will cause guttering, where the text looks like it is running off on a curve.
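
To get a feel for how quickly image files grow with resolution, here is a rough sketch in Python. It assumes a letter-size page (8.5 x 11 inches) and counts only raw, uncompressed pixel data; real files will usually be smaller once compressed, and the function name is just for illustration.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw size of one scanned page in bytes (no compression)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


LETTER = (8.5, 11.0)  # assumed page size in inches

for dpi in (300, 400, 600):
    bw = uncompressed_scan_bytes(*LETTER, dpi, 1)    # black and white: 1 bit per pixel
    gray = uncompressed_scan_bytes(*LETTER, dpi, 8)  # grayscale: 8 bits per pixel
    print(f"{dpi} dpi: black and white ~{bw / 1e6:.1f} MB, grayscale ~{gray / 1e6:.1f} MB")
```

Doubling the resolution quadruples the pixel count, and grayscale needs eight times the raw data of black and white at the same resolution, which is why a few hundred pages can get unwieldy in a hurry.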


Can I use a digital camera to "scan" the images?

This question comes up now and again, especially as digital cameras have gotten cheaper and better. The answer is... Maybe. You need a camera that can focus closely. Light the page well and uniformly (don't count on flash photography), ideally have the camera mounted on a stand or tripod to minimize movement, and make sure the page is as flat as possible. Set your camera to take "high quality", high resolution images in black and white. Rotate, crop and convert your images as necessary. Fire up your OCR program and give it a shot. Yes, you can probably get usable "scans", but be prepared for relatively low accuracy on OCR unless you are very good or very lucky.
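
One way to judge a camera before trying it is to work out the resolution it would give you on the page and compare that with the 300 or 400 dpi suggested for scanners. The sketch below is only an estimate: the 2048 x 1536 pixel camera is a made-up example, the page is assumed to be letter size, and it assumes the page fills the frame exactly, which rarely happens in practice.

```python
def effective_dpi(pixels_long, pixels_short, page_long_in, page_short_in):
    """Pixels per inch on the page when the page fills the camera frame.

    The lower of the two directions is returned, since that is the
    limiting resolution for the text.
    """
    return min(pixels_long / page_long_in, pixels_short / page_short_in)


# Hypothetical camera: 2048 x 1536 pixels, photographing an 11 x 8.5 inch page.
dpi = effective_dpi(2048, 1536, 11.0, 8.5)
print(f"Effective resolution: about {dpi:.0f} dpi")
```

If the result comes out well below 300, that may be part of the reason the OCR accuracy is disappointing.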


How long does it take to scan a book?

It depends on the speed and options of your scanner and the condition and size of the book. A high speed scanner with an ADF can scan a 400 page book with pages in good condition in less than ten minutes. For a standard flatbed scanner doing manual page turning, once you get into a rhythm, you can probably do a scan every 20 to 40 seconds or about 3-6 pages per minute (two pages per scan), 180-360 per hour. Allowing for glitches, short breaks, etc. a 400 page book will probably take in the region of two hours to scan.
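
The flatbed figures above can be worked through as a quick calculation. This is just a sketch of the arithmetic, using the 20 to 40 seconds per scan and two pages per scan given in the answer; the function names are made up for the example, and the result does not include glitches or breaks.

```python
def pages_per_hour(seconds_per_scan, pages_per_scan=2):
    """Pages scanned in an hour of uninterrupted work."""
    return 3600 / seconds_per_scan * pages_per_scan


def flatbed_hours(pages, seconds_per_scan, pages_per_scan=2):
    """Hours of uninterrupted scanning needed for a book."""
    scans = -(-pages // pages_per_scan)  # ceiling division: an odd page still needs a scan
    return scans * seconds_per_scan / 3600


for seconds in (20, 40):
    rate = pages_per_hour(seconds)
    hours = flatbed_hours(400, seconds)
    print(f"{seconds} s per scan: {rate:.0f} pages per hour, 400 pages in {hours:.1f} hours")
```

A 400 page book works out to somewhere between a little over one hour and a little over two hours of pure scanning, so the estimate of around two hours leaves room for interruptions.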


I have a scanner but no books that are qualified /
I have several books I'd like to get on the site but don't have access to a scanner /
I have an image of a book but no OCR software

Leave a message in the "Content Provider" forum to that effect. You can also use the OCR Pool. See the Content Provider Forum for more details. Ask and someone will help you.


I don't have a computer; How can I help?

How are you accessing this FAQ anyway? Wow, you don't let many things get in your way, do you? If nothing else, we can always use money to buy new (old) books, new software, a new super scanner (sooner than anticipated at the present rate :-) or other incidentals. Find or donate books that someone else can scan. Go to your local library; many have public access computers with Internet access. You could log on and proofread a few pages occasionally.


I'm using Linux, is there an OCR package I can use?

There are SOME packages available. Perhaps the most highly developed at the time of this writing (late 2002) is Clara OCR, a free, GPL-licensed OCR package. Its accuracy is poor, however, and it is not really recommended at this time. Hopefully it will improve as it develops. There are several commercial products that will run on Unix/Linux, but they tend to be VERY expensive (in the several-thousand-dollar range). Your best bet is probably to make use of the OCR Pool. See the Content Provider Forum for details.


Speaking of Linux, what scanners are supported?

What you probably need to know is: What scanners are compatible with the SANE (Scanner Access Now Easy) driver? Go to this page to check compatibility.
The SANE homepage is here.


Are there any free OCR packages available?

 Here are a few:
http://www.simpleocr.com/ (Windows)
http://www.claraocr.org/ (Linux)
http://jocr.sourceforge.net/ (Linux)
http://www.expervision.com/webtr6.htm (Windows)
http://ftp.cityu.edu.hk/pub/chinese/ifcss/unix/ocr/omniocr2.2.README (Unix - Chinese)
http://http.cs.berkeley.edu/~fateman/kathey/ocrchie.html (Linux)

Just be aware, in the OCR world, you typically get what you pay for.



 
Copyright Distributed Proofreaders