Scanning FAQ
See also Content Provider's FAQ.
Do I have to use Abbyy Finereader?
What kind of scanner should I get?
Should I get a scanner with an automatic document feeder?
What kind of scanner does charlz have?
How much does one of them cost anyway?
My scans are coming out very bad, any suggestions?
Can I use a digital camera to "scan" the images?
How long does it take to scan a book?
I have a scanner but no books that are qualified / I have several books I'd like to get on the site but don't have access to a scanner / I have an image of a book but no OCR software
I don't have a computer; how can I help?
I'm using Linux, is there an OCR package I can use?
Speaking of Linux, what scanners are supported?
Are there any free OCR packages available?
Do I have to use Abbyy Finereader?
No, of course not. The scanning guidelines are heavily skewed toward
that package simply because more of the people involved in the site
use it, so more people are familiar enough with it to answer
questions. You don't need to buy the newest version; version 5.0 Pro
is great for nearly everything most people will need to do. (The big
three, charlz, aldorondo and JulietS, all use version 5.0 Pro.) It is
still available from many software distributors for much less than the
latest version, and you can often find it for sale used on eBay. Avoid
the Home and Sprint versions if possible. They are missing a lot of
the functionality that makes the job easy.
What kind of scanner should I get?
Well, there are a lot of choices. Generally you should stick with a flatbed scanner. The typical scanner you find in a computer store is a bit bigger than letter size (or A4 if you live in Europe) and generally comes with one of three interfaces: SCSI, USB or parallel. SCSI is the fastest but may require additional hardware to interface to your computer. Most computers come with a USB port these days, and USB scanners are usually the easiest to set up. Parallel is the slowest interface but may be the only realistic choice for older computers. There are some scanners floating around with FireWire or USB 2.0 interfaces, but these are usually more expensive and intended for specific purposes. You will probably want to avoid "handheld" scanners, where you run the scanning lens down the page of text. They require a smooth, steady motion, which can be difficult to manage once or twice, let alone the 300 or 400 times needed for a full-length book. Some are also not wide enough to do a page in a single scan and need the images "stitched" back together, a process that can be painstaking and time-consuming.
Should I get a scanner with an automatic document feeder?
This is mostly a matter of personal preference. ADFs CAN make scanning go much quicker, but be aware that in order to run a book through an ADF, the book NEEDS to be destroyed, so it is probably not realistic for rare or valuable books. The ADF is often available as an option on a standard scanner and can be installed and removed as needed, so just having one doesn't mean you have to use it. If you can justify the added cost, it can make things easier and faster, but it is not strictly necessary.
What kind of scanner does charlz have?
Fujitsu FI-4340C color and duplex Flatbed and ADF
The process we (Charles Franks) use is we tear off the cover of the
book (gasp), chop the edges on four
How much does one of them cost anyway?
About $3500 US. Wow... Yep.
My scans are coming out very bad, any suggestions?
It depends on what is wrong with them. The default scanning software settings are usually pretty good. Make sure that you are using a "text" or OCR setting if one is available. Scan in black and white, not grayscale. 300 or 400 dpi is usually fine unless your text is extremely small. Higher resolution scans make much larger image files, and they can get pretty unwieldy in a hurry. Try adjusting the brightness up or down to clear up muddy or washed-out images. Experiment a little. It is a good idea to make several test scans and test the OCR on them before doing the book. If you are using Abbyy to scan the text, you can choose to let Abbyy control the brightness level rather than the TWAIN driver. This will do adaptive adjustment of the brightness level to ensure usable scans, but tends to slow down scanning a lot, especially on slower computers. Make sure to press the spine of the book down to flatten out the pages against the scanner bed. Too much "tenting" will cause guttering, where it looks like the text is running off on a curve.
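To get a feel for how quickly image files grow with resolution and color depth, here is a rough sketch in Python. It estimates the uncompressed size of one page only; the letter-size page, the 8-bit grayscale depth and the function name are assumptions made for illustration, and real files are usually smaller once compressed.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size, in bytes, of one scanned page.

    Ignores file headers, per-row padding and compression, so treat
    the result as a rough upper bound rather than an exact file size.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size: US letter, 8.5 x 11 inches.
for dpi in (300, 400, 600):
    bw = uncompressed_scan_bytes(8.5, 11, dpi, 1)    # black and white
    gray = uncompressed_scan_bytes(8.5, 11, dpi, 8)  # 8-bit grayscale
    print(f"{dpi} dpi: B&W {bw / 1e6:.1f} MB, grayscale {gray / 1e6:.1f} MB")
```

Doubling the resolution quadruples the pixel count, and 8-bit grayscale takes eight times the space of black and white at the same resolution.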
Can I use a digital camera to "scan" the images?
This question comes up now and again, especially as digital cameras have gotten cheaper and better. The answer is... maybe. You need a camera that can focus closely. Light the page well and uniformly (don't count on flash photography), ideally mount the camera on a stand or tripod to minimize movement, and make sure the page is as flat as possible. Set your camera to take "high quality", high resolution images in black and white. Rotate, crop and convert your images as necessary. Fire up your OCR program and give it a shot. Yes, you can probably get usable "scans", but be prepared for relatively low accuracy on OCR unless you are very good or very lucky.
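One way to judge whether a camera is likely to be good enough is to work out the resolution it delivers on the page and compare that with the dpi figures used for flatbed scans. The sketch below is only an illustration: the image and page dimensions are made-up examples, the function name is hypothetical, and it assumes the page fills the whole frame with no lens distortion.

```python
def effective_dpi(image_w_px, image_h_px, page_w_in, page_h_in):
    """Dots per inch a photo delivers when the page fills the frame.

    The limiting value is whichever axis has fewer pixels per inch.
    """
    return min(image_w_px / page_w_in, image_h_px / page_h_in)


# Hypothetical example: a 1536 x 2048 pixel photo (camera turned to
# portrait) of a 5.5 x 8.5 inch page.
dpi = effective_dpi(1536, 2048, 5.5, 8.5)
print(f"Effective resolution: about {dpi:.0f} dpi")
```

In practice the page rarely fills the frame exactly, so the resolution you actually get will be somewhat lower than this estimate.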
How long does it take to scan a book?
It depends on the speed and options of your scanner and the condition and size of the book. A high-speed scanner with an ADF can scan a 400-page book with pages in good condition in less than ten minutes. For a standard flatbed scanner with manual page turning, once you get into a rhythm, you can probably do a scan every 20 to 40 seconds, or about 3-6 pages per minute (two pages per scan), 180-360 pages per hour. Allowing for glitches, short breaks, etc., a 400-page book will probably take in the region of two hours to scan.
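The flatbed arithmetic above can be sketched in a few lines of Python if you want to plug in your own numbers. The function name and the optional overhead factor are assumptions for illustration; the 20 to 40 seconds per scan and two pages per scan come from the estimate above.

```python
def scan_hours(pages, seconds_per_scan, pages_per_scan=2, overhead=0.0):
    """Estimated hours to scan a book on a manually fed flatbed.

    overhead is an assumed fractional allowance for glitches and
    breaks (0.25 means 25% extra time); it is not a measured figure.
    """
    scans = -(-pages // pages_per_scan)  # ceiling division
    return scans * seconds_per_scan * (1 + overhead) / 3600


# A 400-page book at the fast and slow ends of the range.
for seconds in (20, 40):
    print(f"{seconds} s per scan: {scan_hours(400, seconds):.1f} hours")
```

With no allowance for interruptions, this gives roughly one to two and a quarter hours for a 400-page book, which is consistent with the two-hour figure once breaks are included.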
I have a scanner but no books that are qualified / I have several books I'd like to get on the site but don't have access to a scanner / I have an image of a book but no OCR software
Leave a message to that effect in the "Content Provider" forum. You can also use the OCR Pool. See the Content Provider Forum for more details. Ask and someone will help you.
I don't have a computer; how can I help?
How are you accessing this FAQ anyway? Wow, you don't let many things get in your way, do you? If nothing else, we can always use money to buy new (old) books, new software, a new super scanner (sooner than anticipated at the present rate :-) or other incidentals. Find or donate books that someone else can scan. Go to your local library; many have public access computers with Internet. You could log on and proofread a few pages occasionally.
I'm using Linux, is there an OCR package I can use?
There are SOME packages available. Perhaps the most highly developed
at the time of this writing (late 2002) is Clara OCR, a free,
GPL-licensed OCR package. Its accuracy is poor, however, and it is not
really recommended at this time. Hopefully, it will improve as it
develops. There are several commercial products that will run on
Unix/Linux, but they tend to be VERY expensive (in the range of
several thousand dollars). Your best bet is probably to make use of
the OCR Pool. See the Content Providers Forum for details.
Speaking of Linux, what scanners are supported?
What you probably need to know is: What scanners are compatible with
the SANE (Scanner Access Now Easy) driver? Go to this
page to check compatibility.
Are there any free OCR packages available?
Here are a few:
http://www.simpleocr.com/ (Windows)
http://www.claraocr.org/ (Linux)
http://jocr.sourceforge.net/ (Linux)
http://www.expervision.com/webtr6.htm (Windows)
http://ftp.cityu.edu.hk/pub/chinese/ifcss/unix/ocr/omniocr2.2.README (Unix - Chinese)
http://http.cs.berkeley.edu/~fateman/kathey/ocrchie.html (Linux)
Just be aware, in the OCR world, you typically get what you pay for.