Talk:Harvesting/Google Book Search
It is not updated at all right now. The person is creating a webpage to support this list. When it will be available is not known yet. --De2164 16:24, 25 May 2006 (PDT)
In the meantime, though, it's hoped that those of us harvesting from Google Book Search will log our harvesting activities on the Google_Book_Search_Coordination page so that, if nothing else, we don't duplicate efforts. Sihaya 11:46, 15 June 2006 (PDT)
Preparing Images for OCR
How can I make the Google scans suitable for OCR purposes? I tried:
mogrify -resample 300x300 *.png
All that does is take the image at its current resolution and create more pixels in the same pattern as the original. I have found no good way to improve the images yet. De2164 16:19, 21 June 2006 (PDT)
- Hmm, I am running into similar problems. When I use ABBYY on the Google PDFs it creates 600+ dpi TIFFs, which are unnecessarily huge. I used "pdfimages", but I'm not sure what exactly the PBM output is for. Gren 01:48, 30 December 2006 (PST)
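To illustrate the two points in this thread, here is a minimal Python sketch, not a description of what mogrify or ABBYY do internally. It assumes the simplest possible resampling (nearest-neighbour pixel repetition; real tools use smoother filters, which likewise cannot recover detail missing from the scan) and a hypothetical 6 x 9 inch page in 8-bit greyscale. The function names are made up for the example.

```python
def upsample_nearest(pixels, factor):
    """Enlarge a 2-D grid of pixel values by repeating each pixel
    factor times horizontally and factor times vertically."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def downsample_every_nth(pixels, factor):
    """Keep every factor-th pixel in both directions
    (the exact inverse of upsample_nearest)."""
    return [row[::factor] for row in pixels[::factor]]


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Rough uncompressed size, in bytes, of a page scanned at dpi."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    page = [[0, 1, 1],
            [1, 0, 1]]
    big = upsample_nearest(page, 3)
    # Nine times as many pixels, yet the original comes back exactly,
    # so the enlarged image carries no additional detail.
    assert downsample_every_nth(big, 3) == page

    # Hypothetical 6 x 9 inch page, 8-bit greyscale.
    for dpi in (150, 300, 600):
        print(dpi, "dpi:", uncompressed_bytes(6, 9, dpi, bits_per_pixel=8), "bytes")
```

Doubling the dpi quadruples the pixel count, which is why 600+ dpi output is so large compared with 300 dpi, while upsampling a low-resolution scan only spreads the same information over more pixels.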
I am not sure why you created the background section. IMHO it does nothing for the page. De2164 16:34, 21 June 2006 (PDT)
- Yeah, you can remove it. But I think the article is still too long and confusing. --Keichwa 21:05, 21 June 2006 (PDT)
gharvest does not work with the new GBS interface. This is the old description:
- Now that Google allows the download of the entire book as a PDF, neither the manual download nor the script should be needed anymore. However, a few books seem to be fully available but still lack the PDF download option.
- Google presents the page images for a book one page at a time. You could download all the images manually, or you could use the gharvest download script to harvest the images. gharvest is a Perl command line script written by bgalbrecht.
- If gharvest is running when you get the verification screen, most likely it has stopped and cannot be restarted for 24 hours.