Sources for Scan Harvesting
See the Image Sources Script Listing for an up-to-date list of sources that are available for Project Managers to select from when creating a project. This list will always be up to date, because only sources on this list may be used in DP projects. If you are a PM or a CP, you may Propose a new image source using this form. See the form field descriptions for more information on the types of information that should be included in the request.
Manual list of sources
Note: The following list is manually maintained, and is probably out of date at any given time. It also has a number of sources that are not on the approved list, as well as more details than are available in the list. For a complete list of sources available for projects, see the listing linked to at the top of this wiki page.
If you can add to or improve this list, please do so! Add any comments as necessary in the accompanying list for each site. If a source is small and you want to "reserve" it for yourself to avoid accidental duplication of effort, say so as well.
Thanks to bconstan and all contributors to the original PGDP forums thread!
- Obras en español - Works in Spanish.
- We can also use their images, as long as we credit them as the source.
- Australian Studies Resources at the University of Sydney Library's Scholarly Electronic Text and Image Service (SETIS).
Obras en castellano -- Works in Spanish.
- Latin poetry
- (From the PP forum) "We have received official permission from the people who run the www.canadiana.org site to use their scans. They would like to be acknowledged in the credits line for all books that come from their scans, which we have agreed to do."
- Content providers are currently working together to harvest this archive. Please see this page for details.
- "You are very welcome to use our on-line book collection. We would appreciate receiving copies of the proofed texts. All books we have in our collection are in public domain." Some books have been done, and some are cleared but not completed.
- The Digital Mathematics Archive is a digital collection of mathematical sources, with a primary focus on documents from the late 19th century through today. -- not a secure site
- Danish books and manuscripts
European Illustrated Books and Manuscripts (Various languages)
- Manuscripts and Printed Books from Keio UL.
- Original URL no longer operational; possibly should be this?
- Latin, Spanish. They seem to have digitized their older volumes, and plan to do all the important ones. -- Linked URL no longer available. Possibly https://bib.us.es/machado/fondo_antiguo
- Stuff related to Francis Drake, from the U.S. Library of Congress
- Harvesting/Google Book Search -- information about coordination of harvesting these scans
- Google Book Search Coordination -- wiki for claiming projects
- Variety of non-fiction works. Mostly German, some English, French and Latin.
Grace's Guide to British Industrial History
- Has two largish collections of pre-1923 engineering-related periodicals. All material on the site is under Creative Commons.
- The Engineer magazine, UK, published from 1856 to 1985.
- Engineering magazine, UK, published from 1866 to 1983. Several scans of this periodical are also available on Wikimedia Commons.
- durgledoggy intended to see what could be done with these, but "real life" prevents much activity on the project, so anyone interested is welcome to take them on.
- Page images and text of home economics books and journals from 1850-1950. Random samplings show characteristic OCR errors, indicating that the text has probably had only light proofing, if any. A suitable subject for archive raiding, perhaps, if permission is obtainable.
- Digital Library of Greek Philosophical and Scientific Books and Manuscripts (1600-1821)
- The list of available books and authors is here (in Greek)
- No permission?
- Some of them have been done before for PG. I don't think the author of the site is as pedantic about copyright as PG, so not all of them are clearable. Most of them will be a pain to process here, since making them fit in Latin-1 was not a concern.
- From The Scout Report
- German and Latin books (pre-1600)
- Over 150 high-resolution public domain images scanned from old books! (These are not complete books.)
- (7 million pages!). Be aware that some of these link to other sites (such as Making of America)
- A mixture of German and Latin works; most are from the 16th-17th centuries, so they may be challenging to OCR.
- A large number of books are available, but be warned that quality control is poor. Check that the book has all pages available and properly scanned. Also, do not trust the dates posted; check them against the title page and verso of the book.
- Books on botany in a number of languages.
- Genealogical books, and links to related sites.
- 140 school textbooks from the 19th century.
- No permission?
- Warning: some modern material.
- No permission?
- permission. Desired credit line unknown.
- Books and text from the 9th through 20th Centuries.
- Blanket permission; however, copyrighted and possibly copyrighted material is mixed in.
- 442 biology books, mainly German; some English, Dutch, Latin.
- Some are already digitized (turned to text); most are jpeg scans.
- Permission to work on these works is available; the site owner asks to receive text versions when done. As usual, we need to do our own copyright clearance. This site is careful to include only works that are PD in the EU (life + 70 years rule).
- Harvesting Coordination Page.
- Swedish books (pre-1700) (Requires FlashPix browser plugin)
- Nearly 1900 scanned items, with dates (which is handy for us historical sorts).
- To navigate the list, type a number into the box next to the "IR A UNA ENTRADA" button.
- This next URL is the list sorted by date:
- Hacer una buscada de los recursos digitales de la biblioteca -- Search the library's digital resources.
- German manuscripts
- German books and manuscripts
- No permission?
- Many of the texts available here are early accounts of local and regional history.
- Books and journals about birds.
University of Wisconsin
Browse the UW Digital Collections (See Image Sources script); many projects are hosted here, including:
- African Studies Collection
- Belgian-American Research Collection
- Digital Library for the Decorative Arts and Material Culture
- Ecology and Natural Resources Collection
- Foreign Relations of the United States
- Historical Primary Sources
- History of Science and Technology
- The University of Wisconsin Collection
- Wisconsin Pioneer Experience
- A listing of many projects that are producing scans, and sometimes text, of U.S. government publications, all of which are automatically in the public domain.
- Lots of catalogues, ephemera, manuals and references for machining, machinists, metal and wood working.
- Most are good quality high resolution grey scale scans with a lot of illustration and table content.
- Many are from after 1923, but a good number (perhaps half) are from before 1923.
- 108 books so far, mostly Latin and Italian
- The Historical Library contains a large and unique collection of rare medical books, medical journals to 1920, and other items.
Lists of eBooks
The following sites have listings of ebooks. While these sites may or may not have content, they provide good information on where to find content:
- Links to a number of Japanese digital libraries.
This page links to both the Internet Archive's Wayback Machine and its archive search for other media. The search for books is below the Wayback Machine search. You may restrict your search to Texts only by clicking on the appropriate icon.
At UPenn. A good place to search for titles.
The site states: "Digital Book Index provides links to more than 165,000 full-text digital books from more than 1800 commercial and non-commercial publishers, universities, and various private sites. More than 140,000 of these books, texts, and documents are available free."