Sources for Scan Harvesting

From DPWiki
🔧 This content is being reviewed and revised

For the definitive list of image sources, see the Image Sources Script Listing.

See the Image Sources Script Listing for an up-to-date list of sources that Project Managers can select from when creating a project. This list is always current, because only sources on it may be used in DP projects. If you are a PM or a CP, you may propose a new image source using this form. See the form field descriptions for details on what information to include in the request.

Manual list of sources

The following is a list of sources for Content Providers (CPers) interested in Harvesting scans for Distributed Proofreaders (DP) projects.

Note: The following list is manually maintained and is probably out of date at any given time. It also includes a number of sources that are not on the approved list, and it gives more detail about some sources than the approved list does. For a complete list of sources available for projects, see the listing linked at the top of this wiki page.

If you can add to or improve this list, please do so! Add any comments as necessary in the notes under each site. If a source is small and you want to "reserve" it for yourself to avoid accidental duplication of effort, say so as well.

Thanks to bconstan and all contributors to the original PGDP forums thread!

Academia Argentina de Letras (Spanish)

  • Works in Spanish.

Antique Books (English)

  • We may use their images as long as we credit them as the source.

Australian Studies SETIS (English)

  • Australian Studies Resources at the University of Sydney Library's Scholarly Electronic Text and Image Service (SETIS).

Biblioteca Virtual Miguel de Cervantes (Spanish)

  • Works in Spanish.

CAMENA Online-Editionen (Latin)

  • Latin poetry

Early Canadiana Online

  • (From the PP forum) "We have received official permission from the people who run the site to use their scans. They would like to be acknowledged in the credits line for all books that come from their scans, which we have agreed to do."
  • Content providers are currently working together to harvest this archive. Please see this page for details.

Case Western Reserve University Preservation Department (English)

  • "You are very welcome to use our on-line book collection. We would appreciate receiving copies of the proofed texts. All books we have in our collection are in public domain." Some books have been done, and some are cleared but not completed.

Digital Mathematics Archive (English?)

  • The Digital Mathematics Archive is a digital collection of mathematical sources, with a primary focus on documents from the late 19th century through today.
  • Note: not a secure site.

Digital South Asia Library (Asian languages)

ELEKTRA (Danish)

  • Danish books and manuscripts

European Illustrated Books and Manuscripts (Various languages)

  • Manuscripts and printed books from Keio University Library (Keio UL).
  • The original URL is no longer operational; a possible replacement link has been suggested.

Fondo Antiguo: Biblioteca de la Universidad de Sevilla

Francis Drake (various languages)

  • Materials related to Francis Drake, from the U.S. Library of Congress.

Google Book Search

GDZ (German, English, French, Latin)

  • Variety of non-fiction works. Mostly German, some English, French and Latin.

Grace's Guide to British Industrial History

HEARTH Project (English)

  • Page images and text of home economics books and journals from 1850-1950. Random samples show characteristic OCR errors, suggesting the text has had only light proofing, if any. Perhaps a suitable subject for archive raiding, if permission can be obtained.

Hellinomnimon (Greek)

  • Digital Library of Greek Philosophical and Scientific Books and Manuscripts (1600-1821)
  • The list of available books and authors is here (in Greek)

Historic Pittsburgh - Full Text Collection

  • No permission (status uncertain).

Hockliffe Project (English)

Indo-European Language Resources (Various languages)

  • Some of these texts have already been done for PG. I don't think the site's author is as strict about copyright as PG, so not all of them are clearable. Most of them will be painful to produce through DP, since fitting them into Latin-1 was not a concern for the site.

Internet Scout Project (English?)

  • From The Scout Report

Johannes A Lasco Bibliothek (German, Latin)

  • German and Latin books (pre-1600)

Liam's Pictures from Old Books (English?)

  • Over 150 high-resolution public domain images scanned from old books! (These are not complete books.)

Library of Congress Digitization Project (English)

  • 7 million pages! Be aware that some of these link to other sites (such as Making of America).

Mateo - Mannheimer Texte Online (German, Latin)

  • A mixture of German and Latin works; since most date from the 16th-17th centuries, they may be challenging to OCR.

Million Books Project (General) (Mainly English)

  • A large number of books are available, but be warned that quality control is poor. Check that the book has all of its pages and that they are properly scanned. Also, do not trust the posted dates; check them against the title page and its verso.

MBG Rare books (Various languages)

  • Books on botany in a number of languages.

National Academies Press (English)

National Transportation Library - Digital Collection (English)

New England History and Genealogy (English)

  • Genealogical books, and links to related sites.

Nietz Full-Text Collection (English)

  • 140 school textbooks from the 19th century.

Nineteenth-Century American Children and What They Read

  • No permission (status uncertain).

Oak Knoll: Digital Books about Books

On-Line Digital Archive of Documents on Weaving and Related Topics (PDFs hosted at Arizona)

  • Warning: some modern material.
  • No permission (status uncertain).

Our Roots / Nos Racines: Canada's Local Histories Online (English and French)

  • Permission obtained; desired credit line unknown.

Schoenberg Center for Electronic Text and Image (English?)

  • Books and texts from the 9th through 20th centuries.

Seforim Online (Primarily Hebrew, but some English and German and possibly other languages)

  • Blanket permission; however, copyrighted and possibly copyrighted material is mixed in.

Stuebers Online Library (German and English)

  • 442 biology books, mainly German; some English, Dutch, Latin.
  • Some are already digitized (turned to text); most are JPEG scans.
  • Permission to work on these books is available; the site owner requests text versions when done. As usual, we need to do our own copyright clearance. This site is careful to include only works that are public domain in the EU (life + 70 years rule).
  • Harvesting Coordination Page.

Swedish imprints before 1700 (Swedish)

  • Swedish books (pre-1700) (Requires FlashPix browser plugin)

Universidad Complutense, Madrid, Spain (Spanish, Italian, Latin)

Universität Freimore (German)

  • German manuscripts

Universität Tübingen (German)

  • German books and manuscripts

The University of Michigan Historical Mathematics Collection

  • No permission (status uncertain).

University of Missouri-Columbia Libraries: Digital Library Collections (English)

  • Many of the texts available here are early accounts of local and regional history.

University of New Mexico & Cooper Ornithological Society Texts (English)

  • Books and journals about birds.

University of Wisconsin

Browse the UW Digital Collections (see the Image Sources Script Listing); many projects are hosted here, including:

  • African Studies Collection
  • Belgian-American Research Collection
  • Digital Library for the Decorative Arts and Material Culture
  • Ecology and Natural Resources Collection
  • Foreign Relations of the United States
  • Historical Primary Sources
  • History of Science and Technology
  • The University of Wisconsin Collection
  • Wisconsin Pioneer Experience

United States Government Publication Digitization Projects Registry

  • A listing of many projects that are producing scans, and sometimes text, of U.S. government publications. U.S. federal government works are generally in the public domain.

  • Many catalogues, ephemera, manuals and references for machining, machinists, and metal and wood working.
  • Most are good-quality, high-resolution greyscale scans with many illustrations and tables.
  • Many are from after 1923, but a good number (perhaps half) are from before 1923.

Warburg Institute Library Digital Collection

  • 108 books so far, mostly Latin and Italian

Yale Medical Historical Library (English?)

  • The Historical Library contains a large and unique collection of rare medical books, medical journals to 1920, and other items.

Lists of eBooks

The following sites have listings of ebooks. While these sites may not host content themselves, they provide good information on where to find it:

Digital Information Organization in Japan

  • Links to a number of Japanese digital libraries.

Internet Archive

This page links both to the Internet Archive's Wayback Machine and to its search for other media. The book search is below the Wayback Machine search. You can restrict your search to texts only by clicking the appropriate icon.

Internet Public Library

Online Books Page

Hosted at the University of Pennsylvania (UPenn). A good place to search for titles.

Digital Book Index

The site states: "Digital Book Index provides links to more than 165,000 full-text digital books from more than 1800 commercial and non-commercial publishers, universities, and various private sites. More than 140,000 of these books, texts, and documents are available free."