Story of DP


This story of DP grew out of the Let's Write a Book! thread from 2004. It was outlined at that time and still needs plenty of work, but we're starting from the work done by Vasa.

[Blank Page]


That ‘Blank Page’ there is the happiest sight a proofer can see. It can only be improved if it’s in The Grammar of English Grammars or Slave Narratives, or such: (read on, Gentle Reader, and all will be explained), but there is much more to it.

Every book starts as a Blank Page, a Tabula Rasa. The author takes this void, adds a few drops of coloured water, and creates a miracle: a book. The world rejoices: a slab, or a brick, or a grain of material has been added to its structure. Much may be ephemeral; nothing is valueless. What would lead someone to publish the following Vanity Press work:

I don’t understand my son. I can’t get him up in the morning, and he’s out half the night with his friends. He doesn’t want to follow my work, but I’ve told him I’d help get him in another line of work if he wants. He doesn’t want to do anything. He’s wasting his life…

We can understand his pain, but why would we want to read it?

Does the fact that it is older than the Epic of Gilgamesh, a 6,000-year-old complaint predicting life today, change anything?

Or a work may be incomprehensible, needing other supports, or merely time, to unfold. Bach’s Brandenburg Concertos had to wait 150 years to be repeated following their initial performance.

And even if a work should never have anything important to say, is that a reason for muzzling it? It meant something to its creator. Who are we to deny it life? If it is of such poor quality, might it not suggest to a reader that she, too, is capable of writing?

For books live, and fight, and generate new works.

Books die.

[Blank Page]

The lost comedies and dialogues of Aristotle, Konkani literature, the Mayan codices, Polybius’ ‘Tactics’, the poetry of Archilochus of Paros. Nothing left but blank pages.

But now there is a new guard against the Blank Pages: Distributed Proofreaders.

Protecting history, one page at a time.

Book 1: The Fellowship of the Thing

Being a history of the lore of books, and how it led to the Age of Distributed Proofreaders, and how DP shook the foundations of the printed word.

The Dark Ages: Monks and Monasteries

I think it necessary to point out that, despite our close alliance, Distributed Proofreaders does not have the same philosophy as Project Gutenberg, though we may share the same goals. Who, then, are our ancestors? Are we a new phenomenon, or the most recent generation of an ongoing philosophy?

(You’re way ahead of me…)

But let’s see why that is so.


PG disseminates publications to the world through the Internet. This is congruent to publication. We do not. We are in the business of transcribing works we dig up, from wherever, jealously hoarding and sharing them with a select coterie of associates, of whom PG is the major recipient. Our work is The Book. And The Book was the first computer.


PG is accessed by clients looking up specific information, from anywhere in the world, via the Internet; we transcribe works, often incomprehensible, sometimes distasteful, in languages alien to our tongues, in words alien to our thoughts.

Consider. The computer is not self aware. It indiscriminately takes in what it is handed, and releases it to whoever asks. As does a book.

Consider: the Internet is much older than it appears. Libraries and Monasteries were the original Internet. The file servers, the great centers, copied works by hand from around the world, and, upon request, passed the information on to the curious. The user was ephemeral; the work, eternal.

Was it slower? A relative term. When you are working in eternities, what do mere months or years mean, between the solicitation and receipt of a manuscript? Had you passed to your reward before the information came, another would deal with it.


We are the new monks of the new Dark Ages.


Our age is not Dark? That rather depends on context, doesn't it? To those who lived it, the Dark Ages were not dark. Most of existence, as always, was placid. There was safety in continuity, and there were the charitable institutions to care for those who did not fit the normal mode of life. Hospital care, for example, would not be as good again until Florence Nightingale started the new field of Applied Statistics. The darkness was transparent to all but a few. The masses--including their leaders--were content to know what they knew.

In hindsight we recognize the signs of our self-judged ‘darkness’: degradation from the ideal past; strife and invasion from the barbarians; distrust of learning. In foresight, we see the same in our societies.


So far we are on track. And consider--half the world today lives on less than two dollars a day. One fifth have no fresh water. At what period in the past were these figures equaled, or even approached?

Being blind, how many of the signs of darkness do we miss?

All we know is, as did the elder monks, that something is lacking, and we pass to the future what we can, in the hopes that they will understand what we cannot.


Who of the elders would have supposed that their transcriptions would be used, a thousand years later, to investigate medical problems, to understand political movements, to determine, even, the physical health and diet of the monks, from the quality of their writing? How can we begin to guess what use will be made of our work a millennium from now?


We sit in our solitary scriptoriums, and pass information on, on to people who may understand what we cannot. Our incomprehension being twofold: often enough, we do not understand the primary message; always, we cannot understand what the future will invent.

The elder monks left traces of themselves in their works. They used marginalia for comments: 'I'm cold', 'I don't understand this', 'I hate Brother Anselam'. We do the same. Our forums allow the individuality and creativity we are prohibited in our labours.

Did the elders feel anything--repugnance, desire, loss--in transcribing the forbidden pleasures of Ovid? Do we feel the same in writing Jew boy, or nigger, in copying The Protocols of the Learned Elders of Zion? For our predecessors were human, and as intelligent as we. Often, their minds must have been attuned, to some degree, to the works, and a tenebrous sense of the message would have vibrated within their own souls. No matter; the work must be done. With exactitude.


In the eleventh century, a monk made an error in a single Hebrew character, in the word for 'poisoner'. And hundreds, if not thousands, would die from the new Biblical injunction "Thou shalt not suffer a witch to live."

There are unexpected tremors from everything we touch: our work may be dangerous. If we had the last copy of Mein Kampf, would we pass it on, to be a new seed of hatred in innocent generations, or the vaccine that assures that society can finally be immunized to such venom?

Distributed Proofreaders

We sit in our cells, and we transcribe, and we keep faith.

Once upon a time, there was a flowering of culture equal to that of our own times. This period, the acme of Greco-Roman artistry, produced works that still play to discriminating audiences, and lowbrow antics and stunts that still appeal to the masses. Though much has been lost, much has been saved, due to a most unlikely group of rescuers: often illiterate, more often opposed to the works they saved, they are among the earliest cultural heroes we have.

Prehistory: The Story of Project Gutenberg

Project Gutenberg (1971-2009) by Marie Lebert is available on PG's website, and further details can be found on PG's About Page.

Ironically, Project Gutenberg, which preserves the writings of others, doesn't have much written history itself. There are scraps of e-mails and guidelines, but many newsletters and other internal writings before 1996 have gone to the great bit-bucket in the sky.

The latter half of the '90s marked a graceful blooming of Project Gutenberg's growth. Three related technical factors contributed: the explosion in home PCs brought standardization, which made it easy for non-techies to install scanners, which, in response to the new demand, became plentiful and cheap. And, of course, these years saw the rise in popularity of the Internet, which has always been PG's main channel of communication and distribution.

However, while PG's production expanded geometrically, at Moore's Law rates, there were barriers to participation. Most volunteers had to find an eligible book, scan or type it, and proof the resulting text all by themselves. This was and is a fairly significant amount of work: 40 painstaking hours would be a typical commitment for one book.

Beyond that, simply learning the mechanics of producing e-texts could be a serious challenge for newcomers. Nearly all internal PG communication, except for the Newsletter, was by private e-mail, and instructions had to be repeated many times to individual new volunteers, all of whom showed up with great good will, but most of whom vanished after a week or two.

Michael Hart was unstinting in his editing of incoming texts and handling questions by e-mail, but any one person has only so many hours.

The Directors of Production at the time--Sue Asscher, Dianne Bean, John Bickers and David Price--served as contact points for advice and help, made enormous efforts of production themselves, and tried to share the scanned texts among new volunteers for proofing. They made a huge contribution to building community in PG.

Pietro Di Miceli set up a web site for the project in 1996, and with the popularization of the Web (as opposed to the Internet), this became a beacon for readers and new volunteers.

All of these people reached out to willing volunteers, drew them in, helped them, encouraged them. The Project and all of the readers of the books, now and in the future, owe these people a great debt. Without them, Project Gutenberg could not have achieved what it has. But still, for the most part, each volunteer worked alone.

In 1999, Jim Tinsley wrote, in response to an offer to volunteer:

I think I can best answer your offer, and many others like it, by giving an extended description of what actually happens in the making of PG texts, and why it's often not easy to get started.
There is no agenda, no master list of tasks ready to be given to volunteers. This is often the hardest thing to get across to new volunteers. I know I waited quite a while after volunteering for someone to give me a job to do before I realized it.

Exactly five steps are normally performed in the publishing of an e-text.
1. Someone, somewhere gets a public-domain copy of a text they want to contribute.
2. That volunteer confirms its PD status by sending TP&V to Michael, and getting copyright clearance.
3. Someone, usually the same volunteer, scans and corrects the text, or, if skilled in typing, types the book into an e-text.
4. Someone, often a different volunteer, second-proofs the e-text, removing the smaller errors.
5. The e-text is sent to Michael for posting.

There are three barriers which make it difficult for most people to contribute:
1. Getting a PD book.
2. People without scanners and typing skills have no way of turning a book into an e-text.
3. Even with a scanner, turning a book into an e-text is not easy or quick.
Since, generally, people who have a PD book don't just want to send it off to a stranger for scanning, the people who produce e-texts have to get over all three of these barriers. This is the bottleneck in production. It's relatively easy to get an e-text second-proofed; making it in the first place is the hardest part. You need to have a book, the means to turn it into an e-text and the time and will to do it.
After that comes second proofing. There are two problems here. One is that there may not be enough texts for all the people who want to second-proof; the other is that a lot of beginners just abandon texts given to them for second-proofing, which holds up the process and is discouraging for others. So a lot of volunteers do their own second-proofing or send their texts to established contacts with a track record of finishing the job, rather than making them available to newbies. The Directors of Production do serve as contact points, and at any given moment may have some texts for proofing, but they can only distribute the texts that have already been made.

With that explanation out of the way, I can better address your question of what you can do.

Second-proofing is an easy way to start, but material isn't just waiting for you. If you want to look for some, post your offer here and wait a week or so. If no takers by then, e-mail Michael and ask if there are any texts available; he may be able to refer you to a Director of Production who has something current. You may not get an e-text immediately, but you will get one. Of course, you can also look here for offers of e-texts ready to proof.
Your other option is to take on a book yourself. In your case, you already have a scanner, so you are equipped to become a producer. You need to find a PD book.
Getting PD books means finding and borrowing or buying them. You can do this through used bookshops, libraries or book sites on the Internet. I mention a few net sites in the FAQ in the link below. I get all my books through them, since they make it easy for me to find the books I want. Prices range from $5 up to (in my case) about $30.
The best advice I can offer here is: pick a book that you _want_ to contribute, and a book you'll enjoy working with--you'll be living with it up close and personal for quite a while.

In March and April of 1999, Pietro created the PG Volunteers' WWWBoard and Greg Newby set up the mailing list gutvol-d, and, for the first time, volunteers who hadn't been introduced to each other by Michael or the Directors could meet online and communicate directly. A few FAQs and HOWTOs were written, covering the basics, the nitty-gritty of producing books. All of this activity made it much easier for people to get involved, and the Project experienced a new influx of interested volunteers. Improved OCR software was also a factor at this time: in response to the commoditization of scanners, there was rapid improvement in the quality of OCR, and better OCR made for easier production of e-texts. More work was shared out in co-operative proofing experiments.

Dawn, and Charles Wakes Up: The Origin of DP

It was in this new, expansive atmosphere, with ideas flooding in from enthusiasts newly energized by the project, that Charles Franks (Charlz) came up with the idea of a web site that would serve to distribute the work of proofing a book among many volunteers. But not only did he think of the concept; he went ahead and did it!

In April 2000, Charlz first requested comments on his idea in a post on the Volunteers' WWWBoard, and by the end of September, the first e-texts were queuing up on the production line.

Morning's First Footfalls: DP's Early Childhood

On October 9th, Charlz wrote:

   Number of pages proofed by date: 

   2nd    6 
   3rd    6 
   4th   20  <-- Newsletter 
   5th   27 
   6th   25 
   7th   29 
   8th   30 
   9th   45!! (and the day ain't over yet) 

(The "Newsletter" is a reference to the site being mentioned in the PG Newsletter on October 4th, 2000.)

High Noon: The Slashdot Attack

Distributed Proofreaders, or DP, simply kept growing from there, as Charlz kept scanning and adding more books and features and proofers. This simple organic growth produced 600 e-texts in two years. But when Charlz asked for more help on Slashdot, a popular technical news site, on November 8th, 2002, the response blew the roof off! The pages-per-day figure jumped from 1,000 to about 10,000 for a while, then settled down at its current 4,000. Even given that each page is proofed twice, 4,000 pages is a lot of pages: 2,000 produced pages per day is about five full books per day. DP has formed the backbone of PG's production ever since. Whatever the future of DP's production, its effect on shared knowledge and resources, and the communication and community it has built, ensures that Project Gutenberg will never be the same again.

L'Après-Midi d'un Jeune Site: Puberty Kicks In, and DP Matures

Back to the Future: Where will Tomorrow find DP?

In the future in which we believe, Tomorrow will not be able to find DP.

Oh, perhaps in some long-abandoned midden on Titan, some archaeologist may stumble across a neolithic 766 terahertz computer and, looking up the specifications on Project Galaxyberg, which has stored Information on Everything since the Dawn of Time (when the system clocks on all Unix systems had to be reset), manage to momentarily bring it back to life. A cartoon graphic dinosaur appears momentarily, and a message (so quaint that it is written, rather than mental), flashes: url not found.

This is followed by another:

There are 1,255,773,140,455 updates available for Windows. Update now?

And the system dissolves into its component atoms…

Hey: we don’t remember the individual monks; why should the future remember us? As long as they’re still using our product…

In the nearer future, we can predict with more certainty, if less humour.

Book 2: The Many Towers

Being an account of the many different races who rallied together for the creation of a New Age of culture.

Fiat Lux! Volunteers, Open Source, and 'The Cathedral and the Bazaar'

The Usual Suspects



Content Providers

Project Managers



Everyone who has, for the first time, opened the proofing interface will have had a moment’s qualm over their abilities. It’s a tough job; and the new proofer has already had it drilled in that Quality is our sine qua non. How can he be expected to remember all those rules? How can she perform a rigorous, anal-retentive job that flatly rejects all error? Even Riders of the Purple Sage has pitfalls; how to tackle the Bureau of American Ethnology reports? Fear clasps a leaden hand around the Newbie’s heart.

Fortunately, there is another hand--the Mentor. New proofers must learn two essential facts, and the Mentor must drill them home. First, that the work is rigorous and must be done to the highest standards, and second, that DP is an understanding and pleasant community. The Mentor must, then, play the part of a stern but benign teacher (who denies any such role, since DP is fully egalitarian).

Mentors are an unusual species. The ability to read the same basic mistakes over and over, yet reply to each offender as an individual, is not for all. Some beginners may seem to have learned nothing, while some errors (dashes come to mind--Lord, how they come to mind!) turn up in the work of almost every newcomer. Yet a kind and understanding letter, pointing out the oversights, will usually get a thank you from the Newbie. The Mentor accepts a lower, slower page count by realizing that every page the Newbie does is also, in some measure, the Mentor's.



Book 3: The Return that We Bring

Being a celebration of all the benefits we receive from DP, while contributing a widow’s mite back.




Notable Characters and Eccentrics


Charlz wrote:

Actually if I recall correctly it all started with the outages _prior_ to the move to Internet Archive hosting..... seems my 2 and 3 year olds like to 'button push' .... especially on the computers under my desk which is where the 'server' was located at the time..... and right after I left for work... thus extending the downtime ;)
Aldarondo started the 'squirrel' talk and they have grown into legend ;)
Sorry to torpedo any good DP lore :)

For more details and some smiles peek in Here.

Games and Nonsense




Support Mechanisms

Ongoing Madness: 'Most Amusing Text' / other long-term threads

Most Amusing Text


Never Ending Story

Allies and Enemies: other organizations with whom we share; the 'closure of the commons' and threats to the Public Domain


Annexe 1: Statistics



Some major sources for the contents of this article are:

Additional Information