Round-balancing proposals

In late March 2007 there seemed to be an awful lot of DP forum threads suggesting various ways to balance the rounds and solve the problems of our queues.

In December 2007, there continued to be lots of forum posts on this topic. The problem could be described as: some parts of our work flow are doing much more work than other parts, resulting in large backlogs (queues), which cause projects to take much longer to complete, and spend much of that time not even being worked on.

This is a minute-taking page, an attempt to keep track of all the suggestions, and prevent them from getting buried in the Forum. Please add to, clarify and reorganise as you wish. But please only minute-take, that is, list from a neutral point of view the elements of the proposal and the comments for and against that are in the forum threads. Actual discussion should take place in the forum threads themselves.

Second Pass

Proposes allowing books to receive P1 > P1 > P2 and then proceed to F1. Implemented as {P1->P1}.

  • crb11 proposed: "Give P1->P1 projects priority in the P1 and P2 queues. I'd create a Second Pass queue for P1, as in P2, start it at 10 books, and then gradually ramp both queues up to about 100, or however many it needs to be so eventually such books have no queue, or a minimal queue. 100 seems a lot, but I think we do want to have half or more of our books going through twice - the more, the more the queues go down."

Objective criteria for P3 skipping

Mainly concerns diff-count as a criterion for P3 skipping.


Reduce P3 Queue

Forum Threads

forumtopic:25583 and related forumtopic:23149


This proposal for round balancing works with the idea that a project's diff counts between the various P rounds offer an insight into whether the project is a potential candidate to skip P3. The thread is an exploration of the criteria that a project would need to satisfy in order to skip P3.

Diff Count Score

This proposal makes use of a 'diffs score'. This number is simply a means of expressing the average number of errors corrected in a round.

As per the example in the thread, a diff count of 9.84 in P2 means that on average, proofers in P2 found one error every 9.84 pages.


Proposal 1: A project is allowed to skip P3 if its Pages/P2diffs score is 7 or more.

  • In plain English this means that a project would be allowed to skip P3 if, in round P2, proofreaders found no more than one error for every 7 pages on average.

Proposal 2: A project is allowed to skip P3 if all of the following are true: its Pages/P2diffs score is 4 or more; its Pages/P1diffs score is 0.5 or more; and its P1diffs/P2diffs ratio is 1.5 or more.
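
The two proposals above can be sketched as simple threshold checks. This is a minimal illustration, not the P3 skip tool mentioned in the thread: the `ProjectDiffs` class, its field names and the function names are hypothetical, the "Pages/P1" score in Proposal 2 is read as pages divided by P1 diffs, and treating a round with zero diffs as an unboundedly good score is an assumption the thread does not address.

```python
from dataclasses import dataclass


@dataclass
class ProjectDiffs:
    """Hypothetical per-project totals; the field names are illustrative."""
    pages: int     # total pages in the project
    p1_diffs: int  # number of diffs recorded in P1
    p2_diffs: int  # number of diffs recorded in P2


def ratio(numerator: float, denominator: float) -> float:
    # Assumption: a zero denominator (no diffs at all) counts as an
    # unboundedly good score instead of raising ZeroDivisionError.
    return float("inf") if denominator == 0 else numerator / denominator


def may_skip_p3_proposal_1(p: ProjectDiffs) -> bool:
    # Proposal 1: Pages/P2diffs score of 7 or more.
    return ratio(p.pages, p.p2_diffs) >= 7


def may_skip_p3_proposal_2(p: ProjectDiffs) -> bool:
    # Proposal 2: all three thresholds must hold at once.
    return (
        ratio(p.pages, p.p2_diffs) >= 4
        and ratio(p.pages, p.p1_diffs) >= 0.5
        and ratio(p.p1_diffs, p.p2_diffs) >= 1.5
    )
```

For instance, a 300-page project with 150 diffs in P1 and 50 in P2 has a Pages/P2diffs score of 6, so it would fail Proposal 1 but satisfy all three conditions of Proposal 2.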


Positive Aspects

  • Reduces P3 Queue
  • No PP commitment required

Potential Issues

  • Does the average page count of proofers need to be considered before allowing a project to skip P3?
  • Does the time proofreaders spend on a page impact the quality of their work, and does this need to be considered before allowing a project to skip P3?
  • Do the error rates need to be correlated to page sizes to make them more equitable?
  • Are there too many ‘click through merchants’ operating in P2, resulting in inaccurate diff counts?

Further Questions

Some of the following suggestions don't specifically relate to the above idea, but have been included as this section is a report on the thread discussion.

  • Should there be testing groups of 50 pages in the same project to gain a better idea of whether the diff levels are consistent throughout the project before allowing it to skip?
  • Do these criteria only get applied to fiction books or all books?
  • Could a further criterion for allowing a book to skip P3 be obtained by sending a representative sample of its pages through P3 and then deciding its suitability to skip based on the results?
  • Does a method need to be implemented to allow all but some pages (nominated by a PP because those pages are complex) to skip P3?

Actions to date

  • User laurawisewell collated the original set of data that set the proposal in motion
  • User rfrank has built a P3 skip tool to analyse diffs. (Suitability of this tool questioned further on in thread)

Possible Future Actions

  • A tool to judge whether a project has met the criteria to skip
  • Willingness of PPers to Post Process these projects
  • A way to put a project back to round P3 after F rounds at the request of a PPer


This post by JulietS describes the most recent status of the project, as outlined in the thread.

"Just so folks don't think that nothing has come of this discussion, rfrank has built a P3skip tool that looks at both number of diffs and how fast the proofers went through to arrive at a score. He's skipped a bunch of his projects on the basis of this analysis and will be reporting back what he finds.

I've run most of the projects that I have waiting for P3 through his tool and posted the results in his tools thread. I plan to do a little more analysis on the most promising prospects, then advertise for some PPers who are willing to commit to the projects for skipping P3 and then reporting back what they've found. I know tenaj had volunteered for similar duty, but since she is away for a couple of weeks, I'll see if I can find other people. It will take awhile for these projects to get through to PP and for us to get results."

Adaptive sizing of work in rounds


The belief behind the discussion in this thread is that, generally speaking, the more projects available in a round, the longer it will take projects to progress through the round. This thread explores DPers' thoughts on reducing the amount of material available in the rounds.

Proposal Details

Cut the available work (measured in bytes of text) in a round by 50% for a month, and measure the relative speeds of projects. Do this for P1, P2, and F1.

Reactions in Favor

  • tenaj - "If cutting down the amount of work available would help move books through, I would support it. I seriously doubt that proofers would complain about choice so long as there WAS a choice. So it might be a viable thing to do."
  • Janet - "I'm in favor of reducing the number of available projects in the rounds. I am so overwhelmed by how many books are there!"

Reactions Against

  • stygiania - "when you take choices away from volunteers, you will lose some volunteers. Having 200 projects available in a round where over 1000 people have worked (at some time) is not exactly unreasonable."

Further Questions/Considerations

  • stygiania - "How would you reduce the available work by 50%? Would you cut genres or languages? No matter what you eliminate, you will offend someone because you are taking away choices."
  • Response by spiegel428 - "I'd say genres. LOTE queues don't necessarily behave the same as English."
  • tenaj - "One other thing which might help movement of books would be to stop having PM individual queues."
  • fvandrog - "As for most things on this site (and life in general) there's an optimal balance. The extremes are of course rather clear: allow one book per round, and it will move fast.... but lots of people won't find the one book available at any given time interesting; allow, say, 2000 books per round and probably most people will find something of their liking to proofread.... but most books will not move with an impressive speed.
    The optimum will lie somewhere in between."
  • crb11 - "Two goals I think we should have:
    - a proofer should be able to go to one of their genres of interest and find something they want to proof.
    By "genre of interest" I mean a genre queue for which they will be happy to proof the bulk of books.
    Implication: each genre queue should release at least three books. This allows for one HARD (or other PPD-monster) and one book which is normal but just something the proofer isn't interested in, without the proofer being blocked from the round."

Possible Actions

  • JulietS - "I can certainly reduce the number of books allowed in a round for each English genre for P2 and P3, which would have the effect of reducing the amount of material available overall. There's no point in changing anything in P1 because almost anything in English releases right away unless it is blocked by same-author constraints.

I sincerely doubt that reducing the number of projects in those rounds will increase the number of pages done each day. It seems unlikely that doing the same number of pages across fewer projects would move more projects through, but intuition can be wrong."

Challenging our assumptions about queues and control

  • JesseW proposed: "We need to provide more places for a brand-new newbie to start than just P1. I suspect there are parts of PPing that could be done by a newbie, and maybe formatting, and probably other areas. I hope, by splitting up our incoming newbies into different tracks, we'll hopefully encourage them to not stay in P1 as much, thereby helping the imbalance."

Advertising the forum

Special/birth/other day projects managed by PFs

Request list for P2

Defining goals and measuring progress

Math-verifiable, minimal-code Queue Growth Reduction option

Proposes a system of points that limits how many pages a proofer can do in popular rounds without proofing any in the less popular ones.

  • jrbagley proposed: "On Saturday you post a note in place of P1 that says "Thanks for all your great work! P1 has exceeded your goal for the week, and you guys need to take a rest while P2 catches up! Have a great weekend and check back in on Monday. By the way, anyone interested in working in P2 is welcome to join us." Simple, not offensive, and easy."

Motivating people to work in higher rounds

  • crb11 proposed: "a positive, public and universal statement from the site to the effect that while all volunteering here is good, the site has greater needs in some areas than others, together with the stats/graphs to explain it, plus an explanation of what {P1->P1} means, why we're doing it and why proofing those projects is a good thing" (and here)
  • garweyne proposed: "encourage new proofers to do P1->P1 [by means of] a) advertise these projects more prominently b) put them on top, just under the BEGIN projects. / Currently, the only reason for such projects is to balance the rounds. We should use them for a smoother training and a better selection of proofers with the qualities that we need for P2 and P3."
  • stygiana wrote: "If the high-volume P1 proofers want to stay in P1, restrict them to projects that have been in P1 for 50 days or more." This idea was received with general enthusiasm and at least one "I Agree" icon. The implementation on this would be akin to the limits placed on BEGIN projects. To be determined: what is the threshold for "high-volume" and at what point the restrictions take effect. Fifty days was deemed too long, but people seemed to like twenty-one days, as it mimics the time requirement for P2 access. That left about 35-40 English projects available to the veterans. If the limit were placed at seven days, roughly half of the P1 projects would be available.
  • SMH proposed: "(tongue-in-cheek warning) / Wouldn't it be great if certain high-volume P1 proofers were automatically redirected to P2, but with everything looking as though they had landed safely in P1 as usual."
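
The age-based restriction suggested by stygiana above could be sketched as a filter over the P1 project list. This is only an illustration under stated assumptions: the function name, the `(name, date_entered_p1)` tuple layout and the sample data are hypothetical, and the definition of "high-volume" is deliberately left out because the thread left it to be determined.

```python
from datetime import date, timedelta


def projects_for_high_volume_proofer(projects, today, min_days_in_p1=21):
    """Return names of projects that have been in P1 for at least
    min_days_in_p1 days.

    `projects` is a list of (name, date_entered_p1) tuples. The default
    of 21 days is the figure people in the thread seemed to prefer.
    """
    cutoff = today - timedelta(days=min_days_in_p1)
    return [name for name, entered in projects if entered <= cutoff]


# Hypothetical sample data, for illustration only.
sample = [
    ("Book A", date(2007, 12, 1)),
    ("Book B", date(2007, 11, 29)),
    ("Book C", date(2007, 10, 1)),
]
available_21 = projects_for_high_volume_proofer(sample, date(2007, 12, 20))
available_7 = projects_for_high_volume_proofer(sample, date(2007, 12, 20), 7)
```

With the sample data, the 21-day limit leaves Book B and Book C available, while the 7-day limit leaves all three, which mirrors the observation that a shorter limit frees up a larger share of the P1 list.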

Eliminate P2 queue and stale diffs

Suggests a mixing of current P1 and P2, in which all would be listed on one page, and pages would be available in both states.

Brainstorming: P1 is for experts

Proposes that a book should not be proofed first by beginners as now, but by Experts first, then Beginners, then Excellent proofers, where Beginner/Expert/Excellent correspond roughly to our current P1/P2/P3 proofers.

Interesting. We used to do it that way, about 5 years ago. We had only Rounds R1 and R2, and new people were suggested to start in R2. (I guess so that they would not be overwhelmed by bad OCR, and would see by example the way most proofing & formatting was done.)
But we changed from that, and began to recommend that new people start in R1. I don't remember just why, but it might be good to review that decision before changing it back. 01:24, 20 December 2007 (PST)

Proposal: kudos points

Number of pages in each round and queue

Not a proposal so much as a question, but relevant to the other proposals.

My adventures discovering the P2 requirements

Proposes (presumably) that the requirements for access to different rounds be made easier to find, and that out-of-date links be removed.

The simplest useful site statistics you can think of

Asks what simple statistics would persuade a P1 proofer to work in P2.

  • gren proposed: "a good system of automatic diff analysis would be a good thing. What is the statistical average of P2 changes for x book and how does my work in P1 compare to that? The problem is, do you get a big enough sample size with most books and considering indices, tables, and adverts would all work as outliers it could be problematic."


  • Lucy24 proposed: "coding for a backup PM selected in advance by the original Project Manager who would be enabled to step in and do stuff if the PM is gone for some amount of time. It might be the PP, or it might be one specific PF or another PM; for some projects it would most appropriately be some visible and dedicated proofer. The details are most appropriately worked out by Project Managers"
  • EricHutton proposed: "As an option has it been discussed giving PM's more control over when their projects are released into each queue. / One of the disadvantages of the current system is that PM's don't know when a project is going to be released. I am only a "low volume" PM myself, but having something release while I am on holiday/vacation or not having much time to attend to queries because of what ever reason does not seem good. / This would mean that if a PM had to take time out, or became inactive for an unknown period their projects would be left on hold, either until they returned as an active DP person again, or they were allocated to someone else."
  • Njalsson proposed: "The site rewards people for the volume of their work, by giving them a rank based on it. / What would happen if we stopped doing publishing their rank/pages done?"
Probably 1/3 of our volunteers would go away.
Nearly everyone who volunteers wants to get some feeling of accomplishment. For many, this is shown in the continued increase of their own page counts, which is much more personal than the general count of 'books added to PG'. When we made the mistake of going from R1-R2 to P1-P2-F1-F2 and dropping everyone's count to zero, we got a whole lot of complaints (and took a hit from volunteers who left). I really think we should avoid making that same mistake again! 01:43, 20 December 2007 (PST)
  • crb11 proposed: "encourage people to apply for F2 and keep the new FFF system going. One thing we may need to consider is whether we can afford to lower F2 application standards slightly: if we were to allow 2 "unforgiveable" errors, that could mean 1 error in 75 pages in the final book. Would this be acceptable given its other benefits?"
  • crb11 proposed: "Create a set of "normative transitions" to use P1, P2 and P3 more effectively, thinking of P1 as "inexperienced OR clearing-out-the-rubbish/type-in specialists", P2 as "experienced" and P3 as "expert". / [ Details snipped, see post linked above.] / I would expect that PMs would be asked to classify their book when it starts off in the rounds (like the EASY/AVERAGE/HARD rating at the minute) but there would be the option of changing it in the light of what happened in P1. As for E/A/H at present there would be light policing to ensure that PMs weren't abusing the system. / It's all workable without software changes (although I imagine some relatively minor changes would help), just a bit of PF time to move things between queues a bit. The "class number" can go in the project comments for now. Edit: I'm told PFs can't do this, so it would have to be squirrel/DP-req time. / Doing all of the above should help balance the P* rounds and use their skills more effectively. It also means that we can tinker with the balance a bit if we need to: for instance if P2 is getting ahead of P1, we could put some of the class 2 books through P1-P2-P2 instead of P1-P1-P2."

See also