Roundless System

From DPWiki

This is a high-level description of a someday-in-the-future Roundless System of operation for DP. The origin for this idea was Charlz's Vision Paper of July 4, 2003.

Note that this is just a plan at this point. Much of it is still tentative, consisting only of ideas about how it might work, and nearly all of the details are undetermined at this time.


Key Features

  • Work progresses as individual pages, not as a whole-book Project. Thus instead of the whole project moving through rounds in sequence, individual pages proceed through different stages at their own pace, each following its own path. The current rounds are replaced by various tasks that a page might go through.
  • Each page proceeds through the system until it is 'done'. Easy pages may be proofed only 1 or 2 times; hard pages may go through proofing many, many times. 'Done' might be decided either by some objective, programmable criteria (ratio of corrections made since the previous proofing, etc.) or possibly by something as simple as the proofer marking that this page is 'done'.
  • Each page is individually routed to various 'specialist' proofers/formatters, but only if that page needs that kind of processing. A page with tables would be routed to the tables processors, a page with indexes to the indexes team, and a page with Greek to the Greek team. A page with font changes (bold, italic, blockquotes) would be routed to the formatters, while a page without such changes (like many easy novels) could skip the formatting process altogether. A simple, text-only page might go through only two proofings and no formattings, and then be 'done'.
  • Pages would be tagged to indicate which 'specialist' processing they need; then the system would automatically route them to the people who do that special processing. This tagging might be done in a meta-data scan before proofing (either by automated tools, or a review by experienced proofers, or both), or it might be done just by having the P1 proofer click on checkboxes saying that this page "contains tables", "contains indexes", "contains Greek", "contains font changes", etc.
  • When all the pages are done, they are gathered together as a project and go through post-processing much as they do today.
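As a rough illustration of the tag-based routing described above, here is a minimal Python sketch. Nothing in it is decided: the tag strings are taken from the example checkboxes, while the `TAG_TO_TEAM` mapping, the team names, and the `route_page` function are hypothetical names invented for this example.

```python
# Hypothetical mapping from page tags to the specialist group that handles them.
TAG_TO_TEAM = {
    "contains tables": "tables",
    "contains indexes": "indexes",
    "contains Greek": "Greek",
    "contains font changes": "formatters",
}


def route_page(tags):
    """Return the stops a page must visit: proofing, then any specialists it needs."""
    stops = ["proofing"]
    # Walk the mapping (not the tag set) so the order of stops is predictable.
    for tag, team in TAG_TO_TEAM.items():
        if tag in tags:
            stops.append(team)
    return stops


plain_page = route_page(set())
busy_page = route_page({"contains font changes", "contains tables"})
print(plain_page)  # ['proofing']
print(busy_page)   # ['proofing', 'tables', 'formatters']
```

A plain text-only page visits proofing only, which is the "skip the formatting process altogether" case.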

Benefits

The main benefits anticipated in this Roundless system are a more efficient, concentrated proofing, and less wasteful, unneeded proofing.

  • pages get as much or as little proofing as they need to get 'done'.
  • pages with 'special' features go to experienced specialist proofers/formatters.
  • pages without 'special' features don't have time wasted in specialized proofing/formatting rounds.
  • volunteers don't waste time on pages that don't need work. For example, formatters doing font changes (bold, italic, etc.) don't get a page unless there are some font changes on the page.
  • encourages CPs to massage the text pre-round: the better the text, the better its chances of needing fewer proofing passes
  • no more need to split big volumes into multiple projects, so less work for the PP to reassemble them

Some Details

Specialist teams

These are not really 'teams'; the term just refers to a group of volunteers who are qualified for a special proofing task and have volunteered to do it. When such a volunteer logs in, the system presents them with a choice of tasks ('rounds') to work on -- not only the current P1/P2/P3 and F1/F2 tasks, but also any specialist tasks this volunteer is qualified for. So, for example, when a volunteer qualified for Indexing logs on and selects the Indexing task, they are shown a list of projects that have Index pages waiting to be proofed/formatted. The display of available tasks might include not only all the tasks this volunteer is qualified for, but also indications of the backlog of work in each task, thus encouraging people to work on the most needed tasks (see Round imbalance, Round-balancing proposals).
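The task display with backlog indications might be sketched as follows. This is only an assumption about how such a menu could be built: the `task_menu` function, the task names used as keys, and the backlog numbers are all made up for illustration.

```python
def task_menu(qualified_tasks, backlog):
    """List the tasks a volunteer is qualified for, largest backlog first."""
    menu = [(task, backlog.get(task, 0)) for task in qualified_tasks]
    # Sort by waiting-page count so the most needed task appears at the top.
    return sorted(menu, key=lambda entry: entry[1], reverse=True)


# Invented numbers of pages waiting in each task.
backlog = {"Proofing": 1200, "Font changes": 300, "Indexing": 45, "Tables": 80}

menu = task_menu({"Proofing", "Indexing", "Tables"}, backlog)
for task, waiting in menu:
    print(f"{task}: {waiting} pages waiting")
```

Tasks the volunteer is not qualified for (here "Font changes") never appear in the menu.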

How are people selected for specialist teams? This hasn't really been decided, but the two main options seem to be:

  • Restricted Qualification similar to P2, P3, F1, F2 now: the person must have proofed x number of pages, have been on-site for y days, and pass a quiz or diff evaluation, or
  • Open Enrollment: anybody who is past the P1 Begin proofer stage can join any specialist team that they are interested in. Our current teams (Index Junkies, Table Turners, LaTexers, etc.) work this way, and they seem to get by fine with only peer pressure and mentoring of new members to keep acceptable quality standards. (Given the DP history of openness and volunteerism, I think we'll lean toward this option, possibly with just a bit of qualification.)

Tasks (replaces 'Rounds')

These are some of the suggested ones:

  • P - proofing -- just like now, the basic compare-text-to-scan task.
  • F - font changes -- does the most common tasks of the current formatting rounds -- dealing with font changes (italics, bold, g e s p e r r t, SmallCaps, blockquotes, etc.).
  • I - indexing -- work on pages containing indexes.
  • L - lists -- work on pages containing lists.
  • T - tables -- work on pages containing tables.
  • C - calculations -- work on pages with mathematical or chemical typesetting (often using LaTeX).
  • M - music -- work on pages containing music.
  • N - notes -- work on pages with footnotes, sidenotes, or endnotes.
  • O - other language -- any 'other' language used on the page that is different than the language used in most of the book. For example, Greek in a mostly-English book, or Latin in a mostly-German book, etc.
  • D - drama/poetry -- work on pages containing drama or poetry.
  • G - graphics/illustrations -- work on pages containing graphics or illustrations. (This might be a post-processing task.)
  • H - headings/spacing -- work on pages containing chapter headings, section headings, etc. and also on pages needing special spacing of lines.
  • probably some others, too.

Each specific page might make multiple passes through any of these tasks. (Repeat until complete!) The passes would not be specifically labeled as 'rounds', like Tables1, Tables2, etc., but rather as Tables pass 1, Tables pass 2, etc. (They would still be listed as T1, T2, etc. in places like the 'My Projects' page.)
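The two labeling styles can be illustrated with a small Python sketch. The letter codes come from the suggested task list above; the `TASKS` table, the capitalized display names, and the `pass_labels` function are hypothetical.

```python
# Letter codes from the suggested task list; the display names are assumptions.
TASKS = {
    "P": "Proofing",
    "F": "Font changes",
    "I": "Indexing",
    "L": "Lists",
    "T": "Tables",
    "C": "Calculations",
    "M": "Music",
    "N": "Notes",
    "O": "Other language",
    "D": "Drama/poetry",
    "G": "Graphics/illustrations",
    "H": "Headings/spacing",
}


def pass_labels(task_code, pass_number):
    """Return (long label, short label) for one pass of a page through a task."""
    long_label = f"{TASKS[task_code]} pass {pass_number}"
    short_label = f"{task_code}{pass_number}"
    return long_label, short_label


print(pass_labels("T", 2))  # ('Tables pass 2', 'T2')
```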

Individualized Routing

There has been some discussion of the system keeping an Average Accuracy (AA) score for each proofer, automatically generated from their past proofing. (Exactly how isn't clear -- probably mostly from how many changes the next proofer makes to their work.) There would probably be a separate score for each of the above tasks that the proofer is qualified for.

This would not be published (except possibly to the proofer him/herself), but would be used by the system to control the routing of pages. For example, the system could make sure that the P2 pass on the page is done by a more experienced proofer (one with a higher AA score) than the proofer who did the P1 proofing pass. Or the system could use this AA score in determining how much more proofing the page needs: this page needs 2 more proofing passes by proofers with AA scores in the 100's, or 1 more proofing pass by a proofer with a score above 200.
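One way the first routing rule (the next pass goes to a proofer with a higher AA score) might look in code is sketched below. The function name, the volunteer names, and the scores are invented for this example; how AA scores are actually computed is not specified here.

```python
def eligible_for_next_pass(previous_score, candidates):
    """Return the names of candidates whose AA score beats the previous proofer's.

    `candidates` maps a (hypothetical) volunteer name to an AA score.
    Names are returned with the highest score first.
    """
    better = [(name, score) for name, score in candidates.items()
              if score > previous_score]
    better.sort(key=lambda entry: entry[1], reverse=True)
    return [name for name, _ in better]


candidates = {"alice": 210, "bob": 95, "carol": 140}
# The previous pass was done by a proofer with an AA score of 120.
print(eligible_for_next_pass(120, candidates))  # ['alice', 'carol']
```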

When is a page done?

Since each process will repeat until complete, we need to have a way of deciding when a page is done. Just how to do that hasn't been decided.

One simple option is just to allow the proofer to check a box saying this page is done. Possibly only experienced proofers, or those with an AA score above some number would be able to do this. (But if that is a small number of people, this could become a real bottleneck in the system!) Perhaps a cumulative total: a page is done if 1 proofer with a high AA score says so, or 2 proofers with medium AA scores, or 4 proofers with low AA scores.
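The cumulative-total idea (1 high, 2 medium, or 4 low) can be expressed as weighted votes. In this sketch the weights 4, 2, and 1 follow directly from that ratio, but the score cutoffs of 200 and 100 that separate high, medium, and low are pure assumptions, as are all the names.

```python
DONE_THRESHOLD = 4  # one high vote, or two medium votes, or four low votes


def vote_weight(aa_score):
    """Weight of a 'done' vote; the 200 and 100 cutoffs are assumed, not decided."""
    if aa_score >= 200:
        return 4  # high AA score
    if aa_score >= 100:
        return 2  # medium AA score
    return 1      # low AA score


def done_by_votes(voter_scores):
    """True once the 'done' votes cast so far add up to the threshold."""
    return sum(vote_weight(score) for score in voter_scores) >= DONE_THRESHOLD


print(done_by_votes([250]))             # True  (1 high)
print(done_by_votes([120, 150]))        # True  (2 medium)
print(done_by_votes([60, 70, 80]))      # False (only 3 low)
print(done_by_votes([60, 70, 80, 90]))  # True  (4 low)
```

Note that summing weights also accepts mixed combinations, such as one medium vote plus two low votes, which the text above does not explicitly address.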

The other option is an automatic determination by the system that a page is done. There seem to be 2 main possibilities for this: Confidence in Page (CiPg) or Confidence in Proofers (CiPf).

Confidence in Page is a judgment of how good this page is, mainly based on how many changes were made from the previous version. Factors involved are how many possible errors there were (total page size, number of words, word length, percentage of known-good words (in our dictionary), etc.), the number of errors found in previous passes, and the number of errors found in this pass. Starting parameters to quantify the 'difficulty' of the page might also be useful, but unless they can be automatically generated for each page, the burden on the Content Provider to provide such parameters for every page may be unwieldy. (See Confidence in Page analysis for more details about this.)
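A deliberately simplified sketch of a Confidence in Page check follows. It uses only one of the factors listed above (changes in the latest pass relative to the number of words) and is not the formula from the Confidence in Page analysis; the `max_ratio` cutoff of 0.005 and both function names are assumptions for illustration.

```python
def change_ratio(changes_this_pass, word_count):
    """Fraction of the page's words that were changed in one pass."""
    if word_count == 0:
        return 0.0  # an empty page has nothing left to correct
    return changes_this_pass / word_count


def done_by_page_confidence(changes_per_pass, word_count, max_ratio=0.005):
    """True when the most recent pass changed at most `max_ratio` of the words.

    `changes_per_pass` lists the number of changes made in each pass, oldest first.
    A page that has not been proofed at all is never considered done.
    """
    if not changes_per_pass:
        return False
    return change_ratio(changes_per_pass[-1], word_count) <= max_ratio


print(done_by_page_confidence([12, 3, 1], word_count=300))  # True
print(done_by_page_confidence([12, 3], word_count=300))     # False
```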

Confidence in Proofers is a judgment of how good the proofers on this page were, based on their individual Average Accuracy scores (as mentioned above). So a page is judged done when the total AA score of all the proofers who proofed it exceeds a target score. For example, if the target score is 300 AA points, the page would need 5-6 passes by proofers with AA scores in the 50's-60's, or 3 passes by proofers with AA scores in the 100's, or just 2 passes by proofers with AA scores in the 200's.
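The arithmetic in this example can be checked with a short Python sketch. The target of 300 AA points comes from the example above; the function names are hypothetical.

```python
TARGET_SCORE = 300  # example target from the text, in AA points


def done_by_proofer_confidence(pass_scores, target=TARGET_SCORE):
    """True when the AA scores of all proofers who worked the page reach the target."""
    return sum(pass_scores) >= target


def passes_needed(typical_score, target=TARGET_SCORE):
    """Number of passes needed if every proofer has about the same AA score."""
    # Integer ceiling division: round up to a whole number of passes.
    return (target + typical_score - 1) // typical_score


for score in (50, 60, 100, 200):
    print(score, passes_needed(score))
# 50 6
# 60 5
# 100 3
# 200 2
```

This reproduces the figures above: 5 to 6 passes at scores of 50 to 60, 3 passes at 100, and 2 passes at 200.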

Possibly we will use some combination of both of these. Errors found by proofers with higher AA scores will count more toward 'done-ness' of the page.

See Also