Confidence in Page Glossary

From DPWiki

Abbreviations, acronyms, and initialisms used

  • CiP - Confidence in Page: Is the cost of proofing this page again likely to exceed the cost of any errors we might otherwise leave in it?
  • CiR - Confidence in Round: How likely is it that we've found most errors in a full round of a given project? More generally, do we need another round? If so, what kind?
  • pdm - Page Difference Metric: a way of measuring the changes made to a page from one round to the next. The three we've used so far are "wdiff changes" (wc), "wdiff alterations" (wa), and "realdiff" (r). Their rates are written wc/p, wa/p, and r/p in per-page form, and wc/w, wa/w, and r/w in per-word form. While the per-word forms seem like they should be more accurate, they introduce false patterns having to do with integer ratios. The per-page metrics have been easier to work with.
  • pem - Proofer Effectiveness Metric: How effective is a given proofer at finding errors in a page or a particular kind of page? This is generally based on one or more pqms. It will probably not be a single number, but a function of a few parameters.
  • r/p - realdiff differences per page
  • r/c - realdiff differences per character (realdiff is a character-oriented metric)
  • wa/p - wdiff alterations per page (altered = changed + inserted + deleted)
  • wa/w - wdiff alterations per word (wdiff is a word-oriented metric); see below for how to calculate this value.
  • wc/p - wdiff changes per page
  • wc/w - wdiff changes per word
  • xpqm - X page quality metric: Any metric used to estimate the quality of a proofed page. Quality is a deliberately broad concept. X is a designator for a page difference metric used to calculate the given pqm. There may be more than one pqm derived from a given pdm, so I'll probably start numbering them. Watch for wqpm2.
  • od.X - OCRdiff derived metrics are noted od.X, where X is a description of the metric. E.g. od.all is the sum of all OCRdiff counts.
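A minimal Python sketch of how the per-page and per-word rates described under pdm might be computed for a set of pages. The `PageDiff` record, its field names, and the `rates` helper are hypothetical, and the code assumes a rate is a total count divided by the number of pages (x/p) or by the total word count (x/w); averaging the individual per-page ratios would be an equally plausible reading. It follows the pdm entry's r/w naming; dividing realdiff counts by characters would give r/c instead.

```python
from dataclasses import dataclass


@dataclass
class PageDiff:
    """Hypothetical per-page counts; field names are illustrative only."""
    wc: int     # wdiff changes
    wa: int     # wdiff alterations (changed + inserted + deleted)
    r: int      # realdiff differences
    words: int  # words on the page (e.g. the wdiff word count)


def rates(pages):
    """Aggregate per-page (x/p) and per-word (x/w) rates over `pages`.

    Assumption: a rate is a total count divided by the number of pages,
    or by the total number of words across those pages.
    """
    if not pages:
        raise ValueError("need at least one page")
    n_pages = len(pages)
    n_words = sum(p.words for p in pages)
    totals = {
        "wc": sum(p.wc for p in pages),
        "wa": sum(p.wa for p in pages),
        "r": sum(p.r for p in pages),
    }
    out = {}
    for name, total in totals.items():
        out[f"{name}/p"] = total / n_pages
        # Per-word rates are undefined if every page is blank.
        out[f"{name}/w"] = total / n_words if n_words else float("nan")
    return out


# Two made-up pages, for illustration only.
pages = [PageDiff(wc=2, wa=3, r=4, words=400), PageDiff(wc=0, wa=1, r=1, words=100)]
print(rates(pages))
```

With these invented counts, wa/p comes out as 2.0 while wa/w is 0.008, which shows how differently the two forms scale.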

wdiff_words and wdiff_alterations are defined using the output of the wdiff -s command.

# wdiff -s P1/006.txt P2/006.txt | tail -2

Yields output similar to:

P1/006.txt: 475 words 475 100% common +0+ 0% deleted +0+ 0% changed 
P2/006.txt: >475< words 475 100% common +0+ 0% inserted 0 0% changed
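The two statistics lines can be picked apart with a regular expression. This Python sketch assumes the layout shown above; the +...+ and >...< marks look like highlighting added for this page rather than part of real wdiff output, so the hypothetical `parse_wdiff_stats` helper strips them before matching. It also assumes '+', '<' and '>' never occur in the file names.

```python
import re

# One `wdiff -s` statistics line, once the +...+ and >...< marks are
# stripped, e.g.
#   P1/006.txt: 475 words 475 100% common 0 0% deleted 0 0% changed
STAT_LINE = re.compile(
    r"^(?P<file>.+?):\s+(?P<words>\d+)\s+words\s+"
    r"(?P<common>\d+)\s+[\d.]+%\s+common\s+"
    r"(?P<count>\d+)\s+[\d.]+%\s+(?P<kind>deleted|inserted)\s+"
    r"(?P<changed>\d+)\s+[\d.]+%\s+changed"
)


def parse_wdiff_stats(text):
    """Return one dict of counts per statistics line found in `text`.

    Lines that do not look like wdiff statistics are skipped.
    """
    results = []
    for line in text.splitlines():
        # Remove the highlighting marks used on this page.
        clean = re.sub(r"[+<>]", "", line).strip()
        m = STAT_LINE.match(clean)
        if m is None:
            continue
        results.append({
            "file": m.group("file"),
            "words": int(m.group("words")),
            "common": int(m.group("common")),
            m.group("kind"): int(m.group("count")),  # "deleted" or "inserted"
            "changed": int(m.group("changed")),
        })
    return results
```

The first dict carries a "deleted" key and the second an "inserted" key, mirroring which count wdiff reports on each line.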

wa/w is calculated by the following formula:

wdiff_alterations / wdiff_words

where:

  • wdiff_alterations is the sum of the numbers between the +'s in the output above. Alterations are deleted + changed + inserted. The changed value is the one listed for the first page. The wdiff command lists deleted only for the first page and inserted only for the second.
  • wdiff_words is the number between the ><'s in the output above, i.e. the word count of the second page.
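As a worked example of the formula, here is a small Python function. The name `wa_per_word` is hypothetical; it takes deleted and changed from the first page's line and inserted and the word count from the second page's line, as described above. Raising an error for a zero word count is a design choice of this sketch, since wa/w is undefined for a blank page.

```python
def wa_per_word(deleted, changed, inserted, words):
    """wa/w = (deleted + changed + inserted) / words.

    deleted and changed come from the first page's `wdiff -s` line;
    inserted and words come from the second page's line.
    """
    if words <= 0:
        raise ValueError("words must be positive")
    alterations = deleted + changed + inserted  # wdiff_alterations
    return alterations / words


# The sample output above: no alterations on a 475-word page.
print(wa_per_word(0, 0, 0, 475))  # 0.0
# A made-up page: 3 deleted, 5 changed, 2 inserted, 500 words.
print(wa_per_word(3, 5, 2, 500))  # 0.02
```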