WordCheck/Project Management

From DPWiki

Accessing the WordCheck PM interface

There are several different ways to access the WordCheck PM interface:

  • You can access the Edit project word lists link from the Project page. The link is to the right of the Edit the above information link.
  • You can also access the Edit Project Word Lists page from the Edit Project page, using the Edit project word lists link at the bottom.
  • The results of a project search include a Word Lists link under the Options column. If the project also has Suggestions from Proofers, an exclamation mark icon (exclamation.gif) appears; clicking it takes you to the Suggestions from Proofers tool.
  • To see the proofer suggestions for all projects for which you are the PM, use the Manage proofreaders' Suggestions link from the Project Manager page.

WordCheck PM tools

The Edit Project Word Lists page has the following links along with their visibility, purposes, and word list target (GWL = Good Words List, BWL = Bad Words List):

  • Show details for ad hoc words
    • Visible: after OCR text is loaded (or after first page is proofed for type-in projects)
    • Purpose: to obtain word frequencies for any given set of words and add one or more of the words to either the Good or Bad Words Lists
    • Word list target: BWL and GWL
  • Show WordCheck flagged word statistics
    • Visible: after OCR text is loaded (or after first page is proofed for type-in projects)
    • Purpose: to see how many words would be flagged per page based on the current Good and Bad Words Lists
  • Show WordCheck proofer interface usage
    • Visible: after the first page is proofed in the project
    • Purpose: to see which pages have had WordCheck used on them in each round
  • Words that WordCheck would currently flag
    • Visible: after OCR text is loaded (or after first page is proofed for type-in projects)
    • Purpose: to see a list of words that WordCheck would currently flag and allow adding the flagged words to the Good Words List
    • Word list target: GWL
  • Suggestions from proofers
    • Visible: after the first proofer suggestion is made
    • Purpose: to see a list of words that Proofers have suggested and allow adding the words to the Good Words List
    • Word list target: GWL
  • Words in the Site's Possible bad words file
    • Visible: if there are Possible Bad Word suggestions for the project's language(s). See the WordCheck FAQ for a list of current languages which have Possible Bad Word suggestions.
    • Purpose: to see the list of Possible Bad Words that are found in the project and allow adding the words to the Bad Words List
    • Word list target: BWL
  • Suggestions from diff analysis
    • Visible: after the first page is proofed in the project
    • Purpose: to see the list of words that have changed between the OCR and the latest version of the text, and that WordCheck would not otherwise have flagged, and to add these words to the Bad Words List (such words are likely scannos)
    • Word list target: BWL
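
The idea behind Suggestions from diff analysis can be illustrated with a minimal Python sketch. This is not the site's implementation: the function name `diff_suggestions`, the whitespace-based word splitting, and the `unflagged_words` set (standing in for "words WordCheck would not currently flag") are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def diff_suggestions(ocr_text, proofed_text, unflagged_words):
    """Return OCR words that proofers replaced and that would not have been flagged.

    unflagged_words is a hypothetical stand-in for the set of words
    WordCheck would NOT flag (dictionary words plus the Good Words List).
    """
    ocr_words = ocr_text.split()
    proofed_words = proofed_text.split()
    matcher = SequenceMatcher(None, ocr_words, proofed_words, autojunk=False)

    candidates = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" marks a run of OCR words that proofers changed.
        if tag == "replace":
            candidates.update(w for w in ocr_words[i1:i2] if w in unflagged_words)
    return sorted(candidates)


known = {"he", "went", "to", "the", "arid", "and", "house"}
print(diff_suggestions("he went to tbe arid house",
                       "he went to the and house",
                       known))
# ['arid']
```

In this example "tbe" is left out because it would already be flagged, while "arid" is a real word that proofers corrected to "and", which makes it a plausible Bad Words List candidate.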

More information and sample screenshots of what a PM will see are available on the WordCheck/Project Management/What PMs will see page.


Best Practices

The following list has been gathered from a variety of sources (developers, PMs, proofers) on some best practices for maintaining Word Lists.

Word List Contents

Diacritics

It isn't uncommon for words with diacritics to be added to the Good Words List. In these cases, consider adding the same word without diacritics to the Bad Words List so that it will be flagged for the proofers. This holds true even if the word without the diacritic would already be flagged by WordCheck, because it ensures that proofers don't overlook the word or try to suggest it. The Ad hoc WordCheck PM tool makes it easy to see whether any of the variants currently exist in the project. Even if they don't exist now, consider adding them anyway so that they are still flagged should a proofer change a word to an incorrect form.

Examples:

  • If mediæval is added to the GWL, add mediaeval to the BWL.
  • If João and João's are on the GWL, add Joao and Joao's to the BWL.
  • If Bacalhôa is on the GWL and has been misscanned as Bacalhóa in some places, add Bacalhóa and Bacalhoa to the BWL.
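
For a long Good Words List, the unaccented variants can be generated rather than typed by hand. The following Python sketch is one possible approach, not a WordCheck feature: the function names and the small `LIGATURES` table are assumptions, and the output is only a list of candidates for the PM to review before adding to the BWL. It does not catch wrong-accent variants such as Bacalhóa, which still need to be added manually.

```python
import unicodedata

# Ligatures that Unicode decomposition does not split into plain letters.
# Illustrative only; extend as needed for the project's language(s).
LIGATURES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}


def strip_diacritics(word):
    """Return word with ligatures expanded and combining accents removed."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in word)
    decomposed = unicodedata.normalize("NFD", expanded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def bad_word_candidates(good_words):
    """Return unaccented forms of Good Words that differ from the original."""
    candidates = set()
    for word in good_words:
        plain = strip_diacritics(word)
        if plain != word:
            candidates.add(plain)
    return sorted(candidates)


print(bad_word_candidates(["mediæval", "João", "João's", "Bacalhôa", "castle"]))
# ['Bacalhoa', 'Joao', "Joao's", 'mediaeval']
```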

Word List Maintenance

  • Periodicals - Because periodicals may only have a few pages of a specific topic before going into something else, it may provide a better proofer experience if more time is spent creating these word lists ahead of time rather than relying on proofer suggestions (because by the time the PM can add the proofer suggestions the section containing those words may have already been proofed). This is particularly important for projects that have a high number of flagged words such as scientific periodicals (because few of the words may be in the dictionary) and projects that were stuck in the P3 queue before WordCheck was rolled out (because no suggestions were provided from P1 or P2).
  • Sometimes, when I'm creating my original "good words" list, I'll stumble upon a wrong word that occurs so often I'll go back to the original text, find and replace it there, and then reload the text. If it only occurs once or twice, I'll add it to the bad words list, to make sure it gets noticed, and a proofer can't suggest it as good later. - DancingFool
  • After a project has been proofed for a while, it's worth going back to the "Words that WordCheck would currently flag" list and see if anything new has shown up that should be a good word. A lot of proofers will fix a word but not suggest it, so you don't see them all on that list. I try and do this at the end of P1, at least. - DancingFool
  • When doing the initial Words Lists creation, consider using the Show WordCheck flagged word statistics tool. It is a great way to see information on how many words will get flagged per page using the current lists. If the number is high consider spending more time on the initial creation to lower the number of false-positives that proofers will experience.
  • Similarly, if, after P1, WordCheck is still flagging a large number of words, consider spending more time beefing up the Good Words List either via proofer suggestions or the Words that WordCheck would currently flag tool.
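
To make the effect of the word lists on per-page counts concrete, here is a minimal Python sketch of the kind of figure the Show WordCheck flagged word statistics tool reports. It is an assumption-laden model, not the site's code: the flagging rule (flag anything on the BWL, otherwise flag words that are in neither the dictionary nor the GWL), the word-splitting regular expression, the exact-case matching, and all names are hypothetical.

```python
import re

# Runs of letters with optional internal apostrophes (e.g. "João's").
# A simplification of whatever tokenisation WordCheck really uses.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def is_flagged(word, dictionary, good_words, bad_words):
    """Assumed rule: the Bad Words List wins; otherwise unknown words are flagged."""
    if word in bad_words:
        return True
    return word not in dictionary and word not in good_words


def flagged_per_page(pages, dictionary, good_words, bad_words):
    """Map each page name to the number of words that would be flagged on it."""
    return {
        name: sum(is_flagged(w, dictionary, good_words, bad_words)
                  for w in WORD_RE.findall(text))
        for name, text in pages.items()
    }


pages = {"001.png": "the mediæval castle", "002.png": "João saw tbe castle"}
dictionary = {"the", "castle", "saw"}

print(flagged_per_page(pages, dictionary, set(), set()))
# {'001.png': 1, '002.png': 2}
print(flagged_per_page(pages, dictionary, {"mediæval", "João"}, set()))
# {'001.png': 0, '002.png': 1}
```

Comparing the two results shows how adding two words to the Good Words List removes the false positives while the genuine scanno "tbe" stays flagged.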


WordCheck PM FAQs

What is a PM required to do?

Per Distributed Proofreaders' Site Administrators, at a minimum a Project Manager should create a Good Words List for all active projects, except in those few cases where a Good Words List would not be appropriate[1].

Once the page files for a project have been loaded, its initial word lists can be created. The steps to create the initial Good Words List for a project are as follows.

  1. Go to the Project page and click Edit project word lists.
  2. Click on the Words that WordCheck would currently flag -- Display link.
  3. Based on your knowledge of the project, and as simple or as sophisticated an analysis as you want to conduct, determine which words should go on the project's Good Words List. The context link can be used to see the word in context as well as the page image.
  4. Select the checkbox next to the words you wish to add to the Good Words List.
  5. After selecting the desired words, click Add selected words to Good Words List.

Clicking the button adds the words to the list immediately -- there is no need to "Save" your changes after going to the Edit Project Word Lists page (doing so will bring up an error as it thinks you want to overwrite your changes).

How would a PM create a Bad Words List for projects?

The mechanics of creating a Bad Words List for a project are basically identical to the steps for creating a Good Words List laid out above. The built-in suggestion tools associated with the Bad Words List are entitled Words in the Site's Possible bad words file and Suggestions from diff analysis. Ideas for project Bad Words can also come from looking at the project's diffs, or even from words suggested for, but rejected from inclusion in, the project's Good Words List. The Show details for ad hoc words tool is useful for getting the frequency and seeing the context of specific words before adding them to the Bad Words List.

How does a PM decide what's a Good or Bad Word in a project?

You may or may not like this answer, but the system has been intentionally designed to give PMs a great degree of latitude and flexibility in deciding which specific words to put on the project's word lists.

At its core, the basic analysis is as follows:

  • Deciding whether to place a word on the Good Words List can be analyzed as "do I want this word to never be flagged by WordCheck for proofers?"[2]
  • Deciding whether to place a word on the Bad Words List can be analyzed as "do I want this word to always be flagged by WordCheck for proofers?"

If your answer is "yes," the word goes on the appropriate list. If the answer is "no," "maybe," or "I don't know," the word should stay off the list.

Different PMs will use different frequency cutoff settings in filtering the suggested results. Some PMs will use a "take time now to save time later" approach, and others will use a "take a bit less time now and be quite willing to spend a bit more time later" approach.

Some methods, techniques, tools, etc. individual PMs have reported as using include:

  • Using guiprep to help look up/verify suggested words in page files.
  • Accepting suggested words at or above the nth frequency level that the PM knows to be good, and then relying on proofers' suggestions for subsequent additions to the Good Words List.
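
The frequency cutoff approach can be sketched in a few lines of Python. This is an illustration only: `split_by_frequency` and its inputs are hypothetical, and a high frequency does not by itself make a word good, so the "accept" group is still a list for the PM to confirm.

```python
from collections import Counter


def split_by_frequency(flagged_words, cutoff):
    """Split flagged words into those seen at least cutoff times and the rest.

    flagged_words is a flat list of every flagged occurrence in the project.
    """
    counts = Counter(flagged_words)
    accept = sorted(w for w, n in counts.items() if n >= cutoff)
    review = sorted(w for w, n in counts.items() if n < cutoff)
    return accept, review


occurrences = ["Bacalhôa", "Bacalhôa", "Bacalhôa", "tbe", "João", "João"]
accept, review = split_by_frequency(occurrences, cutoff=2)
print(accept)  # ['Bacalhôa', 'João']
print(review)  # ['tbe']
```

A higher cutoff corresponds to the "take a bit less time now" approach, leaving more words to be picked up later from proofers' suggestions.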

What does a PM have to do with WordCheck after the initial Word List(s) have been created?

Throughout the proofreading rounds, the PM should periodically check to see what words the proofreaders have suggested, and update the GWL and BWL accordingly.

See also the Best Practices section.

Footnotes

1

In the rare case where no flagged words are correct, or where there is no dictionary for the project language(s), it is appropriate to leave the Good Words List empty, but only in such rare cases. When this is the case, please leave a note in the Project Comments that the Good Words List has been left blank intentionally.

2

The Bad Words List "trumps" the Good Words List. Thus, if a word happens to end up on both the project's Good Words List and Bad Words List, the word will be flagged for proofers.