User:Jmdyck/WordChecker pre-release
Forum topics
- Possible spell-check changes; which would you use?
- Discussion of 'Additions to the spellcheck dictionary'
- cpeel's announcement
- code moves to test server
- "Rumor has it that the last item preventing the new spellcheck code from going live is the lack of a FAQ explaining what the heck is going on to Proofers and Project Managers alike."
- "I've been going through this topic looking for possible hanging threads"
- RFC: new standard interface
- RFC: new spell-checker
- RFC: new features of the project edit page
- Proposed Site-Level Bad Word Lists for new WordChecker
- Proposed Site-Level Good Word Lists for new WordChecker
- Preprocessing tool theories (split from "Additions...")
- Preamble for "Proposed Site-Level Bad Word Lists"
- New word check tool requires PM action (in 'Providing Content' forum)
- New word check tool requires PM action (in 'Managing Projects' forum)
(Open) Issues
indicates an issue that has been resolved
indicates an issue currently available for testing (usually in a sandbox)
indicates an issue that should/might be fixed/implemented before the release
PM Home Page
- forumpost:278079 kraester: maybe links to site-level word lists from the PM "home" page
Adding pages
- forumpost:242815 donovan: After a project is loaded, along with the "Load Completed" message, present the option to generate a project-specific wordlist. (See also forumpost:281582 JulietS.)
- forumpost:281117 bowerbird: when a project is loaded, automatically create a project-level bad words list containing every word not passing spellcheck
- forumpost:281340 bowerbird: auto-create project-level good words list containing every word with more than 5 occurrences
Project Page (as seen by PM)
forumpost:276239 cpeel: Move links for "suggestions" files from "Extra Files" to "Word Lists"? (jmdyck)
- forumpost:278977 DESiegel60: provide an easy interface for comparing OCR text vs proofed text on the project page as seen by the PM.
- forumpost:281147 DESiegel60: a warning/confirmation prompt if a PM tries to make a project available with an empty good-words list (or forumpost:281604 garweyne: maybe disallow it. (See forumpost:281896.))
- forumpost:277675 dkretz: know which corrections were made in the spellchecker vs. proofing
- forumpost:286098 kraester: good_words_suggestion.txt: put it in "round, then page number" order? (see forumpost:286195 cpeel)
- forumpost:289805 kraester: good_words_suggestion.txt: move link from "Word Lists" back down to "Extra Files"
Project Page (as seen by all)
forumpost:278977 DESiegel60: show the date when the project-specific lists were last updated (forumpost:283404 cpeel)
forumpost:278726 JLohnert-PG: Good word list should be alphabetized. (forumpost:283404 cpeel)
forumpost:284150 kraester: Word Lists should open in new window/tab
forumpost:280890 kraester: treat the Word List and "displayed Project Page" as separate "entities" with their own update statistics (half-way: you can update non-word stuff without bumping the word timestamps, but you can't update the word lists without bumping the "project info" timestamp.) (forumpost:287523 which resolves it as far as kraester is concerned)
- forumpost:278973 dkretz: a box somewhere that marks when the PM last reviewed the suggestions list.
- forumpost:284150 kraester: when opening Word Lists, force a "cache purge" so don't need to Reload/Refresh?
- forumpost:284865 garweyne: in the Word Lists section, include links to applicable site word lists
Project Editor
forumpost:281696 jmdyck: wording of the links to "suggestions" scripts (cpeel: committed consensus)
forumpost:283021 kraester: add an indication that links will open in separate windows/tabs (forumpost:283078 cpeel)
forumpost:282899 big_bill: link to WordCheck FAQ near WordCheck-related widgets (forumpost:283078 cpeel)
forumpost:284195 kraester: have the link to the WordCheck FAQ open in a new window too?
forumpost:283021 kraester: show "Suggestions from proofers" link only if there are suggestions. (forumpost:284176 jmdyck)
forumpost:278357 kraester: "clone project" should copy project-level word lists
forumpost:289108 garweyne: BUG: frequencies not removed from project Bad Words input (forumpost:289175 cpeel)
forumpost:289108 garweyne: in wordlists, a leading space inhibits recognition of a good word (forumpost:289175 cpeel)
forumpost:288651 kraester: suggestion-links layout
forumpost:289680 DESiegel60: introduce some vertical whitespace
- forumpost:278089 dkretz: links to site-level word lists near the word list boxes in project editor
- forumpost:289085 garweyne: more flexible format for wordlist files (forumpost:289108 including a "terminator character") (forumpost:289175 cpeel)
- forumpost:245026 garweyne: Allow PM to upload a word list file (as an alternative to pasting text into a textarea).
- forumpost:275968 dkretz: sharing wordlists between a set of projects.
- forumpost:281662 garweyne: two tools for word-list management (Also forumpost:282963.)
- forumpost:281815 garweyne: visual tweaks to editproject UI (Also forumpost:283021 kraester.)
- forumpost:282667 dkretz: put word-list boxes side-by-side
- forumpost:282751 dkretz: various reactions of a "Naive Project Manager"
- forumpost:282899 big_bill: change labels or add extra ones as bracketed subtitles?
- forumpost:282919 t-bonham@scc.net: have two different levels of documentation on this screen
- forumpost:282963 garweyne: separate, at least visually, the word management parts from the project edit / comment edit page(s).
- forumpost:284211 garweyne: in the Good/Bad list box, if you enter something that isn't a word (as WordCheck defines it), issue a warning.
- forumpost:285739 garweyne: modify the choice of languages
- forumpost:289760 t-bonham@scc.net: "clone project" should also copy good_word_suggestions.txt (others disagree)
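Several Project Editor items above concern cleaning up what a PM pastes into the Good/Bad word boxes: frequency counts left in the input, a leading space that stops a good word from being recognized, and a case-folding sort. A minimal sketch of that clean-up, not the site's actual code. The function name `normalize_word_list` and the `word - 12` frequency format are assumptions for illustration; the real paste format is not given in these notes.

```python
import re

# Assumed paste format: a word optionally followed by " - <count>".
# The real format of the suggestion output is not specified in these notes.
_FREQ_SUFFIX = re.compile(r"\s+-\s+\d+\s*$")


def normalize_word_list(raw_text):
    """Return a cleaned, de-duplicated, case-fold-sorted list of words."""
    words = set()
    for line in raw_text.splitlines():
        # Drop a trailing frequency count, then trim surrounding whitespace.
        word = _FREQ_SUFFIX.sub("", line).strip()
        if word:  # skip blank lines
            words.add(word)
    # Sort on the case-folded form; the exact form breaks ties deterministically.
    return sorted(words, key=lambda w: (w.casefold(), w))
```

Because the pattern requires whitespace on both sides of the hyphen, hyphenated words such as "well-known" pass through unchanged.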
"Suggestion" scripts
forumpost:281383 garweyne: Modify set of predefined cutoffs. (cpeel: done)
forumpost:281383 garweyne: Default cutoff is 10, maybe should be different. (cpeel: Changed to 5.)
forumpost:281649 lvl: Within a given frequency, perhaps sort words alphabetically? (cpeel: done)
forumpost:275795 garweyne: When you "save as" from the "download" link, the suggested name is generate_dict_suggestions.php; a project-specific name, like projectIDabcdxxxxxxxx_suggestions.txt, would be much more useful.
forumpost:282963 garweyne: display words in DPCustomMono2, or in monospaced at least. (forumpost:283078 cpeel)
forumpost:283003 lvl: downloading the "candidate words from project" list creates a file with wrong end-of-lines on a windows platform (forumpost:283078 cpeel)
forumpost:282465 kraester: add a "Show non-flagged words in project which are on DP's likely-stealth-scanno list" tool (forumpost:283023 kraester's half show-stopper) (forumpost:283098 cpeel)
forumpost:283138 garweyne: allow invoking the scripts with a minFreq argument directly from the edit page (to save regeneration time if you want a non-default cutoff) (forumpost:283303 cpeel: re-display via javascript, so no regen)
forumpost:284528 kraester: cutoff strangeness
forumpost:283138 garweyne: allow downloading the text version without going through the HTML version.
forumpost:287492 kraester: preamble wording for displays
forumpost:287157 DESiegel60: insert confirmation screen before download
forumpost:286994 garweyne: add project name to display
forumpost:287523 kraester: should use case-insensitive sort (forumpost:288641 kraester: still not quite)
forumpost:288641 kraester: put project name in <h2> or <h3> (forumpost:289614 cpeel)
forumpost:286234 garweyne: add preamble to download text?
forumpost:288651 kraester: preamble wording for downloads
- forumpost:286143 kraester: (future) allow sorting alphabetically
- forumpost:290492 garweyne: include word- and occurrence-counts for words not shown (below cutoff), and/or count of words shown at each cutoff (also forumpost:291067)
- forumpost:291084 garweyne: remove all the lines before the list, except the title, the links and the list counts. (forumpost:291615 kraester)
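The cutoff and ordering items above (default cutoff of 5, alphabetical order within a given frequency, case-insensitive sort) can be sketched roughly as follows. This is an illustration only: the function name `suggestions_by_frequency` is hypothetical, and treating the cutoff as inclusive is an assumption, since the notes do not say whether a word occurring exactly 5 times is shown.

```python
def suggestions_by_frequency(word_counts, min_freq=5):
    """Return (word, count) pairs whose count is at least min_freq.

    Pairs are ordered by descending count, then case-insensitively by word.
    The inclusive cutoff is an assumption, not documented behaviour.
    """
    kept = [(word, count) for word, count in word_counts.items()
            if count >= min_freq]
    # Negate the count for descending order; case-fold the word so that
    # "Bart" sorts between "apple" and "Zeb" rather than before both.
    return sorted(kept, key=lambda wc: (-wc[1], wc[0].casefold(), wc[0]))
```

Re-filtering an already computed `word_counts` mapping with a different `min_freq` is cheap, which fits cpeel's note about re-displaying at a new cutoff without regenerating.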
words that WordChecker would flag
forumpost:284909 dkretz: it's slow (Also forumpost:284937, forumpost:284963 garweyne)
forumpost:284952 garweyne: show time spent computing results
- forumpost:280655 bowerbird: Allow lexico sort, and various kinds of grouping. (Also forumpost:281680.)
- forumpost:281649 lvl: uses pages returned to the round, which could contain virtually anything
- forumpost:284937 garweyne: for display, use same format as download, i.e., pasteable into word-list-entry boxes
- forumpost:288608 garweyne: give the reason why a word is included (not urgent) (also forumpost:291065)
- forumpost:291042 cpeel: suppress words that are already on the project's Bad Words list?
words suggested by proofers
forumpost:284543 kraester: include word's frequency in project?
forumpost:286434 kraester: BUG: serious problems in download (and forumpost:287394)
forumpost:281844 kraester: the words are not checked against the current Project Good Words List in order to filter out words that the PM has already added to the list.
forumpost:286437 kraester: BUG: "F[u=]lford" appears in file and download, but not display (forumpost:287597 cpeel: not quite what's happening)
forumpost:284543 kraester: BUG: words in the file do not appear in display (not a bug)
forumpost:286098 kraester: make the "suggestion counts" be counts of how many times the word was actually "suggested" rather than the number of times it appeared on the page (forumpost:289175 cpeel)
forumpost:289670 DESiegel60: extra column warrants explanation in preamble
- forumpost:281649 lvl: having a word in the bad words list does not prevent it from being suggested in the "Suggestions from Proofers", which is rather silly
- forumpost:287394 kraester: (future) if only one round of results, it and "All Rounds" are same, so don't show both
- forumpost:281844 kraester: there's no way (for anyone other than SA) to remove words from the good_word_suggestions.txt file
- forumpost:281919 DESiegel60: allow PM to clear the suggestion list.
- forumpost:282052 garweyne: allow PM to edit suggestions file.
- forumpost:282180 t-bonham@scc.net: Include code that will automatically add the suggested word to the good word list if the same word is suggested x times by y different proofers. (Also forumpost:289034 SallyPursell)
- forumpost:290018 DESiegel60: for each suggestion, include a list of the pages on which the suggestion was made.
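The "x times by y different proofers" proposal, combined with the requests to filter out words already on the project's Good or Bad Words lists, could look something like the sketch below. It is hypothetical throughout: the function name `words_to_promote`, the `(proofer, word)` event shape, and the parameter names are all invented for illustration and do not reflect how good_word_suggestions.txt is actually stored.

```python
from collections import defaultdict


def words_to_promote(suggestions, min_times, min_proofers,
                     good_words=(), bad_words=()):
    """Pick suggested words that meet both thresholds.

    suggestions: iterable of (proofer, word) pairs, one per suggestion event
    (an assumed shape, not the real suggestions-file format).
    """
    times = defaultdict(int)      # word -> number of suggestion events
    proofers = defaultdict(set)   # word -> distinct proofers who suggested it
    for proofer, word in suggestions:
        times[word] += 1
        proofers[word].add(proofer)
    # Words the PM has already classified are never promoted again.
    excluded = set(good_words) | set(bad_words)
    return sorted(
        word for word in times
        if times[word] >= min_times
        and len(proofers[word]) >= min_proofers
        and word not in excluded
    )
```

Counting suggestion events rather than occurrences on the page matches the sense of the resolved "suggestion counts" item above.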
words that are in the site possible bad words file
forumpost:285782 kraester: why get '0' and '1'?
- forumpost:284944 garweyne: disable
- forumpost:286110 kraester: (future) develop some sort of scheme to give a "rating" of sorts to these observed stealth scannos and incorporate that into the results
Proofing Interface
- forumpost:280423 caw: At some point (after roll-out), have the on-site checker programmatically enforced
WordChecker Pane
forumpost:280741 kraester: tweak appearance of "Submit Corrections" and "Quit Spellcheck" buttons (kraester: withdrawn due to confirmation-on-quit-with-unsaved-changes)
forumpost:277078 garweyne: confirmation dialog if I try to quit when there are unsaved changes
forumpost:277415 kraester: the AW button does not get grayed out until after I click on something outside the word-editing box (See also forumpost:281146, forumpost:281923.)
forumpost:277799 garweyne: If you enter a space at the end of a word, this space is introduced in the page. Instead, right-trim the spaces in the input window.
forumpost:280472 lvl: the links are not shown as links (i.e., not underlined) (See also forumpost:281146.)
forumpost:282133 kraester: add just a bit of padding between the right edge of the textbox and the left edge of the Unflag+Suggest button
various: finalize tooltip for Unflag+Suggest button.
forumpost:282299 garweyne: if neither the primary nor the secondary language has a dictionary, say "No check against a dictionary has been made"
forumpost:282447 cpeel: move the two buttons and the Resize buttons to the far left of the dialog instead of centered (forumpost:283078 cpeel) (forumpost:284195 kraester)
forumpost:277220 dkretz: After applying Unflag+Suggest button, make result distinguishable from the rest of the text.
? forumpost:281451 lvl: BUG: clicking on the Unflag+Suggest button for "aaÞaa" on last line replaces "aa'aa" on the previous line with "aaÞaa" (See also forumpost:281543, forumpost:281560.)
? forumpost:282133 kraester: even though the Unflag+Suggest button is grayed out, it is not deactivated. (forumpost:282384 fixed?)
forumpost:287502 kraester: if it's the only word flagged on the page, Unflag+Suggest doesn't work (forumpost:287597 cpeel)
forumpost:287810 kraester: BUG: confirmation dialog re unsaved changes only appears if you make changes in boxes having the Unflag All & Accept button.
forumpost:281619 txwikinger: BUG: clicking in WordChecker UI causes browser to go into infinite loop. (forumpost:281642 fixed?) (no repro, so we'll say it's resolved)
- forumpost:281146 lvl: when clicking on a lone button in the line, the entire text moves in the window, because the total height of the window has changed
- forumpost:242953 big_bill: Allow visual differentiation of different classes of bad words. Allow user to temporarily "switch off" some classes (to de-clutter the interface). (See also forumpost:278053 dkretz: ask the user to only be dealing with one check-type at a time.) (And forumpost:279024 DESiegel60: a page from which any of several specialized check dialogs would open.)
- forumpost:242972 De2164: Combine spell check page with normal edit box.
- forumpost:243980 big_bill: Dynamically control the quantity of stealth scannos that are flagged.
- forumpost:264377 garweyne: use alternative aspell dictionaries (e.g. en_uk vs. en_us) for a project/a page.
- forumpost:276983 JLohnert-PG: 'undo' for Unflag+Suggest button.
- forumpost:276983 JLohnert-PG: variable line spacing (keeping the same font size).
- forumpost:276983 JLohnert-PG: maybe the ability to "turn off" all text which has not been flagged.
- forumpost:277055 garweyne: An 'Unflag just this occurrence' button
- forumpost:277220 dkretz: "Are You Sure?" confirmation dialog for Unflag+Suggest button
- forumpost:277220 dkretz: Make Unflag+Suggest button operate on attached occurrence only, have a button on the control panel to apply most recent Unflag to other occurrences on page.
- forumpost:277355 dkretz: users would be most comfortable if they can positively indicate that they have dealt with each item.
- forumpost:277799 garweyne: maybe add an edit window for word clusters joined by punctuation with no intervening space
- forumpost:277878 bowerbird: have the "potential stealth scannos" flagged with a different color than "definitely not in the dictionary or good-list" words...
- forumpost:277881 garweyne: more kinds of checks (involving punctuation)
- forumpost:278030 JLohnert-PG: have a drop-down/pop-up menu which lists all flagged words (on the page)
- forumpost:278204 garweyne: flag spacey quotes (requires making spaces in edit box visible with a color background)
- forumpost:278381 jmdyck: separate the two aspects of the "Unflag+Suggest" button?
- forumpost:280920 t-bonham@scc.net: 'Save and Get Next' right from the WordCheck screen
- forumpost:282946 kraester: drop-down character lists don't insert the selected accented letter directly into the current WordCheck editing box.
- forumpost:289143 SallyPursell: popup that asks "would it help you to have the following words in the spell-check?"
- forumpost:289764 DESiegel60: resize image buttons should be in the button pane (= toolbox) (forumpost:289772 garweyne)
WordChecker Engine
forumpost:278784 garweyne: support DP's notations for the oe ligature and other diacritics.
forumpost:283003 lvl: Allow non-ascii letters as base character for []-diacriticals.
forumpost:282133 kraester: odd treatment of words containing apostrophes (forumpost:283023 kraester's #1 show-stopper) (forumpost:283975 garweyne)
forumpost:284458 jmdyck: when loading a word list, trim whitespace
forumpost:281451 lvl: "aa×aa" is not flagged, even though a single "aa" is flagged (forumpost:284482 jmdyck)
- forumpost:276747 kraester: For the pattern check that detects words with both letters and digits, add "2d", "3d" etc. as exceptions. (Also forumpost:277021)
- forumpost:280977 garweyne: add a []-notation for the stroke that appears in some letters (Also forumpost:283356)
- forumpost:243705 garweyne: Define "word letters" via character attributes.
- forumpost:281773 garweyne: if there is no aspell dictionary, it would be better to consider everything bad at world-level (also forumpost:285739 and forumpost:292412)
- forumpost:283106 dkretz: include arbitrary regexes in the word lists?
- forumpost:283286 jmdyck: for the purposes of carving the text into words, should we have a looser pattern for []-isms?
- forumpost:289098 kraester: allow phrases with spaces (e.g. "Prairie du Chien") as project Good Words
- forumpost:289239 garweyne: we do not support language variants
- forumpost:289621 DESiegel60: for site-level patterns, files rather than hardcode
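As a rough illustration of the pattern check discussed above (flag tokens containing both letters and digits, with "2d", "3d" etc. as exceptions), here is a minimal sketch. The function name `flag_mixed_token` is hypothetical, and the default exception set holds only the two tokens named in the notes; what "etc." should cover is left open rather than guessed.

```python
def flag_mixed_token(token, exceptions=frozenset({"2d", "3d"})):
    """Return True if token mixes letters and digits and is not an exception.

    The default exceptions are only the two examples named in the notes;
    a real list would presumably be longer and configurable.
    """
    if token in exceptions:
        return False
    has_letter = any(ch.isalpha() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)
    return has_letter and has_digit
```

Passing the exceptions in as a parameter, rather than hardcoding them in the check, is in the spirit of the request to keep site-level patterns in files.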
FAQ
forumpost:276597 kraester: Change "Bad" to "Flagged" where appropriate.
forumpost:282473 jmdyck: change "Flag words" to "Flagged words"
forumpost:283404 cpeel: let PMs know when the site word lists have changed
forumpost:282499 kraester: Don't shorten "Unflag All & Suggest Word" further than "Unflag All." (forumpost:283422 cpeel)
forumpost:284200 kraester: tell PMs to not use any punctuation except apostrophes in their Project Good and Bad Word Lists
forumpost:284150 kraester: mention that uppercase and lowercase sort separately in Good & Bad word lists. (resolved by using a case-folding sort instead)
forumpost:284546 kraester: "Should I run WordCheck before or after I "manually" proof a page?" section
forumpost:284546 kraester: WordCheck flags *potential/possible/common* stealth scannos
forumpost:284550 kraester: what languages have external dictionaries?
forumpost:284195 kraester: re "show words that would be flagged", mention that the initial generation time is roughly proportional to the number of pages in the project? (forumpost:287523 resolved as far as kraester is concerned)
forumpost:281451 lvl: document that words containing letters and digits are considered as words but do not get an AW button. (forumpost:289614 cpeel)
forumpost:288608 garweyne: add explanation re flagging "do" in "english with french" (forumpost:288621 kraester: wording) (forumpost:289614 cpeel)
forumpost:288621 kraester: fix "misspelled" (forumpost:289614 cpeel)
forumpost:289659 DESiegel60: clarify "Good word suggestions" link from the project page, vs. "Suggestions from proofers" links from the project editor (and forumpost:289778)
- forumpost:287574 dkretz: advise what the expectation is about changing timestamps/lists
- forumpost:289098 kraester: move description of "auto-removal of frequency counts" feature from preambles to FAQ?
misc
forumpost:280902 jmdyck: add a timestamp for changes to just the project comments field, and use *that* (rather than the "saved from editor" timestamp) to trigger the "Things have changed, please scroll down" blurb.
forumpost:288776 JulietS: project zip for PPer should include project-level word lists (forumpost:289807 jmdyck)
- forumpost:278002 big_bill: perhaps alter the "public" names of the "bad" and "good" word lists to, say, "flag" and "don't flag" lists.
- forumpost:289566 DESiegel60: error message: "page has changed state"
- forumpost:242883 garweyne: Detect which words were changed in P1, as suggestions for Bad Words list.
- forumpost:276978 garweyne: (preference to) proof in horizontal and spell-check in vertical.
- forumpost:277161 garweyne: Allow PM to spell-check a page without proofing it.
- forumpost:277363 dkretz: add the spellchecker as a Round of its own.
- forumpost:277394 dkretz: "turn off" spellchecker if PM (or someone) has done a full check beforehand?
- forumpost:278726 JLohnert-PG: feedback (to proofers) re suggested good words -- Rejects should have a reason.
- forumpost:281725 t-bonham@scc.net: include 'having a project good word list' as a factor in the queue release algorithm. (See forumpost:281896.)
- forumpost:282299 garweyne: choose a "dummy" language, and define an empty aspell dictionary.
Possibilities for roll-out
prep
- forumpost:281373 lvl: before rollout, freeze an implementation, and start a new thread for testing of ONE implementation and commenting on ONE FAQ.
- various: encourage/require CP/PMs to do some word-list management beforehand
- forumpost:278763 garweyne: write an email to PMs summarizing the new functionality and what PMs are expected to do, with a pointer to the documentation. (and forumpost:289765, forumpost:292237 kraester)
- forumpost:290046 cpeel: install FAQ at same time as message-to-PMs (forumpost:290105 kraester: or beforehand)
- forumpost:289521 dkretz: do load testing
phased?
- forumpost:276814 dkretz: maybe start with one language and unpopulated global lists.
- forumpost:277879 dkretz: start out with the simplest configuration we can: one language, no scannos. Add a second language in a week if everything's smooth.
- forumpost:281103 big_bill: roll out the code in two phases (PM interface first) (See also forumpost:281335 and forumpost:289393 DESiegel60)
- forumpost:277582 jmdyck: two concurrent interfaces to production site, one with new code. (Could gradually increase use by making it more widely known.) (forumpost:289843 kraester)
- forumpost:289384 garweyne: create a mirror of the live site, ask PMs to generate reasonable wordlists
- forumpost:289559 garweyne: alternative to mirror (and forumpost:289765, forumpost:290209)
site-wide lists
- forumpost:276943 jmdyck: maybe rollout with site-wide lists empty. (But see forumpost:277746.) (See also forumpost:281336.)
- forumpost:276997 big_bill: start with a reasonable length site-level list of bad words
- forumpost:277055 garweyne: we should start with a full list of stealth scannos, and possibly reduce it later rather than the opposite
- forumpost:278189 JulietS: keep them short (forumpost:281503 fairly minimal)
polls
- forumpost:276659 dkretz: some kind of poll before release
- forumpost:276749 Neologist: maybe take a poll some time after release, long enough for people to develop an informed opinion, but not so long that the new system gets ingrained as a "final" version. Then make any necessary tuning, based on the feedback and whatever live site stats there may be.
measuring the effect
- forumpost:276661 dkretz: measure effect on % of proofers who spellcheck
- forumpost:277723 dkretz: monitor change in throughput
- forumpost:277746 kraester: maybe run projects under both old and new code, compare results.
cpeel's list
- Core feature discussion - Finished
- Core feature implementation - Finished
- Test proofer interface - Mostly finished, still in progress
- Test PM interface - Mostly finished, still in progress
- Solidify changes to PM interface and push to main test server for testing - Our next target
- Release PM interface & FAQ to production site
- Create new forum thread for new PM interface questions/discussions
- Announce new PM interface to PMs -- possibly via forum post mentioned above
- Tweak released PM interface and FAQ based on PM feedback
- Solidify changes to proofer interface and push to main test server for testing
- Release proofer interface to main site (staged? user option?) (forumpost:289357 t-bonham@scc.net: include releasing a corrected version of the documentation)
- Create new forum thread for new proofer interface questions/discussions
- Tweak released proofer interface and FAQ based on feedback
- After the chaos dies down, take a coding vacation