Danish project-level word list suggestions
Below are suggested words for project-level Good and Bad Word Lists in Danish-language projects.
Bad Words List
The current Bad Words List (BWL) falls into four groups of words:
1. words that should never have been in the Good Words List (GWL) (errors by aspell, such as flakkeude and hedømme),
2. lowercased nouns that are scanno candidates,
3. nouns where missing capitalisation is often overlooked,
4. potential stealth scannos (e.g. bar/har, tik/fik, jog/jeg).
Group #3 typically contains words whose initial letter has the same shape in upper- and lowercase, such as ø/Ø, o/O, s/S, v/V and j/J. Group #2 contains nouns that are valid in lowercase in modern spelling, but in material of the age typically proofread at DP they should either be capitalised or are scannos (e.g. dom should be either Dom or dem). Group #4 contains only words that are not extremely common; har, fik and jeg are excluded for now, as they would produce too many false flags.
Unfortunately, the lists linked to above don't preserve their original sorting, so it isn't easy to see which word belongs to which group. This makes it harder to remove the lowercased nouns from the list for books written in a more "modern" style, where they are acceptable.
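Because the lists are unsorted, group #3 membership can at least be partly guessed from the word itself. The Python sketch below is a hypothetical helper (not part of any DP tool) that flags lowercase words starting with one of the same-shape letters named above. The letter set is an assumption taken from those examples and is not exhaustive, and the check cannot tell groups #2 and #4 apart, so flagged words are only candidates.

```python
# Letters whose upper- and lowercase forms share the same shape.
# Assumption: taken from the examples in the text (ø/Ø, o/O, s/S, v/V, j/J);
# the set is not exhaustive and could be extended.
SAME_SHAPE_INITIALS = {"ø", "o", "s", "v", "j"}


def same_shape_candidates(words):
    """Return the lowercase words whose first letter looks the same in
    upper- and lowercase, i.e. likely (not certain) members of group #3."""
    return [
        w for w in words
        if w and w[0].islower() and w[0] in SAME_SHAPE_INITIALS
    ]


if __name__ == "__main__":
    # The "Words to add" list plus øjne, which is discussed under Notes.
    suggested = "bæst del fil hank hanke hankede ilden knude sølv øjne".split()
    print(same_shape_candidates(suggested))
```

Run on the suggested words, this should single out sølv and øjne, whose capitalised forms Sølv and Øjne are easy to misread as lowercase.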
Words to add:
bæst del fil hank hanke hankede ilden knude sølv
Notes
øjne is also valid in lowercase as a verb, but it occurs often enough as a scanno for Øjne that it is included in the list.
Fraktur Bad Words List
fang frist fætter kager lykke sange stikket sække taste tastede taster tastet tun
Good Words List
The current GWL has been compiled from the project word lists of around 25 projects that had completed P3. Many words (about 9,000 of 11,000) occurred in only one list, so upcoming projects are still likely to contain many words that are not in the general GWL. Considerable effort has gone into removing project-specific proper names from the list, so it should contain only words that are actually valid and "normal".
Spellings with i (Øine, Leilighed) have been left out of the GWL, since they are also likely scannos for their much more common j counterparts (Øjne, Lejlighed).
Words to consider including in the future:
20erne 30erne 40erne 50erne 60erne 70erne 80erne 90erne