Danish project level word list suggestions

From DPWiki

Here are some possible words that could be part of Good and Bad Word Lists at a project level for projects in this language.

Bad Words List

The current BWL can basically be divided into four classes of words:

  1. words that shouldn't have been in the GWL in the first place (errors by aspell, such as flakkeude and hedømme),
  2. lowercased nouns that are scanno candidates,
  3. nouns where missing capitalisation is often overlooked,
  4. potential stealth scannos (e.g. bar/har, tik/fik, jog/jeg)

Group #3 typically contains words whose initial letter has the same shape in upper- and lowercase, such as ø/Ø, o/O, s/S, v/V, and j/J. Group #2 contains nouns that are valid in lowercase in modern spelling, but in DP-age material they should either be capitalised or are scannos (e.g. dom should be either Dom or dem). Group #4 contains only words that are not extremely common; har, fik and jeg are not included at the moment, as they would produce too many false flags. Unfortunately, the lists linked to above don't preserve their original sorting, so it isn't easy to see which words belong to which group. This makes it harder to remove lowercased nouns from the list for books with a more "modern" style, where they're OK.
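As a rough illustration of why the match has to be case-sensitive for groups #2 and #3, here is a minimal Python sketch. The function name flag_bad_words and the small BAD_WORDS sample are hypothetical and are not how the DP tools work internally; the sketch only shows that dom is flagged while Dom is left alone.

```python
import re

# Hypothetical sample of bad words taken from the examples on this page.
BAD_WORDS = {"dom", "bar", "tik", "jog", "øjne"}


def flag_bad_words(text, bad_words=BAD_WORDS):
    """Return (token index, token) pairs for tokens that exactly match a bad word."""
    # In Python 3, \w matches Unicode letters, so æ, ø and å stay inside tokens.
    tokens = re.findall(r"\w+", text)
    # The comparison is case-sensitive on purpose: "Dom" is fine, "dom" is suspect.
    return [(i, tok) for i, tok in enumerate(tokens) if tok in bad_words]


if __name__ == "__main__":
    for index, token in flag_bad_words("Han bar sin Dom, men dom var tung"):
        print(index, token)
```

A more common word such as har would be flagged on nearly every page by this kind of check, which is the reason given above for leaving it out of the list.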

Words to add:

bæst
del
fil
hank
hanke
hankede
ilden
knude
sølv

Notes

øjne is also valid lowercased as a verb, but occurs quite often as a scanno for Øjne, so it's included in the list.


Fraktur Bad Words List

fang
frist
fætter
kager
lykke
sange
stikket
sække
taste
tastede
taster
tastet
tun


Good Words List

The current GWL has been compiled from the project word lists of around 25 projects that had completed P3. Many words (~9000 of 11000) occurred in only one list, so upcoming projects are still likely to contain many words that are not in the general GWL. A considerable effort has been made to clean proper names from individual projects out of the list, so it should only contain words that are actually valid and "normal".
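The merge described above can be sketched in Python as follows. This is only an assumed reconstruction of the idea, not the procedure actually used to build the GWL; the function names merge_word_lists and singletons and the sample data are hypothetical.

```python
from collections import Counter


def merge_word_lists(project_lists):
    """Count, for each word, how many project word lists contain it."""
    counts = Counter()
    for words in project_lists:
        # set() ensures a word is counted at most once per project.
        counts.update(set(words))
    return counts


def singletons(counts):
    """Return the words that occur in exactly one project list, sorted."""
    return sorted(word for word, n in counts.items() if n == 1)


if __name__ == "__main__":
    # Hypothetical miniature project lists, for illustration only.
    lists = [["Øjne", "Lejlighed"], ["Øjne", "Hest"], ["Hest", "Øjne", "Øjne"]]
    counts = merge_word_lists(lists)
    print(len(counts), "distinct words,", len(singletons(counts)), "in one list only")
```

Words that appear in only one list are the ones most worth reviewing by hand, since proper names from a single project tend to end up there.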

Spellings with i (Øine, Leilighed) have been left out of the GWL, as they are also likely to be scannos for the much more common j counterparts (Øjne, Lejlighed).
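One way to find such i spellings in a candidate list is to replace each i with j in turn and check whether the result is also in the list. The Python sketch below assumes that simple heuristic; the function name i_spelling_variants is hypothetical, and the results would still need a manual check.

```python
def i_spelling_variants(words):
    """Map each word to its j counterpart when swapping one i for j gives a listed word."""
    known = set(words)
    flagged = {}
    for word in known:
        for pos, ch in enumerate(word):
            if ch != "i":
                continue
            # Try replacing this single i with j, e.g. Øine -> Øjne.
            candidate = word[:pos] + "j" + word[pos + 1:]
            if candidate in known:
                flagged[word] = candidate
    return flagged


if __name__ == "__main__":
    sample = ["Øine", "Øjne", "Leilighed", "Lejlighed", "Pige"]
    for old, new in sorted(i_spelling_variants(sample).items()):
        print(old, "->", new)
```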

Words to consider including in the future:

20erne
30erne
40erne
50erne
60erne
70erne
80erne
90erne