Danish site word lists

From DPWiki

This page collects words proposed for the site-wide Danish word lists used by WordCheck. The word lists currently in use can be found at FAQ Central.

Please do not remove words from this page; discuss them on the discussion page instead!


Bad Words List

The current Bad Words List (BWL) can be divided roughly into four classes of words:

  1. words that should never have been in the Good Words List (GWL) in the first place (errors from aspell, such as flakkeude and hedømme),
  2. lowercased nouns that are scanno candidates,
  3. nouns where missing capitalisation is often overlooked,
  4. potential stealth scannos (e.g. bar/har, tik/fik, jog/jeg).

Group #2 contains nouns that are valid in lowercase in modern spelling, but in the older material typically handled at DP they should either be capitalised or are scannos (e.g. dom should be either Dom or dem). Group #3 typically contains words whose initial letter has the same shape in upper- and lowercase, such as ø/Ø, o/O, s/S, v/V, and j/J. Group #4 contains only words that are not very common; har, fik and jeg are therefore not included at the moment, as they would produce a great many false flags.

Unfortunately the lists linked to above don't preserve their original sorting, so it isn't easy to see which words belong to which group. This makes it harder to remove lowercased nouns from the list for books with a more "modern" style, where they are valid.
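One way to keep the group information that the published lists lose is to store each BWL entry together with its group number. The following Python sketch is hypothetical: the names (`Group`, `TAGGED_BWL`, `bwl_for_project`, `looks_same_in_both_cases`) are illustrative, the entries are only the examples mentioned on this page, and treating both noun groups (#2 and #3) as removable for modern-style books is an assumption.

```python
from enum import IntEnum


class Group(IntEnum):
    """The four BWL groups described above (hypothetical encoding)."""
    ASPELL_ERROR = 1     # should never have been in the GWL
    LOWERCASE_NOUN = 2   # lowercased nouns that are scanno candidates
    SAME_SHAPE = 3       # missing capital easily overlooked (ø/Ø, s/S, ...)
    STEALTH_SCANNO = 4   # e.g. bar/har, tik/fik, jog/jeg


# Illustrative entries only, taken from the examples on this page.
TAGGED_BWL = {
    "flakkeude": Group.ASPELL_ERROR,
    "hedømme": Group.ASPELL_ERROR,
    "dom": Group.LOWERCASE_NOUN,
    "øjne": Group.SAME_SHAPE,        # group assignment assumed
    "bar": Group.STEALTH_SCANNO,
    "tik": Group.STEALTH_SCANNO,
    "jog": Group.STEALTH_SCANNO,
}

# Initial letters whose upper- and lowercase forms look alike (examples from the text).
SAME_SHAPE_INITIALS = set("øosvj")

# Both lowercased-noun groups are assumed to be droppable for modern-style books.
NOUN_GROUPS = {Group.LOWERCASE_NOUN, Group.SAME_SHAPE}


def bwl_for_project(modern_style: bool) -> set[str]:
    """Return the BWL words to use, dropping the lowercased-noun groups for modern-style books."""
    return {
        word
        for word, group in TAGGED_BWL.items()
        if not (modern_style and group in NOUN_GROUPS)
    }


def looks_same_in_both_cases(word: str) -> bool:
    """True if the word's initial letter has the same shape in upper- and lowercase."""
    return word[:1].lower() in SAME_SHAPE_INITIALS
```

With `modern_style=True`, dom and øjne are dropped while the aspell errors and stealth scannos stay in the list.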

Words to add:

bæst
del
fil
hank
hanke
hankede
ilden
knude
sølv

Notes

øjne is also valid lowercased as a verb, but occurs quite often as a scanno for Øjne, so it's included in the list.


Fraktur Bad Words List

fang
frist
fætter
kager
lykke
sange
stikket
sække
taste
tastede
taster
tastet
tun


Good Words List

The current GWL was compiled from the project word lists of around 25 projects that had completed P3. Most words (about 9,000 of 11,000) occurred in only one list, so upcoming projects are still likely to contain many words that are not in the general GWL. Considerable effort has gone into removing proper names specific to individual projects, so the list should contain only words that are valid and "normal".
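The compilation described above amounts to counting, for each word, how many project lists it appears in. A minimal Python sketch, assuming each project list is available as a set of words and that project-specific proper names are supplied as a manually curated set; the function names and sample words are hypothetical:

```python
from collections import Counter


def list_frequencies(project_lists: list[set[str]]) -> Counter:
    """Count how many project word lists each word appears in."""
    counts: Counter = Counter()
    for words in project_lists:
        counts.update(words)  # a set, so each word counts at most once per list
    return counts


def singleton_share(counts: Counter) -> float:
    """Fraction of distinct words that occur in exactly one project list."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def build_gwl(project_lists: list[set[str]], proper_names: set[str]) -> set[str]:
    """Merge all project lists and drop the curated project-specific proper names."""
    return set(list_frequencies(project_lists)) - proper_names


# Tiny hypothetical example with three project lists.
lists = [{"Hus", "Øjne", "Hansen"}, {"Hus", "Øjne"}, {"Hus", "Skib"}]
counts = list_frequencies(lists)
print(counts["Hus"], singleton_share(counts))   # 3 0.5
print(sorted(build_gwl(lists, {"Hansen"})))     # ['Hus', 'Skib', 'Øjne']
```

On the real data, the figures quoted above correspond to a singleton share of roughly 0.8 (about 9,000 of 11,000 words).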

Spellings with i (Øine, Leilighed) have been left out of the GWL, as they are also likely to be scannos for the much more common j spellings (Øjne, Lejlighed).
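The exclusion of i-spellings could be automated by checking whether replacing one i with j yields a word that is already in a reference list. This is a hypothetical Python sketch, not necessarily how the GWL was actually built; the function names are illustrative, and it only handles lowercase i, so a capital I is left alone.

```python
def j_variants(word: str) -> set[str]:
    """All spellings obtained by replacing exactly one lowercase 'i' with 'j'."""
    return {word[:k] + "j" + word[k + 1:] for k, ch in enumerate(word) if ch == "i"}


def drop_i_spellings(candidates: set[str], reference: set[str]) -> set[str]:
    """Drop candidates that have a j-spelling counterpart in the reference set."""
    return {w for w in candidates if not (j_variants(w) & reference)}


# Hypothetical example: the j-spellings are assumed to be in the reference list.
reference = {"Øjne", "Lejlighed"}
print(sorted(drop_i_spellings({"Øine", "Leilighed", "Hus"}, reference)))  # ['Hus']
```

Replacing one i at a time is a simplification: a word whose j-counterpart needs two substitutions would slip through.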

Words to consider including in the future:

20erne
30erne
40erne
50erne
60erne
70erne
80erne
90erne