Esperanto for Proofers

Character input

For Esperanto projects at DP, we want to use the correct accented letters. If you are familiar with the x-method we used in the past, please do not use it any longer, unless the project comments ask you to (this applies to older texts still in the rounds). Esperanto projects should have the correct character suites enabled to allow use of the letters ĉ, ĝ, ĥ, ĵ, ŝ, and ŭ. However, those six letters do not appear on most physical keyboards. Here are a few ways to input them:

  1. The pickersets in the proofreading interface allow users to select the characters from organized lists. Once a character has been selected, it appears on the "most recently used" list, which makes it easier to reach the next time.
  2. If you use the entry method described in the Characters with Diacritical Marks section of the Proofreading Guidelines, the proofing interface will automatically replace your markup with the accented character. For example, if you type [^g] the system will transform that into ĝ. Note that replacement only works for characters that are in the currently supported character suites for the project, and only takes effect when you type the final ].
  3. Some operating systems already include suitable input methods. For example, Mac OS X has one built in that only needs to be enabled, and the Google Keyboard on Android supports Esperanto characters as well.
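
For anyone working on text outside the proofing interface, the markup replacement in option 2 can be imitated with a short script. This is only a sketch, not the interface's actual code: the text above confirms just [^g], so the other circumflex entries and the breve form [)u] for ŭ are assumptions based on the same convention, and the names MARKUP and expand_markup are made up for the example.

```python
import re

# Assumed markup table: only [^g] is confirmed by the text above.
# The breve form [)u] in particular is an assumption.
_CIRCUMFLEX = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ"}
_BREVE = {"u": "ŭ"}

MARKUP = {}
for base, accented in _CIRCUMFLEX.items():
    MARKUP["[^" + base + "]"] = accented
    MARKUP["[^" + base.upper() + "]"] = accented.upper()
for base, accented in _BREVE.items():
    MARKUP["[)" + base + "]"] = accented
    MARKUP["[)" + base.upper() + "]"] = accented.upper()

# A bracket, a caret or closing parenthesis, one letter, a closing bracket.
MARKUP_RE = re.compile(r"\[[\^)][A-Za-z]\]")


def expand_markup(text):
    """Replace known markup with the accented letter; leave the rest alone."""
    return MARKUP_RE.sub(lambda m: MARKUP.get(m.group(0), m.group(0)), text)


print(expand_markup("[^S]i anka[)u] amas [^g]in."))
```

Markup that is not in the table, such as [^a], is left untouched, in the same spirit as the interface only replacing characters from the project's character suites.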

Less often used input options, using other programs:

  1. If someone wants to use Esperanto frequently, they can install a secondary program such as Tajpi, which changes certain key combinations into the correct characters.
  2. The text editor Unired has a number of features to work specifically with Esperanto, including character input via the x-method.
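
If you have text that was typed with the x-method, it can be converted to the accented letters with a few lines of code. The sketch below assumes the usual form of the x-method, in which an x is written after the base letter (cx, gx, hx, jx, sx, ux); the function name from_x_method is made up for the example.

```python
import re

# Assumed x-method pairs: base letter followed by x.
X_METHOD = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}

X_PAIR = re.compile(r"([cghjsuCGHJSU])[xX]")


def from_x_method(text):
    """Convert x-method digraphs to accented letters, keeping the case."""
    def repl(match):
        base = match.group(1)
        accented = X_METHOD[base.lower()]
        return accented.upper() if base.isupper() else accented
    return X_PAIR.sub(repl, text)


print(from_x_method("Sxi ankaux mangxas cxe la tablo."))
```

Because x is not an Esperanto letter, the conversion is safe for ordinary Esperanto words, but foreign names that contain one of these letters followed by x would be converted too and need to be checked by hand.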

Esperanto for Proofers

The information below is to help proofers who want to better proof Esperanto without spending time learning the language.


The alphabet is a, b, c, ĉ, d, e, f, g, ĝ, h, ĥ, i, j, ĵ, k, l, m, n, o, p, r, s, ŝ, t, u, ŭ, v, z. Note that q, w, x and y aren't in that list.
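
The alphabet above gives a quick check: any word containing a letter outside it deserves a second look. A minimal sketch follows; the name foreign_letters is made up for the example.

```python
# The 28 letters of the alphabet as listed above.
ESPERANTO_LETTERS = set("abcĉdefgĝhĥijĵklmnoprsŝtuŭvz")


def foreign_letters(word):
    """Return the letters in word that are not in the Esperanto alphabet."""
    return sorted(
        {ch for ch in word.lower() if ch.isalpha() and ch not in ESPERANTO_LETTERS}
    )


for word in ["katoj", "ĉiuj", "Waxy"]:
    print(word, foreign_letters(word))
```

Proper names and quotations from other languages can legitimately contain q, w, x, y or other accented letters, so a hit is a reason to compare with the scan, not a reason to change the text.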

Ĉ makes the sound of English ch, ŝ is English sh, ĵ is the z in azure, ĝ is the dg in judge, and ĥ is like the ch in Bach or Scottish loch.


Virtually all of the original vocabulary came from (or through) Romance and Germanic languages, with a few exceptions. When words with q were borrowed, the q often became kv; w became ŭ or v, and x usually ks. The definite article "the" is invariably "la".

Ends of words

In order of likelihood, a o n s e j i are common word endings, l u r m ŭ d are uncommon, t k ĉ are rare, and b p g c h v are very rare (b being less than 1% as common as s). Every other letter is less likely to end a word in the corpus than y, which is not an Esperanto letter. In particular, j is a common ending even though the OCR likes to read it as a semicolon (;), while ĵ was less likely to end a word than φ. (There was some math in the corpus.)

Verbs end in i (for infinitives), u (for commands) or as, is, os, or us as appropriate for the tense. Nouns end in o, with a j to denote plurals and n to denote accusative case (jn denoting plural accusatives), and adjectives end in a, copying the j and/or n from the noun. For example: "Mi amas la belajn katojn." ("I love the beautiful cats.")

Note that poetry frequently clips words (usually dropping a final vowel) and ends them with an apostrophe ('). Even some prose will sometimes elide the a of la.
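
The frequency classes above can be turned into a rough check on the last letter of each word. The sketch below uses only the letter groups given in this section; the labels "elided" and "suspect" and the name ending_class are made up for the example.

```python
# Final-letter classes as given in this section.
ENDING_CLASSES = {
    "common": "aonseji",
    "uncommon": "lurmŭd",
    "rare": "tkĉ",
    "very rare": "bpgchv",
}


def ending_class(word):
    """Classify a word by how likely its final letter is to end a word."""
    if not word:
        return "suspect"
    if word[-1] in "'’":
        # A clipped word, as in poetry, ends with an apostrophe.
        return "elided"
    last = word[-1].lower()
    for label, letters in ENDING_CLASSES.items():
        if last in letters:
            return label
    return "suspect"


for word in ["katojn", "ankaŭ", "eĉ", "kor'", "belaĵ"]:
    print(word, ending_class(word))
```

A "suspect" or "very rare" result does not prove an error, but those words are worth comparing with the scan.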

Starts of words

The letter ŭ, in theory, shouldn't start words. In practice, it does start a few, ŭato (watt) for example. Apart from ŭ, initial letters are much more evenly distributed than final letters, so they help the proofer less in catching errors. In the initial position, c is about 1/6 as common as ĉ, ŝ is 1/6 as common as s, and ĵ is 1/10 as common as j. The letters z, ĵ and ĥ are particularly uncommon.


Punctuation is as in most other European languages; any vagueness there is deliberate. Quotes may be « », “ ”, or others. The semicolon is used, but note that the OCR frequently mistakes a final j for a semicolon.
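
Since plural nouns and adjectives end in oj and aj, a semicolon that directly follows a word ending in o or a is a candidate for a misread j. The sketch below lists such words for review. The restriction to o and a is an assumption made to keep the list short, and the name suspect_semicolons is made up for the example.

```python
import re

# A word ending in a or o, immediately followed by a semicolon.
SEMICOLON_AFTER_AO = re.compile(r"\b(\w*[aoAO]);")


def suspect_semicolons(line):
    """Return the words whose trailing semicolon might be a misread j."""
    return [m.group(1) for m in SEMICOLON_AFTER_AO.finditer(line)]


print(suspect_semicolons("la bela; kato; kuris"))
print(suspect_semicolons("li venis; ŝi iris"))
```

Real semicolons also follow such words, so each hit has to be checked against the scan.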

Common scannos

  • Ci is very rare; it's usually ĉi.
  • An unaccented u rarely follows a or e; there it should usually be ŭ, as in ankaŭ. After i, a plain u is normal, as in kiu and tiu.
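
Both scannos can be searched for with regular expressions. The sketch below treats "ci" as a whole word and only flags u after a or e, because the common words kiu and tiu legitimately contain iu; both choices are assumptions, and the name find_scannos is made up for the example.

```python
import re

CI_WORD = re.compile(r"\b[Cc]i\b")   # the whole word "ci"
VOWEL_U = re.compile(r"[aeAE]u")     # plain u after a or e


def find_scannos(line):
    """Return (position, matched text, hint) tuples for likely scannos."""
    hits = []
    for m in CI_WORD.finditer(line):
        hits.append((m.start(), m.group(0), "possibly ĉi"))
    for m in VOWEL_U.finditer(line):
        hits.append((m.start(), m.group(0), "possibly ŭ"))
    return sorted(hits)


for hit in find_scannos("Ci tiu ankau"):
    print(hit)
```

Some correct words are flagged too, for example commands such as kreu, so the hits are a list to review rather than a list of corrections.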

Common words

  • la (= "the"; it is three times more common than any other word);
  • kaj, de, en, al, mi, ne, estas, li, ke, vi, ĉi are very common;
  • por, sed, kun, ni, sur, per, ŝi, estis, el, ili, kiel, kiu, tiu, ĝi, pli, unu, oni, jam, da, tio, tiel, nur, and sin are common words.