Esperanto for Proofers

Character input

For Esperanto projects at DP, we want to use the correct accented letters. If you are familiar with the x-method we used in the past, please do not use it any longer (unless the project comments ask you to). Esperanto projects should have the correct character suites enabled to allow use of the letters ĉ, ĝ, ĥ, ĵ, ŝ, and ŭ. However, those six letters do not appear on most physical keyboards. Here are a few ways to input them:

  1. The pickersets in the proofreading interface allow users to select the characters from organized lists. Once a character has been selected, it appears on the "most recently used" list, which makes it easier to reach the next time.
  2. If you use the entry method described in the Characters with Diacritical Marks section of the Proofreading Guidelines, the proofing interface will automatically replace your markup with the accented character. For example, if you type [^g] the system will transform that into ĝ. Note that replacement only works for characters that are in the currently supported character suites for the project, and only takes effect when you type the final ].
  3. Some operating systems already include input methods. For example, Mac OS X has one built in that just needs to be enabled, and the Google Keyboard on Android supports Esperanto characters as well.
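The markup replacement in item 2 can be pictured with a small Python sketch. This is not the proofing interface's actual code; the function name `expand_markup` and the table below are made up for illustration. Only [^g] is confirmed by the text above. The other circumflex entries follow the same pattern, and [)u] for ŭ is an assumption about the breve notation that should be checked against the Proofreading Guidelines.

```python
import re

# Hypothetical table: markup body -> accented letter.
# "^" marks a circumflex; ")" for the breve is an assumption.
_LOWER = {"^c": "ĉ", "^g": "ĝ", "^h": "ĥ", "^j": "ĵ", "^s": "ŝ", ")u": "ŭ"}

# Build the full table, including capital letters.
MARKUP = {}
for body, letter in _LOWER.items():
    MARKUP["[" + body + "]"] = letter
    MARKUP["[" + body[0] + body[1].upper() + "]"] = letter.upper()

# One diacritic sign and one letter inside square brackets.
PATTERN = re.compile(r"\[[\^)][A-Za-z]\]")

def expand_markup(text: str) -> str:
    """Replace known markup; leave anything outside the table untouched."""
    return PATTERN.sub(lambda m: MARKUP.get(m.group(0), m.group(0)), text)
```

Markup that is not in the table, such as [^a], is left as typed. That mirrors the note above that replacement only works for characters in the project's enabled character suites.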

Less often used input options, using other programs:

  1. If someone wants to use Esperanto frequently, they can install a secondary program such as Tajpi which changes certain key combinations into the correct characters.
  2. The text editor Unired has a number of features to work specifically with Esperanto, including character input via x-method.
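For text that was already typed with the x-method, a short script can convert it to the accented letters. The sketch below assumes the usual x-method convention, in which an x is typed after c, g, h, j, s or u to stand for the accent; the function name `from_x_method` is made up for illustration and is not part of Tajpi, Unired or any DP tool.

```python
import re

# Base letter -> accented letter under the x-method convention.
X_METHOD = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}

def from_x_method(text: str) -> str:
    """Turn cx, gx, hx, jx, sx, ux into the accented letters, keeping case."""
    def repl(match):
        letter = match.group(1)
        accented = X_METHOD[letter.lower()]
        return accented.upper() if letter.isupper() else accented
    return re.sub(r"([cghjsuCGHJSU])[xX]", repl, text)
```

Since x is not an Esperanto letter, the conversion is safe for pure Esperanto text. Foreign names or words that contain one of these letter pairs (for example "ux" in a French name) would be changed too, so the result still needs a human check.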

Typesetting conventions

Esperanto in general does not have rules about punctuation usage and typesetting. In contemporary usage one will often see what I would call "English normal" punctuation styles, but it is not unusual, particularly in the books we work on here, to see a typesetter use their customary national conventions.

Since we've worked on Esperanto material published in France, Germany, the Netherlands, Russia, England, Hungary, Finland, and the USA, we have seen some variety in punctuation styles. Given the international nature of the Esperanto movement, it can also happen that a book was written by an author in one country, typeset and composed in a second country, and printed or published in a third.

Here are some of the specific details we have run across at DP:

Quote mark styles

You may encounter:

  • “English-style”
  • „German-style“
  • «French-style»
  • — quotation dash

or others...

The elided article

The word "la" in Esperanto may be elided, and appear as:

l'

Most often, this is found in verse, but occasionally in prose as well.

Modern consensus among Esperanto users appears to be that "la" is a separate word and should have a space after it, even when elided. But in the typesetting of the Esperanto books we run into here, we see great variation in spacing, sometimes even within the same text. So it can appear to be "connected" to the next word, for example:

l'afero

Ellipses

Number and spacing of dots can vary.

Emdash

Some Esperanto books appear to have clear spaces around emdashes. Sometimes the surrounding punctuation, or a doubling-up of dashes, makes it clear that such spaces are required.

Gesperrt

Particularly in books prepared in German-speaking countries, we can encounter gesperrt (spaced-out) type instead of italics to indicate emphasis.

Alphabet

The alphabet is a, b, c, ĉ, d, e, f, g, ĝ, h, ĥ, i, j, ĵ, k, l, m, n, o, p, r, s, ŝ, t, u, ŭ, v, z. Note that q, w, x and y aren't in that list.
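The alphabet list lends itself to a simple mechanical check: any letter in a word that is not one of these 28 is worth a second look. The following Python sketch is only an illustration (the function name `foreign_letters` is made up); it cannot tell a scanno from a legitimate foreign name.

```python
# The 28 letters of the Esperanto alphabet, as listed above.
ALPHABET = set("abcĉdefgĝhĥijĵklmnoprsŝtuŭvz")

def foreign_letters(word: str) -> list:
    """Return the letters of `word` that are not in the Esperanto alphabet."""
    return sorted({ch for ch in word.lower() if ch.isalpha() and ch not in ALPHABET})
```

For example, `foreign_letters("ŝajnas")` returns an empty list, while a word containing q, w, x or y gets those letters reported.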

Ĉ makes the sound of English ch, ŝ is English sh, ĵ is z as in azure, ĝ is dg in judge, and ĥ is like ch in Bach or Scottish loch.

Vocabulary

Virtually all of the original vocabulary came from (or through) Romance and Germanic languages, with a few exceptions. When words with q were borrowed, the q often became kv; w became ŭ or v, and x usually ks. The definite article "the" is invariably "la".

Ends of words

In order of likelihood, a o n s e j i are common final letters; l u r m ŭ d are uncommon; t k ĉ are rare; and b p g c h v are very rare (b being less than 1% as common as s). The rest are less likely to end a word in the corpus than y, which is not an Esperanto letter. In particular, j is common, even though OCR likes to read it as a semicolon (;), and ĵ was less likely to end a word than φ. (There was some math in the corpus.)

Verbs end in i (for infinitives), u (for commands) or as, is, os, or us as appropriate for the tense. Nouns end in o, with a j to denote plurals and n to denote accusative case (jn denoting plural accusatives), and adjectives end in a, copying the j and/or n from the noun. Example: "Mi amas la belajn katojn." ("I love the beautiful cats.")

Note that poetry frequently clips words (usually a final vowel) and ends them with '. Even some prose will sometimes elide the a of la.
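The frequency tiers for final letters can be turned into a rough lookup for spotting words that deserve a closer look. This is a minimal sketch under stated assumptions: the tier names are taken from the description in this section, the function name `ending_tier` is made up, and the input is assumed to be a single word with surrounding punctuation already stripped (apart from a final apostrophe marking elision).

```python
# Final-letter tiers as described in this section.
TIERS = [
    ("common", set("aonseji")),
    ("uncommon", set("lurmŭd")),
    ("rare", set("tkĉ")),
    ("very rare", set("bpgchv")),
]

def ending_tier(word: str) -> str:
    """Classify a word by how usual its final letter is."""
    if not word:
        return "unlisted"
    if word.endswith(("'", "’")):
        return "elided"  # clipped word, as in poetry or l'
    last = word[-1].lower()
    for name, letters in TIERS:
        if last in letters:
            return name
    return "unlisted"  # "the rest": rarer than any listed letter
```

A "rare" or "unlisted" result does not prove an error; it only suggests comparing the word against the page image.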

Starts of words

The letter ŭ, in theory, shouldn't start words. In practice, it does start a few, ŭato (watt) for example. Apart from ŭ, initial letters are much more evenly distributed than final letters and won't help the proofer as much in catching errors. In the initial position, c is about 1/6 as common as ĉ, ŝ is 1/6 as common as s, and ĵ is 1/10 as common as j. The letters z, ĵ and ĥ are particularly uncommon.

Hyphenation

Compound words in Esperanto are frequent, with the roots being put together as needed. Using a hyphen between those roots is neither required nor forbidden. General practice is not to use a hyphen in such cases (similar to compound words in German or Finnish).

Hyphens are usually only used to help clarify a construction that might be ambiguous or avoid an awkward consonant cluster. However, different editors have varying ideas of what is or is not awkward, so usage is inconsistent. Even within the same book there may be differences.

Common scannos

  • Ci is very rare; it's usually ĉi.
  • An unaccented u rarely follows a or e; there it should usually be ŭ (as in aŭ, eŭ). After i, a plain u is normal, as in kiu and tiu.
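Both scannos can be searched for mechanically. The Python sketch below is an illustration only (the function name `suspect_words` is made up). It flags the standalone word "ci" and any word containing "au" or "eu"; "iu" is deliberately not flagged, because the common words kiu and tiu contain it. Legitimate compounds such as neutila (ne + utila) will also be flagged, so every hit needs checking against the image.

```python
import re

# Plain u directly after a or e, where ŭ is far more usual.
SUSPECT_U = re.compile(r"[ae]u", re.IGNORECASE)
# A run of letters (Unicode-aware: excludes digits and underscore).
WORD = re.compile(r"[^\W\d_]+")

def suspect_words(text: str) -> list:
    """Return words that match one of the two common scanno patterns."""
    hits = []
    for word in WORD.findall(text):
        if word.lower() == "ci" or SUSPECT_U.search(word):
            hits.append(word)
    return hits
```

For the line "Ci tiu hundo venis antau la domo", the function reports "Ci" and "antau", which should most likely read "Ĉi" and "antaŭ".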

Common words

  • la (= "the"; it is three times more common than any other word);
  • kaj, de, en, al, mi, ne, estas, li, ke, vi, ĉi are very common;
  • por, sed, kun, ni, sur, per, ŝi, estis, el, ili, kiel, kiu, tiu, ĝi, pli, unu, oni, jam, da, tio, tiel, nur, and sin are common words.