Esperanto for Proofers
Character input
For Esperanto projects at DP, we want to use the correct accented letters. Esperanto projects should have the correct character suites enabled to allow use of the letters ĉ, ĝ, ĥ, ĵ, ŝ, and ŭ. However, these six letters do not appear on most physical keyboards. Here are a few ways to enter them:
- The pickersets in the proofreading interface let you select characters from organized lists. Once you have selected a character, it appears on the "most recently used" list, so it is easier to find the next time.
- If you use the entry method described in the Characters with Diacritical Marks section of the Proofreading Guidelines, the proofing interface will automatically replace your markup with the accented character. For example, if you type [^g], the system transforms it into ĝ. Note that the replacement only works for characters in the character suites currently enabled for the project, and only takes effect when you type the final ].
- Some operating systems already include input methods. For example, on Mac OS X support is built into the system and just needs to be enabled, and the Google Keyboard on Android also supports Esperanto characters.
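As a rough illustration of the markup replacement described above (not DP's actual code), here is a minimal Python sketch. It assumes [^x] is the circumflex markup, as in the [^g] example, and that the breve markup for ŭ is [)u]; check the table in the Guidelines before relying on either. The names MARKUP and replace_markup are hypothetical.

```python
import re

# Hypothetical mapping from DP-style diacritic markup to Esperanto letters.
# [^x] for the circumflex follows the [^g] example; [)u] for the breve is an
# assumption taken from the Guidelines' markup table.
MARKUP = {
    "[^c]": "ĉ", "[^C]": "Ĉ",
    "[^g]": "ĝ", "[^G]": "Ĝ",
    "[^h]": "ĥ", "[^H]": "Ĥ",
    "[^j]": "ĵ", "[^J]": "Ĵ",
    "[^s]": "ŝ", "[^S]": "Ŝ",
    "[)u]": "ŭ", "[)U]": "Ŭ",
}

# One regex that matches any of the markup strings literally.
_PATTERN = re.compile("|".join(re.escape(key) for key in MARKUP))


def replace_markup(text: str) -> str:
    """Replace supported markup; leave anything unsupported (e.g. [^a]) as typed."""
    return _PATTERN.sub(lambda m: MARKUP[m.group(0)], text)


print(replace_markup("[^C]i tio estas [^g]usta, a[)u] ne?"))
# Ĉi tio estas ĝusta, aŭ ne?
```

Leaving unknown markup untouched mirrors the point above that only characters in the enabled suites are converted.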
Less commonly used input options, which rely on other programs:
- If you use Esperanto frequently, you can install a separate program such as Tajpi, which converts certain key combinations into the correct characters.
- The text editor Unired has a number of features for working specifically with Esperanto, including character input via the x-method (for example, typing cx for ĉ).
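The x-method works because x is not an Esperanto letter, so writing it after c, g, h, j, s or u can stand for the accent. A minimal Python sketch of converting x-method text follows; it is not how Unired or Tajpi work internally, and the name from_x_method is hypothetical. Foreign words containing one of these letters followed by x (Bordeaux, for instance) would be mangled, so run it only on Esperanto text.

```python
import re

# x-method: a base letter followed by x stands for the accented letter.
X_MAP = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}

# Base letter in either case, followed by x or X.
_X_PATTERN = re.compile(r"([cghjsuCGHJSU])[xX]")


def from_x_method(text: str) -> str:
    """Convert x-method digraphs (cx, Sx, UX, ...) to accented letters."""
    def repl(m: re.Match) -> str:
        base = m.group(1)
        accented = X_MAP[base.lower()]
        # Keep the case of the base letter.
        return accented.upper() if base.isupper() else accented
    return _X_PATTERN.sub(repl, text)


print(from_x_method("Cxu vi sxatas auxtomobilojn?"))
# Ĉu vi ŝatas aŭtomobilojn?
```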
Typesetting conventions
Esperanto in general does not have set rules about punctuation and typesetting. In contemporary usage one will often see what I would call "English normal" punctuation styles, but it is not unusual, particularly in the books we work on here, to see a typesetter use their own national conventions.
And since we've worked on Esperanto material published in France, Germany, the Netherlands, Russia, England, Hungary, Finland, and the USA, we have encountered some variety in punctuation styles. Given the international nature of the Esperanto movement, it can also happen that a book was written by an author in one country, the typesetting and composition were prepared in a second country, and the printing/publication done in a third country.
Here are some of the specific details we have run across at DP, on their own, or sometimes combined in unexpected ways:
Quote mark styles
You may encounter:
- “English-style”
- „German-style“
- «French-style»
- — quotation dash
or others...
The elided article
The word "la" in Esperanto may be elided, and appear as:
l'
Most often, this is found in verse, but occasionally in prose as well.
Modern consensus among Esperanto users appears to be that "la" is a separate word and should be followed by a space, even when elided. But in the typesetting of the Esperanto books we run into here, we see great variation in spacing, sometimes even within the same text. So it can appear "connected" to the next word, for example:
l'afero
Ellipses
The number and spacing of the dots can vary.
Emdash
Some Esperanto books appear to have clear spaces around emdashes. Sometimes the surrounding punctuation, or doubled dashes, make it clear that such spaces are required.
Gesperrt
Particularly in books prepared in German-speaking countries, we can encounter gesperrt (spaced-out) type instead of italics to indicate emphasis.
Hyphenated words
Compound words are frequent in Esperanto, with roots put together as needed. Hyphens between roots in those words are neither explicitly required nor forbidden. The general modern practice is to use them only occasionally, to aid clarity and avoid ambiguity. (Compounds with the particle ĉi are also usually hyphenated.)
This may vary a little between books, and it's quite common for hyphenation to vary within the same book. I've seen an example where some words seemed to be hyphenated on first appearance, to clarify the meaning, and then left unhyphenated on subsequent uses.
There may also be different authorial styles. For example, in the novel Viktimoj, Julio Baghy appears to have preferred long compound words, with scarcely ever a hyphen; but in some early books aimed at beginners, it was seen as desirable to break up words frequently to show their constituent elements.
Language usage
Esperanto texts from the era we work on fairly often contain unusual grammatical constructions. The early 20th century was a time when the language was still in considerable flux, and good language models and reference material were hard to come by, or nonexistent.
So we may see:
- complex verb constructions, directly translated from a source language, that are awkward or even nonsensical in Esperanto.
- word-for-word translations of national idioms that make no sense in an Esperanto context.
- weird "kunmetaĵoj" (compound words), where the author has tried to build a compound that doesn't really work.
- experimental new or alternate words, that never became accepted as standard parts of the language.
Alphabet
The alphabet is a, b, c, ĉ, d, e, f, g, ĝ, h, ĥ, i, j, ĵ, k, l, m, n, o, p, r, s, ŝ, t, u, ŭ, v, z. Note that q, w, x and y aren't in that list. Any other accented letters should only occur in words or names quoted from other languages.
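Because the alphabet is a closed set, a quick check for letters outside it can point to scannos or to quoted foreign words. A minimal Python sketch, assuming the page text is plain Unicode; foreign_letters is a hypothetical helper name. NFC normalization joins a base letter and a separate combining accent, so s followed by U+0302 is treated like a precomposed ŝ.

```python
import unicodedata

# The 28 letters of the Esperanto alphabet, lowercase.
ALPHABET = set("abcĉdefgĝhĥijĵklmnoprsŝtuŭvz")


def foreign_letters(word: str) -> set[str]:
    """Return the letters in word that are not in the Esperanto alphabet."""
    # NFC combines base letter + combining accent into one character.
    word = unicodedata.normalize("NFC", word)
    return {ch for ch in word.lower() if ch.isalpha() and ch not in ALPHABET}


for w in ["ŝipo", "Washington", "café"]:
    print(w, sorted(foreign_letters(w)))
# ŝipo []
# Washington ['w']
# café ['é']
```

A hit is not necessarily an error: names and quotations from other languages are expected to contain such letters.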
Ĉ sounds like English ch, ŝ like English sh, ĵ like the z in azure, ĝ like the dg in judge, and ĥ like the ch in Bach or Scottish loch.
Vocabulary
Virtually all of the original vocabulary came from (or through) Romance and Germanic languages, with a few exceptions. When words with q were borrowed, the q often became kv; w became ŭ or v; and x usually became ks. The definite article "the" is invariably "la" (or l' when elided).
Ends of words
In order of likelihood, a o n s e j i are common word endings; l u r m ŭ d are uncommon; t k ĉ are rare; b p g c h v are very rare (b is less than 1% as common as s); and the remaining letters are less likely to end a word in the corpus than y (which is not an Esperanto letter). In particular, j is a common ending even though OCR often misreads it as a semicolon (;), and ĵ was less likely to end a word than φ. (There was some math in the corpus.)
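These ending statistics suggest a simple screening aid: flag words whose last letter is in the "very rare" group or in none of the listed groups. The Python sketch below hard-codes the groups from this section; which classes to flag, and the names ending_class and flag_endings, are assumptions for illustration. Words ending in an apostrophe are skipped, since elided forms such as l' are legitimate.

```python
import re
import unicodedata

# Final-letter groups copied from the statistics above.
COMMON = set("aonseji")
UNCOMMON = set("lurmŭd")
RARE = set("tkĉ")
VERY_RARE = set("bpgchv")

# A run of letters, optionally followed by an elision apostrophe.
WORD_RE = re.compile(r"[^\W\d_]+'?")


def ending_class(word: str) -> str:
    """Classify a word by its final letter using the groups above."""
    last = word[-1].lower()
    if last in COMMON:
        return "common"
    if last in UNCOMMON:
        return "uncommon"
    if last in RARE:
        return "rare"
    if last in VERY_RARE:
        return "very rare"
    return "suspicious"


def flag_endings(text: str) -> list[tuple[str, str]]:
    """Return (word, class) pairs whose ending is very rare or unlisted."""
    text = unicodedata.normalize("NFC", text)
    flagged = []
    for word in WORD_RE.findall(text):
        if word.endswith("'"):
            continue  # elided forms such as l' are expected
        cls = ending_class(word)
        if cls in ("very rare", "suspicious"):
            flagged.append((word, cls))
    return flagged


print(flag_endings("Mi vidis la hundob en la ĝardeno."))
# [('hundob', 'very rare')]
```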
Verbs end in i (infinitive), u (commands), or as, is, os, or us (present, past, future, and conditional). Nouns end in o, with j marking the plural and n the accusative case (jn for plural accusatives), and adjectives end in a, taking the same j and/or n as the noun they modify: "Mi amas la belajn katojn." ("I love the beautiful cats.")
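The ending rules can be expressed as a small classifier. This Python sketch is only a heuristic: the uninflected short words it skips are taken from the common-word list in this article, but other particles and correlatives (for example ĝis or kio) will still be mislabeled. The names FUNCTION_WORDS and describe_ending are hypothetical.

```python
import re

# Uninflected short words from the "Common words" list; their last letters
# look like grammatical endings but are not.
FUNCTION_WORDS = {
    "la", "kaj", "de", "en", "al", "mi", "ne", "li", "ke", "vi", "ĉi",
    "por", "sed", "kun", "ni", "sur", "per", "ŝi", "el", "ili", "kiel",
    "kiu", "tiu", "ĝi", "pli", "unu", "oni", "jam", "da", "tio", "tiel",
    "nur", "sin",
}

# Checked in order; "i" and "u" come last so the two-letter endings win.
VERB_ENDINGS = [("as", "present"), ("is", "past"), ("os", "future"),
                ("us", "conditional"), ("i", "infinitive"), ("u", "command")]

# Stem, then o (noun) or a (adjective), optional plural j, optional accusative n.
NOMINAL_RE = re.compile(r"^.+?([oa])(j?)(n?)$")


def describe_ending(word: str) -> str:
    """Guess the grammatical role of a word from its ending."""
    w = word.lower().strip(".,;:!?")
    if w in FUNCTION_WORDS:
        return "function word"
    m = NOMINAL_RE.match(w)
    if m:
        kind = "noun" if m.group(1) == "o" else "adjective"
        number = "plural" if m.group(2) else "singular"
        case = "accusative" if m.group(3) else "nominative"
        return f"{kind}, {number}, {case}"
    for suffix, label in VERB_ENDINGS:
        if len(w) > len(suffix) and w.endswith(suffix):
            return f"verb, {label}"
    return "other"


for word in "Mi amas la belajn katojn.".split():
    print(word, "->", describe_ending(word))
# Mi -> function word
# amas -> verb, present
# la -> function word
# belajn -> adjective, plural, accusative
# katojn. -> noun, plural, accusative
```

A proofer could use a mismatch (say, an adjective with jn next to a noun with only n) as a hint to look closely at the page image.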
Note that poetry frequently clips words (usually dropping a final vowel) and marks the omission with an apostrophe ('). Even some prose will sometimes elide the a of la.
Starts of words
In theory, the letter ŭ shouldn't start words. In practice, it does start a few, for example ŭato (watt). Apart from ŭ, word-initial letters are much more evenly distributed than final letters, so they won't help the proofer as much in catching errors. In initial position, c is about 1/6 as common as ĉ, ŝ is 1/6 as common as s, and ĵ is 1/10 as common as j. The letters z, ĵ and ĥ are particularly uncommon there.
Common scannos
- The word ci is very rare; it is usually a misread ĉi.
- An unaccented u rarely follows a or e; there it should usually be ŭ, as in aŭ, ankaŭ, and Eŭropo. (After i, as in kiu and tiu, a plain u is normal.)
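Both checks are simple enough to automate as suggestions for a human to confirm against the page image. The Python sketch below is an illustration under assumptions, not a DP tool: it flags the word ci and any u directly after a or e. Legitimate compounds such as neuzebla (ne + uzebla) will be flagged too, which is why it only suggests and never rewrites. suspect_scannos is a hypothetical name.

```python
import re

# A run of letters (Unicode-aware, no digits or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def suspect_scannos(text: str) -> list[tuple[str, str]]:
    """Return (word, suggested reading) pairs for a human to check."""
    suspects = []
    for word in WORD_RE.findall(text):
        if word.lower() == "ci":
            # ci is rare; it is usually ĉi with the accent lost.
            suggestion = ("Ĉ" if word[0] == "C" else "ĉ") + word[1:]
        else:
            # u directly after a or e is usually ŭ; after i (kiu) it is normal.
            suggestion = re.sub(r"([aeAE])u", r"\1ŭ", word)
        if suggestion != word:
            suspects.append((word, suggestion))
    return suspects


print(suspect_scannos("Ci tie au tie, kiu venis hodiau?"))
# [('Ci', 'Ĉi'), ('au', 'aŭ'), ('hodiau', 'hodiaŭ')]
```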
Common words
- la ("the"; three times as common as any other word);
- kaj, de, en, al, mi, ne, estas, li, ke, vi, ĉi are very common;
- por, sed, kun, ni, sur, per, ŝi, estis, el, ili, kiel, kiu, tiu, ĝi, pli, unu, oni, jam, da, tio, tiel, nur, and sin are common words.
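A list like this can double as a rough sanity check on a page: if very few of its words are common ones, the OCR may be badly garbled or the passage may not be in Esperanto. The Python sketch below only computes the share of tokens found in the lists above and the most frequent words; what share counts as suspicious is left open, since no figure is given here. COMMON_WORDS, common_word_share and top_words are hypothetical names.

```python
import re
from collections import Counter

# Words from the "Common words" lists above.
COMMON_WORDS = {
    "la", "kaj", "de", "en", "al", "mi", "ne", "estas", "li", "ke", "vi", "ĉi",
    "por", "sed", "kun", "ni", "sur", "per", "ŝi", "estis", "el", "ili",
    "kiel", "kiu", "tiu", "ĝi", "pli", "unu", "oni", "jam", "da", "tio",
    "tiel", "nur", "sin",
}

# A run of letters (Unicode-aware, no digits or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def common_word_share(text: str) -> float:
    """Fraction of word tokens in text that appear in COMMON_WORDS."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in COMMON_WORDS)
    return hits / len(words)


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Most frequent words, e.g. to see whether la leads as expected."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return Counter(words).most_common(n)


print(common_word_share("Mi amas la belajn katojn."))  # 0.4
```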