Site conversion to Unicode/Esperanto

This year (2020), Distributed Proofreaders is undertaking a site conversion to Unicode, which will switch the site to UTF-8. For Esperanto public domain texts, this means that we will be able to process them using the native accented letters (ĉapelitaj literoj) and will no longer need the X-system surrogate.

There are a number of steps to go through before that will be fully functional. This page is intended to be a place to track those.

  1. Convert the main site to use the UTF-8 code. (At this point the site will be fully Unicode-capable, but at first will stay restricted to a set of characters similar to Latin-1.) (Done! May 19th)
  2. Enable the "Extended European Latin" character suite, which will allow many more characters to be used in the system. This includes upper and lower case versions of the six letters needed for Esperanto.
  3. Recode all Esperanto projects that are in progress so that they use accented letters. This will require making them temporarily unavailable, preferably between rounds. (An alternative would be to wait until all Esperanto projects have moved into PP without starting any new ones, but that would be a long wait.)
  4. Install a new Esperanto dictionary for WordCheck that uses accented letters instead of the X-system, and test that it works as expected. (Task 1887)
  5. Update project comments, wiki documentation, etc.
  6. Make Esperanto projects available for proofing again.
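
The recoding in steps 3 and 4 amounts to replacing each X-system pair (cx, gx, hx, jx, sx, ux) with its accented letter. The following is only a minimal sketch of that idea in Python, not the script DP actually uses; the function name `x_to_unicode` and the case-handling rule (an upper-case base letter gives an upper-case accented letter) are assumptions made for illustration.

```python
import re

# X-system base letter -> accented letter (lower case only; case is handled below).
X_TO_ACCENTED = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}

# One of the six base letters followed by x or X.
_X_PATTERN = re.compile(r"([cghjsuCGHJSU])[xX]")


def x_to_unicode(text):
    """Replace X-system pairs such as 'cx' or 'Gx' with accented letters."""

    def repl(match):
        base = match.group(1)
        accented = X_TO_ACCENTED[base.lower()]
        # Assumption: the case of the base letter decides the case of the result.
        return accented.upper() if base.isupper() else accented

    return _X_PATTERN.sub(repl, text)


print(x_to_unicode("cxapelitaj literoj"))
print(x_to_unicode("Gxis la revido"))
```

A blind replacement like this would also change genuine "ux" or "sx" sequences in non-Esperanto words quoted inside a text, so converted projects and word lists would still need to be reviewed.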

Input of accented characters

Once those steps are done, DP users may need guidance on how to input the characters. This is a draft version of that guidance, for use once all of the steps above are complete.

For Esperanto projects at DP, we want to use the correct accented letters. If you are familiar with the x-method we used in the past, please do not use it any longer. Esperanto projects should have the correct character suites enabled to allow use of the letters ĉ, ĝ, ĥ, ĵ, ŝ, and ŭ. However, those six letters do not appear on most physical keyboards. Here are a few ways to input them:

  1. The character picker in the DP interface will allow users to select the characters from alphabetized lists. When a character has been selected once, it will then appear on the "most recently used" list, to make it easier to access subsequent times.
  2. If you use the method described in the "Characters with Diacritical Marks" section of the Proofreading Guidelines, the proofing interface will automatically replace your markup with the accented character. For example, if you type [^g], the system will transform that into ĝ. Note that this only works for characters in the character suites currently enabled for the project, and only if you type the markup in. (Cut and paste will not work.)

  3. Some operating systems already include suitable input methods. For example, in Mac OS X it is built into the system and just needs to be enabled, and the Google Keyboard on Android supports Esperanto characters as well.

Anyone who types Esperanto frequently can install a separate program such as Tajpi, which converts certain key combinations into the correct characters.

The text editor Unired has a number of features to work specifically with Esperanto, including character input via x-method.
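
To illustrate the automatic markup replacement described above, where typing [^g] yields ĝ, here is a rough Python sketch. It is not the proofing interface's actual code. The function name `replace_markup` and the `ALLOWED` set are hypothetical stand-ins for the project's enabled character suites, and the sketch assumes the breve is written as [)u], following the Proofreading Guidelines' diacritical markup.

```python
import re
import unicodedata

# Markup symbol -> Unicode combining mark (circumflex and breve only).
COMBINING = {"^": "\u0302", ")": "\u0306"}

# Hypothetical stand-in for the characters enabled in the project's suites.
ALLOWED = set("ĉĝĥĵŝŭĈĜĤĴŜŬ")

# Matches [^x] or [)x] where x is a single ASCII letter.
_MARKUP = re.compile(r"\[([\^)])([A-Za-z])\]")


def replace_markup(text, allowed=ALLOWED):
    """Replace diacritical markup with the accented letter when it is allowed."""

    def repl(match):
        mark, letter = match.group(1), match.group(2)
        # Compose letter + combining mark into a single code point where one exists.
        composed = unicodedata.normalize("NFC", letter + COMBINING[mark])
        # Leave the markup untouched if the result is not an enabled character.
        return composed if composed in allowed else match.group(0)

    return _MARKUP.sub(repl, text)


print(replace_markup("[^g]is la revido"))
print(replace_markup("anta[)u]"))
```

Markup for a character outside the allowed set, such as [^z], is left as typed, mirroring the restriction that replacement only happens for characters in the supported suites.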