From DPWiki

The Latin-1 Palette

Click the Latin-1 button (Guiguts Tb-latin1.png) in the toolbar or use Help > Latin-1 Chart to open a small palette displaying the special characters of the Latin-1 set. Click a character in the palette to insert either the character or its HTML entity at the insertion point.

The Unicode Menu, Lookup, and Search

You can access Unicode characters in three ways.

The Unicode Menu

Single-click Unicode in the menu bar to open a long menu listing blocks of characters. The list is ordered alphabetically by block title. (This menu may extend off the bottom of the screen. In Windows you can scroll it to see all entries; the last one is "Unified Canadian Aboriginal Syllabics." In Linux and OS X it does not scroll, so you cannot reach all entries. A simple way to work around this bug is to open the guiguts.pl file in a text editor and comment out all the languages that you cannot imagine working with. This makes the list considerably shorter.)

Click on one of the block names to open a palette displaying the characters in that block. If the font you are using does not contain certain characters, those characters display as blanks or empty boxes. Hover the mouse over a character; a "tool tip" pops up listing the official caption and decimal and hex values of that character. Click the character to insert it in the document. At the top of a palette you can select whether the palette will insert the Unicode character itself, or the HTML entity for it.

The pop-up tool tips can make each Unicode palette slow to load the first time it is used. You can disable the tool tips by renaming the file Unicode in the Guiguts folder to any other name.

When scanning the menu for a character by its number, note that the menu lists blocks by their hexadecimal, not decimal, values. The left double quotation mark, whose decimal entity is &#8220;, is found at hexadecimal 201C, not at 8220.
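The relationship between the decimal entity number and the hexadecimal menu position can be checked outside Guiguts. The following is a minimal Python sketch, not part of Guiguts; the function name `describe` is hypothetical, and `unicodedata` is Python's standard Unicode database module. It reports roughly the information a palette tool tip shows.

```python
import unicodedata

def describe(ch):
    """Return (official name, decimal value, hex value) for one character."""
    code = ord(ch)
    return unicodedata.name(ch), code, format(code, "04X")

# Left double quotation mark: decimal 8220 corresponds to hex 201C.
print(describe("\u201c"))
```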

Unicode Lookup by Ordinal

[Image: Guiguts UC lookup.png]

You can also search for a Unicode character by its ordinal number, if you know it. Select Help>UTF Character Entry to open the search dialog. Select either the Hex or Decimal switch and enter the ordinal value of the character. The character itself is displayed. When you click OK, the character is inserted in the document at the insertion point.
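As an illustration of what the Hex and Decimal switches mean, here is a small Python sketch (an assumption about the behavior, not Guiguts code; `char_from_ordinal` is a hypothetical name) that turns an ordinal typed in either base into the character.

```python
def char_from_ordinal(value, base="hex"):
    """Convert an ordinal given as text into the character it names.

    base is "hex" or "decimal", mirroring the two switches in the dialog.
    """
    radix = 16 if base == "hex" else 10
    return chr(int(value, radix))

# The same character reached through either base.
print(char_from_ordinal("201C", "hex"))
print(char_from_ordinal("8220", "decimal"))
```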

Unicode Search by Name

You can search for Unicode characters using keywords from their captions. Select Help>UTF Character Search to open this dialog:

[Image: Guiguts UC search.png]

Enter one or more keywords in the search field. All Unicode characters having those keywords in their standard captions are listed. You can then:

  • Left-click on the displayed character to insert it in the document.
  • Right-click (Mac: Ctrl-click) on the character to put it in the copy buffer so you can paste it anywhere.
  • Left-click on the caption text to open a palette showing the numeric block containing that character.
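A keyword search of this kind can be approximated in a few lines of Python. This is only a sketch under assumptions: it scans the Basic Multilingual Plane using Python's `unicodedata` names, which may differ in coverage from the character list shipped with Guiguts, and `find_by_keywords` is a hypothetical name.

```python
import unicodedata

def find_by_keywords(query, limit=0x10000):
    """List (character, name) pairs whose name contains every keyword."""
    words = query.upper().split()
    hits = []
    for code in range(limit):
        # Unnamed code points (controls, surrogates, unassigned) yield "".
        name = unicodedata.name(chr(code), "")
        if name and all(word in name for word in words):
            hits.append((chr(code), name))
    return hits

for ch, name in find_by_keywords("left double quotation"):
    print(ch, name)
```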

Unicode Reference Sources

If you do not know the name or ordinal of a Unicode character, you can search using Alan Wood's page or the official site.

Guiguts does not provide the entire gamut of Unicode symbols. For a table of all Unicode characters available through the Guiguts menu, open Thundergnat's page in a separate window (it may take some time to load). Characters that are not in the serif font used by your browser display as empty boxes.

Unicode and File Format

When the document contains even one Unicode character outside of Latin-1 (that is, any multi-byte character), Guiguts will save the file as a UTF-8 file. Any other text editor that is compatible with UTF-8 should open the file correctly. If you remove all multi-byte characters from a document, it is next saved as Latin-1.
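The rule described above can be sketched in Python as follows. This illustrates the stated behavior only and is not how Guiguts implements it; `guess_save_encoding` is a hypothetical name.

```python
def guess_save_encoding(text):
    """Return "latin-1" if every character fits in Latin-1, else "utf-8"."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character lies outside Latin-1.
        return "utf-8"
    return "latin-1"

print(guess_save_encoding("café"))       # accented e is within Latin-1
print(guess_save_encoding("\u201cHi"))   # curly quote is outside Latin-1
```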

Guiguts currently requires version 0.5 of the Aspell spelling checker, which does not handle Unicode. Spell-checking may therefore give wrong results, at least for words containing multi-byte characters.

The Gutcheck tool does not handle UTF-8 data well. If the document contains more than a handful of multi-byte Unicode characters, running Gutcheck may produce useless output.

The Greek Transliteration Tool

Knowledge of Greek was basic to advanced education throughout the 18th and 19th centuries; naturally, scholars and poets of those periods toss words, phrases, even paragraphs of Greek into their books. PG wants these transliterated into ASCII equivalents; the method is summarized in the PG FAQ page.

The DP Guidelines tell the proofer to transliterate Greek text and enclose it in [Greek:] markup. The standard proofing interface has a pop-up tool to assist this. However, you need to recheck and possibly re-do all Greek, for two reasons. First, transliteration is difficult, and proofer errors are likely. Second, the pop-up tool does not support all accents and obsolete characters, so if you understand Greek orthography, you may be able to do a better or more complete job.

Greek in ASCII, Beta, Unicode and HTML

The PG method of transliteration used by proofers is a simple conversion from Greek symbols to 8-bit Latin-1. Beta coding is a more complex transliteration scheme that lets you preserve more of the Greek orthography in ASCII form. The Beta code is summarized on this page. Note that what Guiguts calls "Beta" is a hybrid using normal Beta code accents, but the letters from the normal PG transliteration method, so psi remains as "ps", rather than the "y" listed on the Beta code page.

All the Greek symbols are available in two blocks of Unicode. They can be found in the middle of the Guiguts Unicode menu. These characters require multi-byte codes, so if you put them in an etext it will be saved in UTF-8 form.

All the Greek alphabet symbols have HTML entity codes. Thus the HTML version of an etext can display the original Greek text while remaining an ASCII document.
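To show how Greek text can be carried in a pure-ASCII HTML file, here is a Python sketch using the standard `html.entities` table. It is an illustration, not the Guiguts conversion: `to_entities` is a hypothetical name, and falling back to a numeric entity when no named entity exists is an assumption about reasonable behavior. It does not escape `&`, `<`, or `>`.

```python
from html.entities import codepoint2name

def to_entities(text):
    """Replace every non-ASCII character with a named or numeric entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                          # already ASCII
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # e.g. &lambda;
        else:
            out.append(f"&#{code};")                 # no named entity
    return "".join(out)

print(to_entities("λογος"))
```

Unaccented letters of the Greek alphabet have named entities; accented or polytonic forms such as ἕ fall through to numeric entities in this sketch.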

The Greek Tool

Use Help> Greek Transliteration or click Guiguts Tb-greek.png in the toolbar to open the Greek Transliteration tool:

[Image: Guiguts Greektool.png]

Alternatively, Fixup > Find Greek will find the first [Greek: tag in your document (after the current insert point) and cut and paste it into the tool for you.

To enter transliterated Greek text, you click on the images of the characters in sequence. The transliteration is built up in the text window based on your selection of the four switches at the top of the window:

  • The Latin-1 switch produces PG/Beta ASCII codes.
  • The Greek Name switch produces the English names of the characters.
  • The HTML code switch produces HTML Entity codes.
  • The UTF-8 switch produces Unicode characters.

Click Space to enter a space. You can also edit the text in the text window manually, and cut, copy and paste into it.

When the text in the window is correct, click Transfer to insert the contents of the text window at the insertion point in the document. Transfer and Get Next does the same, then moves the insertion point to the next piece of Greek and cuts and pastes it into the transliteration tool.

To build a character with accents and/or breathing marks, type the base ASCII letter in the small Character Builder field at the bottom of the window. The corresponding Greek character is shown. Click on the Beta-code accent marks to the right, or key the corresponding character (paren, slash or tilde) and Guiguts displays the resulting composite character. To produce a complex character such as ἕ add the breathing mark (paren code) first, then add the accent (slash code). Note that only certain sequences are accepted: if a diacritic selection is ignored, it may have been entered in the wrong order or it may be invalid.
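The effect of entry order can be illustrated with Unicode normalization in Python. This is a sketch of the general idea, not the Character Builder's actual logic. The tables `BASE` and `MARKS` and the function `build` are hypothetical, and the mark codes (parens for breathings, slash and backslash for acute and grave) are assumed from common Beta code conventions.

```python
import unicodedata

# Hypothetical subset: ASCII vowel -> Greek base letter.
BASE = {"a": "α", "e": "ε", "i": "ι", "o": "ο", "u": "υ"}
# Assumed Beta-style codes -> Unicode combining marks.
MARKS = {
    "(": "\u0314",   # rough breathing
    ")": "\u0313",   # smooth breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
}

def build(letter, marks):
    """Compose a base letter with marks, applied in the order given."""
    sequence = BASE[letter] + "".join(MARKS[m] for m in marks)
    return unicodedata.normalize("NFC", sequence)

good = build("e", "(/")   # breathing first, then accent
bad = build("e", "/(")    # accent first, then breathing
print(len(good), len(bad))
```

With the breathing entered first, the sequence composes to the single character ἕ (U+1F15). With the accent entered first, normalization cannot fold the breathing in, and the result stays as two code points.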

Key Enter to move the composite character into the text window. The cursor stays in the Character Builder and you can enter another character.

While the cursor is in the Character Builder field you can key:

Key                  Effect
Enter alone          Puts a linebreak in the text
Backspace            Deletes the last letter in the text
Space                Puts a space in the text
s then space         Builds a terminating lowercase sigma
o^ or O^ (or w/W)    Builds lowercase or uppercase Omega
e^ or E^ (or h/H)    Builds lowercase or uppercase Eta
ph or Ph             Builds lowercase or uppercase Phi
th or Th             Builds lowercase or uppercase Theta

Four buttons in the second row automatically convert the contents of the main text field from one encoding to another. For example, you can copy a proofer's transliteration and paste it into the text window. Then click ASCII->Greek to convert to Greek symbols. Now you can compare the Greek to the original page image and see if the proofer got it right.
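To give a feel for what an ASCII-to-Greek conversion involves, here is a rough Python sketch. It is not the Guiguts algorithm: the mapping tables are a hypothetical lowercase-only subset based on the key combinations listed above, accents and breathings are ignored, and the name `ascii_to_greek` is invented for illustration.

```python
import re

# Two-character codes must be tried before single letters.
DIGRAPHS = {"th": "θ", "ph": "φ", "ch": "χ", "ps": "ψ", "o^": "ω", "e^": "η"}
SINGLES = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν",
    "x": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ",
    "u": "υ", "w": "ω",
}

def ascii_to_greek(text):
    """Convert a lowercase ASCII transliteration to Greek letters."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Unknown characters (spaces, punctuation) pass through.
            out.append(SINGLES.get(text[i], text[i]))
            i += 1
    greek = "".join(out)
    # A sigma not followed by another letter becomes final sigma.
    return re.sub(r"σ(?!\w)", "ς", greek)

print(ascii_to_greek("logos"))
print(ascii_to_greek("psuche^"))
```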

You can enter HTML entity codes directly by setting that switch and clicking on Greek letters. There is no direct method for converting built-up characters to HTML. You can do it indirectly as follows: when the desired text is visible as ASCII codes, use either ASCII->Greek or Beta Code->Unicode to get Greek symbols. Click Transfer to put the symbols in the document. Highlight the symbols in the document and use Selection> Convert to Named/Numeric Entities.

Recommended Greek workflow

(Needs new features in .65)

Transliteration phase:

  • Go to the top of your document
  • Select Tools>Character Tools>Greek Transliteration to open the transliteration window
  • Bring the first bit of Greek into it.
  • Bring up the relevant image, check and correct the Greek, and add accents if desired.
  • Click the ASCII->Greek button and it should change the contents of the window to Greek letters
  • Click the Transfer button and it should copy the Greek back to the document

Note that your Greek will remain inside [Greek:] during most of the checks, but this is harmless.

Once you have split the project into HTML and text versions, run the following search and replace regexps:

Text version:

  • Search: \[Greek: +((.|\n)+?)\]
  • Replace: [Greek: \GB$1\E]

to convert the Greek into beta-code then

  • same search term
  • Replace: {\GA$1\E}

to remove all the accents. (Replace {} with whatever you want Greek to be labelled with in your text version.)

HTML version:

Run the following at some point before HTML fixup to throw away the [Greek:] tags:

  • Search: \[Greek: +((.|\n)+?)\]
  • Replace: $1
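The search pattern can be tried outside Guiguts to see what it matches. The Python sketch below covers only the HTML-version replacement (dropping the tags); the Guiguts-specific `\GB...\E` and `\GA...\E` replacement codes are not reproduced. The function name `strip_greek_tags` is hypothetical, and Perl's `$1` becomes `\1` in Python.

```python
import re

# Same pattern as above: non-greedy, and (.|\n) lets a tag span lines.
GREEK_TAG = re.compile(r"\[Greek: +((.|\n)+?)\]")

def strip_greek_tags(text):
    """Replace each [Greek: ...] tag with its contents."""
    return GREEK_TAG.sub(r"\1", text)

sample = "In the beginning [Greek: ἐν ἀρχῇ] was the [Greek: λόγος]."
print(strip_greek_tags(sample))
```

Because the match is non-greedy, two tags on one line are handled separately rather than being merged into a single match.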