Cló Gaelach
Cló Gaelach, or Gaelic type, is a style of type commonly used to write and print the Irish language, especially before the mid-20th century. It is not a separate script but a typeface, so text in Cló Gaelach is normally transcribed with the same code points used for normal Roman type.
For Content Providers
Currently there are few OCR options, and no free options, that will reliably read Cló Gaelach, although some training materials are available.
There are several models available from Transkribus that can be tested for free with a few pages, but require a paid subscription for bulk processing. The most reliable seems to be this one. However, CPs should be aware that:
- This model replaces lenition dots with the letter h, so if preserving the dots is important you will need to use a different model (for example, this one) or replace them during preprocessing.
- This model also replaces the Tironian et (⁊) with the word "agus."
- The model is particularly bad at recognising quotation marks and other less common punctuation; you may want to warn your F1s if your text contains a lot of punctuation.
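If you choose to restore the lenition dots yourself after OCR, the replacement could be scripted along the following lines. This is only a minimal sketch in Python: the names `DOTTED` and `restore_dots` are made up for illustration, and it assumes that every h directly following one of the nine lenitable consonants (b, c, d, f, g, m, p, s, t) marks lenition. That assumption fails for any English or Latin passages in the text (for example "the" or "church"), so the output still needs checking.

```python
import re
import unicodedata

# Build the precomposed dotted letters (for example b + combining dot above
# becomes U+1E03) rather than hard-coding each code point.
DOTTED = {
    c: unicodedata.normalize("NFC", c + "\u0307")
    for c in "bcdfgmpstBCDFGMPST"
}

# A lenitable consonant followed directly by h or H.
_LENITED = re.compile(r"([bcdfgmpstBCDFGMPST])[hH]")


def restore_dots(text):
    """Replace consonant + h with the dotted consonant (naive, see caveats)."""
    return _LENITED.sub(lambda m: DOTTED[m.group(1)], text)
```

With this sketch, `restore_dots("Dhá bhus")` returns "Ḋá ḃus". The substitution cannot tell a lenited consonant from an unrelated h, so it is best run only on text known to be Irish throughout.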
For Project Managers
Cló Gaelach texts usually mark lenition with a dot above certain consonants. You will need to either provide these dotted letters as part of a custom character set, or instruct proofreaders to use the letter h following the consonant instead.
You will also need to decide whether proofreaders should represent the agus sign (⁊) with the Unicode code point for the Tironian et (U+204A) or with a normal ampersand. If you want to use a code point other than the ampersand, you will need to include it in a custom character palette.
For a text containing mixed Roman and Cló Gaelach texts, you may wish to instruct formatters to use the <f> tag to indicate Cló Gaelach.
Some texts contain a mix of short and long forms (or Roman and insular forms) for some letters, such as s. You should review the text and give clear instructions as to how to treat these characters. If you would like to retain the distinction between forms, you will need to include the insular forms in a custom palette.
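When reviewing a text to decide what belongs in a custom character set, it may help to list every non-ASCII character the OCR output contains. The following Python sketch is one possible way to do that; the function names `palette_candidates` and `describe` are hypothetical, and it simply counts characters outside the ASCII range, so ordinary accented vowels such as á will appear in the list alongside dotted consonants, the Tironian et, and any long or insular letter forms.

```python
import unicodedata
from collections import Counter


def palette_candidates(text):
    """Count the non-ASCII, non-space characters in the text."""
    # NFC joins a base letter and a following combining dot into one
    # precomposed character, so both encodings are counted together.
    composed = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in composed if ord(ch) > 127 and not ch.isspace())


def describe(counts):
    """Format each character as code point, Unicode name, and frequency."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')} x{n}"
        for ch, n in counts.most_common()
    ]
```

Running `describe(palette_candidates(text))` on the OCR output gives a list to check against the characters your project already allows. It only reports what the OCR produced, so forms the OCR silently flattened (such as a long s read as a plain s) still have to be found by looking at the page images.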
For Proofreaders
Most of the characters you will see in a Cló Gaelach text are similar enough to their Latin counterparts that you will have no trouble reading them. But there are a few tricky aspects:
Lowercase s and r
The toughest letters for those unfamiliar with Cló Gaelach are lowercase "s" and "r", which look very similar to each other; the only difference is that the "r" has a longer downstroke on the right-hand side. See the illustration here, showing the word "Cambrens". For another example in a slightly different typeface, look at the first and last words of the first line of the example at the top of the page: "Féasta" and "Mór".
Insular g
Another letter that looks very different in Cló Gaelach is G.
Capital and Lowercase Letters
Irish uses much the same rules for capitalisation as English does; however, in Cló Gaelach, the capital letters are usually just slightly larger versions of the lowercase letters. It can be hard to distinguish some of these, especially the capital "L". Pay close attention to the height of the first letter of a word, especially in proper names or at the beginning of a sentence.
Lenition
Lenition, in Irish called séimhiú, is a change in the sound of an initial consonant. In modern Irish, this is indicated by adding an h after the consonant:
An bus (the bus)
Dhá bhus (two buses)
In most Cló Gaelach texts, a séimhiú is instead indicated with a dot above the consonant:
An bus (the bus)
Dhá ḃus (two buses)
The project comments should specify how to handle this in the transcription: whether to transcribe the dots or to replace them with the letter h.
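The two conventions carry the same information, so a dotted transcription can be converted mechanically to the h spelling. As an illustration only, here is a minimal Python sketch of that conversion; the name `dots_to_h` is made up, and it assumes the dots were typed either as precomposed letters (such as ḃ) or as a plain consonant followed by the combining dot above (U+0307).

```python
import unicodedata

# Map each precomposed dotted consonant to the plain consonant plus "h".
_TO_H = {
    ord(unicodedata.normalize("NFC", c + "\u0307")): c + "h"
    for c in "bcdfgmpstBCDFGMPST"
}


def dots_to_h(text):
    """Rewrite dotted consonants in the modern consonant + h spelling."""
    # NFC first, so a consonant followed by a combining dot is also caught.
    # This also composes any other decomposed letters in the text.
    return unicodedata.normalize("NFC", text).translate(_TO_H)
```

For example, `dots_to_h("Dhá ḃus")` returns "Dhá bhus". A dotted capital becomes the capital followed by a lowercase h, which may not be what is wanted in words set entirely in capitals.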
Ampersands and agus
The Irish word for "and" is "agus". Older Irish texts, including most texts in Cló Gaelach, sometimes abbreviate this using the Tironian et character (⁊) rather than the ampersand used in English (&). The project comments should say how to handle this character if it appears in the project.
Abbreviations
Older Irish texts sometimes include unusual abbreviations. For example:
".i." is an abbreviation for the Latin phrase "id est", usually abbreviated in modern English and Irish as "i.e." If you see any unusual abbreviations like this, please ask about them in the project discussion thread.