User:Camomiletea/Russian
Russian in Old Orthography presents special challenges. Some of the things I've learned from PPVing a project are listed below.
Word Frequency Routine
Select Sort Alpha, and save the wordlist. Open in Guiguts and highlight every ѣ (arbitrary character); look through to make sure there are no similar words with "е" that should have ѣ.
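The same comparison can be roughed out in a script once the wordlist is saved. This is only a sketch, not part of Guiguts: the function name `yat_conflicts` is made up for illustration, and it assumes the wordlist has already been read into a list of strings. It groups words that become identical when ѣ is folded to е, so that only pairs where both spellings occur need a manual look.

```python
from collections import defaultdict

def yat_conflicts(words):
    """Return groups of words that differ only by ѣ versus е.

    Hypothetical helper: folds ѣ to е to build a comparison key,
    then keeps only the keys under which more than one spelling appears.
    """
    groups = defaultdict(set)
    for word in words:
        lowered = word.lower()
        key = lowered.replace("\u0463", "\u0435")  # ѣ -> е
        groups[key].add(lowered)
    # Sort inside each group and across groups for stable output
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

if __name__ == "__main__":
    sample = ["вѣра", "вера", "лѣсъ", "небо"]
    for group in yat_conflicts(sample):
        print(" / ".join(group))
```

Each printed group is a pair to check against the page scans; the script cannot tell which spelling is right.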
Regexes
Check for adjacent Latin/Cyrillic characters: \p{Cyrillic}\p{Latin}|\p{Latin}\p{Cyrillic}
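Outside Guiguts, Python's built-in `re` module does not understand `\p{Cyrillic}` or `\p{Latin}`, so a script has to take another route. The sketch below is one possible workaround, under the assumption that the official Unicode character name is a good enough guide to the script; the helper names are invented for this example.

```python
import unicodedata

def script_of(ch):
    """Return 'Cyrillic' or 'Latin' based on the Unicode character name, else None."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return None

def mixed_script_positions(text):
    """List (offset, two-character pair) wherever Latin and Cyrillic letters touch."""
    hits = []
    for i in range(len(text) - 1):
        first, second = script_of(text[i]), script_of(text[i + 1])
        if first and second and first != second:
            hits.append((i, text[i:i + 2]))
    return hits

if __name__ == "__main__":
    # Latin "c" (U+0063) followed by Cyrillic letters: looks right, is wrong
    suspicious = "c\u0442\u043e\u043b\u044a"
    print(mixed_script_positions(suspicious))
```

Digits, spaces and punctuation have neither script, so a Latin word next to a Cyrillic one with a space in between is not reported.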
Check for soft signs that should be hard signs (the hyphen goes last in the class so it is not read as a range): ь([.,!?;: '"\(\)\[\]»«\n-]) => ъ$1
Another common scanno: OCR can misread the soft sign (ь) as ы.
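Since plenty of words legitimately end in a soft sign, it can help to list the candidates for review rather than replace them blindly. A minimal sketch, assuming Python's `re` module and a hypothetical helper name; it collects every word whose last letter is ь and which is followed by punctuation, whitespace, or the end of the text.

```python
import re

# A word ending in ь, followed by punctuation, whitespace, or end of text.
# The hyphen sits last in the character class so it is taken literally.
WORD_FINAL_SOFT = re.compile(r"\w*ь(?=[.,!?;:'\"()\[\]»«\s-]|$)")

def soft_sign_candidates(text):
    """Return words ending in ь so each can be checked against the scan."""
    return WORD_FINAL_SOFT.findall(text)

if __name__ == "__main__":
    line = "столь и домъ, день."
    for word in soft_sign_candidates(line):
        print(word)
```

Words with ь in the middle (for example только) are skipped, because the lookahead requires the soft sign to close the word.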
Gutcheck
Since Gutcheck freaks out about non-ASCII, I use the following regexes to replace some of its checks:
Long lines (more than 75 characters): .{76,}
Short lines: ^.{2,54}\n\S
Line starts with punctuation (modify quotes depending on language): ^ *[^-\p{Alpha}\d+&(\[{"“'‘’_ ]
No punctuation at paragraph end: [\p{Alpha},;]["”]*\n\n
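The two line-length checks can also be run as a script when Guiguts is not at hand. This is a sketch under stated assumptions: the function name is invented, the limits (over 75 characters is long; 2 to 54 characters followed by a line that starts with a non-space character is short) are read off the regexes above, and lengths are counted in Unicode characters, not bytes.

```python
def line_length_report(text, max_len=75, short_len=55):
    """Return (long_lines, short_lines) as lists of 1-based line numbers.

    A line is long when it has more than max_len characters.
    A line is short when it has 2..short_len-1 characters and the next
    line starts with a non-whitespace character (so it is mid-paragraph).
    """
    lines = text.split("\n")
    long_lines = [n for n, line in enumerate(lines, 1) if len(line) > max_len]
    short_lines = [
        n
        for n, (line, following) in enumerate(zip(lines, lines[1:]), 1)
        if 2 <= len(line) < short_len and following[:1].strip()
    ]
    return long_lines, short_lines

if __name__ == "__main__":
    sample = "\n".join(["я" * 80, "короткая строка", "ю" * 60, "", "конецъ"])
    long_lines, short_lines = line_length_report(sample)
    print("long:", long_lines)
    print("short:", short_lines)
```

The last line of a paragraph is never flagged as short, because the line after it is empty.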
Spellchecking
When most checks are complete, save a copy of the file and update its orthography so it can be spell-checked with a modern dictionary. This works better than I expected, but there are still lots of false positives because of endings such as -аго and -ыя, and beginnings like раз-.
Remove hard signs from the end of words (case-insensitive on!; the hyphen goes last in the class so it is not read as a range): ъ([.,!?;: '"\(\)\[\]»«-]) => $1
Replace і with и, ѣ with е, І with И, Ѣ with Е
- Not tested, but should work well: Replace ыя at the end of words with ые.
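The replacements above could be chained in one pass over the spellcheck copy. The sketch below is an assumption-laden illustration, not a tested workflow: `modernize` is a hypothetical name, it uses a word boundary (`\b`) instead of the explicit punctuation list, and it includes the untested ыя to ые step. It covers only the letters named in these notes.

```python
import re

# Old-orthography letters mapped to modern ones (only those listed in the notes)
LETTER_MAP = str.maketrans({
    "\u0456": "и",  # і (U+0456)
    "\u0406": "И",  # І (U+0406)
    "\u0463": "е",  # ѣ (U+0463)
    "\u0462": "Е",  # Ѣ (U+0462)
})

# Hard sign at the end of a word; a word-internal ъ is left alone
FINAL_HARD_SIGN = re.compile(r"[ъЪ]\b")
# Word-final -ыя, replaced with -ые (untested on a real text)
FINAL_YJA = re.compile(r"ыя\b")

def modernize(text):
    """Return a spellcheck-friendly copy of text; never run this on the master file."""
    text = FINAL_HARD_SIGN.sub("", text)
    text = FINAL_YJA.sub("ые", text)
    return text.translate(LETTER_MAP)

if __name__ == "__main__":
    print(modernize("Онъ видѣлъ новыя книги въ Россіи."))
```

Using `\b` means a hard sign before a hyphen (as in изъ-за) is also dropped, which is a slight change from the punctuation-list regex.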
Aspell doesn't play well on my system with Russian. MS Office spellcheck works pretty well though (I had grammar check disabled).
My old laptop was having problems, so I needed to get a new one and set up the Russian spellcheck on it. But MS had stopped providing the proofing tools (I know, I asked). It took me the longest time to figure out, so here is a note in case I need it again. The necessary files can be found in: Program Files -> Common Files -> microsoft shared -> PROOF. Copy every file with RU in the file name; be sure to copy the hidden .GID file.