User:Camomiletea/Regexes
Regexes I found useful
Search | Replace | Function |
---|---|---|
^-{5}File: ([^.]+)\.(pn|jp)g.* | -----File: $1.$2g----- | Strip proofer names |
([,.;:'"!?])-\b | $1-- | Find dashes after punctuation that should be em-dashes |
\b-([.,;:'"!?]) | --$1 | Find dashes before punctuation that should be em-dashes |
\bMrs?\b(?!\.) | (none) | Find Mr or Mrs without a period (Mr vs. Mr.) |
\n\n\n | (none) | Check chapter/section spacing |
--- | (none) | Find runs of three or more hyphens (long dashes or typos) |
(\d+)--(\d+) | $1-$2 | Convert em-dashes between numbers (e.g. page ranges) to hyphens |
[\x{0100}-\x{10FFFF}] | (none) | Check for characters above U+00FF (outside Latin-1) |
[\x7F-\x{FF}] | (none) | Check for non-ASCII characters in the Latin-1 range |
</?i> | _ | Converting italics in the text version |
<(/?)i> | <$1em> | Converting italics in HTML, if desired |
<p>(\P{IsLower}+)</p> | <h3>$1</h3> | Convert all upper-case one-line paragraphs to headings in HTML |
<sc>((.|\n)+?)</sc> | \U$1\E | Converting small caps to upper case in the text version |
([A-Z]\.) (?=[A-Z]\.) | $1&nbsp; | Convert spaces between initials to non-breaking spaces in HTML |
"pagenum"><a name="Page_(\d+)" id="Page_\1">\[Pg \1\]</a> | "pagenum" title="Page $1"> <a name="Page_$1" id="Page_$1"></a> | Move the visible [Pg N] text of auto-generated pagenum spans in HTML into a title attribute (why?) |
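For trying some of these outside the editor, here is a minimal Python sketch that applies several of the table's search/replace pairs with `re.sub` on a made-up sample. The table uses Perl replacement syntax, so `$1` is translated to `\g<1>`, and because Python replacement strings have no `\U...\E`, a function does the small-caps upper-casing. The names `RULES`, `apply_rules` and `chars_above_latin1` are hypothetical, and the translation is an assumption, not how Guiguts itself runs these searches.

```python
import re

# (pattern, replacement) pairs translated from the table above.
# Perl's $1/$2 in a replacement become \g<1>/\g<2> in Python's re.sub.
RULES = [
    # Strip proofer names from page separators; re.M lets ^ match at each line start.
    (re.compile(r"^-{5}File: ([^.]+)\.(pn|jp)g.*", re.M), r"-----File: \g<1>.\g<2>g-----"),
    # A hyphen right after punctuation is probably an em-dash (--).
    (re.compile(r"""([,.;:'"!?])-\b"""), r"\g<1>--"),
    # A hyphen right before punctuation is probably an em-dash (--).
    (re.compile(r"""\b-([.,;:'"!?])"""), r"--\g<1>"),
    # An em-dash between numbers (e.g. a page range) should be a hyphen.
    (re.compile(r"(\d+)--(\d+)"), r"\g<1>-\g<2>"),
    # <i> and </i> become underscores in the text version.
    (re.compile(r"</?i>"), "_"),
]


def apply_rules(text: str) -> str:
    """Apply each rule in order, then upper-case <sc>...</sc> spans."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    # No \U...\E in Python replacement strings, so use a function instead;
    # re.S lets a span cross line breaks, like ((.|\n)+?) in the table.
    return re.sub(r"<sc>(.+?)</sc>", lambda m: m.group(1).upper(), text, flags=re.S)


def chars_above_latin1(text: str) -> list[str]:
    """Return every character above U+00FF (the table's Unicode check)."""
    return re.findall(r"[\u0100-\U0010FFFF]", text)


# Made-up sample text, including a page separator with proofer names.
sample = (
    "-----File: 001.png---\\jdoe\\asmith\\-----\n"
    "He said,-no. See pages 12--15.\n"
    "<i>Ibid.</i> <sc>Chapter one</sc>\n"
)
print(apply_rules(sample))
```

The rules run in table order, so a later rule sees the output of an earlier one; here that is harmless, but the order is worth checking before adding rules that overlap.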
Creating a Guiguts-style project dictionary from a word list
Make sure the word list (one word per line) is exactly as you want it: it is case-sensitive!
- First, you must escape all apostrophes; i.e. find ' and replace with \'
- Use the following regex: find ^(.*)$ and replace with '$1' => '',
- At the beginning add on a separate line: %projectdict = (
- At the end, remove the final comma, and on a separate line add: );
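The steps above can also be scripted. This is a rough Python sketch, assuming one word per line as the `^(.*)$` regex implies; `make_project_dict` is a hypothetical name. It also escapes backslashes and skips blank lines, neither of which the manual steps mention, on the assumption that each entry is a Perl single-quoted string.

```python
def make_project_dict(words):
    """Build a Guiguts-style %projectdict from an iterable of words.

    Mirrors the manual steps: escape apostrophes, wrap each word as
    'word' => '', put a comma after every entry except the last,
    and add the opening and closing lines. Case is left untouched.
    """
    entries = []
    for word in words:
        word = word.strip()
        if not word:  # skip blank lines (not part of the manual steps)
            continue
        # Escape backslashes first, then apostrophes, for Perl single quotes.
        escaped = word.replace("\\", "\\\\").replace("'", "\\'")
        entries.append(f"'{escaped}' => ''")
    # Joining with ",\n" leaves no comma after the final entry.
    return "%projectdict = (\n" + ",\n".join(entries) + "\n);\n"


print(make_project_dict(["don't", "Camomile"]), end="")
```

To convert a file, pass it something like `open("words.txt", encoding="utf-8").read().splitlines()`, where `words.txt` is a placeholder name.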
Missing periods
- t[\s\n]\p{IsUpper}
  - It seems that early-20th-century typesetters often set "t." so tightly that the OCR read it as a plain "t".
- Missing periods after Roman numerals
- Missing period in "&c.," when followed by a comma
- ".," often becomes just a comma, or a semicolon
- per cent.: in books that use a period in this phrase (don't make it a comma)
- [XIVLC]L
  - Find a Roman numeral ending in L, which should be I
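A rough Python sketch of the two searches in this list, run over a made-up sample. Python's `re` module has no `\p{IsUpper}`, so `[A-Z]` stands in (ASCII only, an assumption of this sketch), and `[\s\n]` becomes `\s`, which already matches a newline. The helper `report` and the constant names are hypothetical.

```python
import re

# "t" + one whitespace character + a capital letter: a "t." may have lost its period.
# Python's re has no \p{IsUpper}; [A-Z] is an ASCII-only stand-in.
MISSING_T_PERIOD = re.compile(r"t\s[A-Z]")

# A Roman-numeral letter followed by L, where the L may be a misread "I".
ROMAN_ENDING_IN_L = re.compile(r"[XIVLC]L")


def report(text: str) -> list[tuple[int, str]]:
    """Return sorted (line number, matched text) pairs for both searches."""
    hits = []
    for pattern in (MISSING_T_PERIOD, ROMAN_ENDING_IN_L):
        for m in pattern.finditer(text):
            # Line number of the match start (1-based).
            line_no = text.count("\n", 0, m.start()) + 1
            hits.append((line_no, m.group()))
    return sorted(hits)


sample = "He went out The next day.\nSee chapter XIL for details.\n"
for line_no, found in report(sample):
    print(f"line {line_no}: {found!r}")
```

Both searches only flag candidates: legitimate numerals such as XL or CL also match the second pattern, so each hit still needs a look at the scan.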