User:Belastro/regex
Search/Replace popup tool
You can use the Search/Replace (S/R) tool to find and replace
a string of characters in the page you are currently proofing (your current page).
The S/R replaces every occurrence of a Search string with the corresponding Replace string.
The S/R interprets your Search string as a regular expression (regex) when you check the Regular Expression? box; otherwise, the S/R treats your Search string as a literal string.
When you close your current page, the S/R tool remains open but stops working.
You can restart it by clicking the proofing interface control that opens the S/R tool.
This will clear and reset the S/R tool for your newly-opened page;
in particular, regular expressions will be turned off.
You will need to check the Regular Expression? box again if you are using regular expressions.
These characteristics hold for both literal and regex replacements:
- You can not use the S/R to just find your Search string. The S/R always replaces your Search string with your Replace string. The S/R deletes your Search string from your current page when you do not provide a Replace string.
- You can not step through your page to select specific matches to replace. The S/R always replaces your Search string wherever it appears in your current page.
- The S/R always repositions your view to the top of your current page when you click either its Replace All button or its Undo button.
- Your text cursor is relocated to the end of your current page when the S/R replaces any text.
- Your text cursor remains where you left it when nothing is replaced.
- The S/R enables its Undo button each time you click its Replace All button, whether or not the S/R makes any changes. In other words, the S/R enables its Undo button when it looks for your Search string, not when it finds your Search string.
- The S/R does not tell you whether it has changed your current page and it does not highlight changes that it makes. You must inspect your current page from the top to determine whether any change has been made by the S/R.
- The S/R is case sensitive. The S/R will not find "CASE" when it searches for "case".
Regular expressions
The S/R provides the ECMA flavor of regular expressions. Jan Goyvaerts' comparison of regex flavors details (and compares) the ECMA flavor for search strings. Goyvaerts' replacement text reference details (and compares) the ECMA flavor for replacement strings. The S/R does not add any features or extensions to those specified by the ECMA standard.
The S/R sets these modes, which you can not change:
- Global mode is turned on. All matches within the current page are replaced.
- Case insensitivity mode is turned off. Searches are case sensitive.
- Multi-line mode is turned off. Caret (^) matches only at the beginning of your current page and dollar ($) matches only at the end of your current page.
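The effect of these fixed modes can be illustrated with a minimal JavaScript sketch. It assumes the S/R builds its pattern with only the global flag, as the list above describes; the sample page text and the variable names are made up for illustration and are not taken from the S/R source.

```javascript
// Hypothetical page text, three lines long.
const page = "Case one\ncase two\ncase three";

// Global on, case insensitivity off:
// every lowercase "case" is replaced, but "Case" is left alone.
const replaced = page.replace(new RegExp("case", "g"), "CASE");
console.log(JSON.stringify(replaced)); // "Case one\nCASE two\nCASE three"

// Multi-line off: ^ anchors only at the very start of the page,
// so the "case" at the start of lines 2 and 3 is not matched.
const anchored = page.replace(new RegExp("^[Cc]ase", "g"), "X");
console.log(JSON.stringify(anchored)); // "X one\ncase two\ncase three"

// Multi-line off: $ anchors only at the very end of the page.
const tail = page.replace(new RegExp("\\w+$", "g"), "END");
console.log(JSON.stringify(tail)); // "Case one\ncase two\ncase END"
```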
Some considerations:
- You must use the newline (\n) metacharacter in an expression that looks for the beginning or end of a line within your current page.
- The whitespace (\s) metacharacter matches a newline.
- The dot (.) metacharacter does not match a newline.
- The only way to get a newline into a replacement string is to capture a newline in the search string. For example, to insert a blank line after a paragraph you might search for "\.(\n)(\w)" and use ".$1$1$2" as the replacement.
- The word (\w) metacharacter matches [a-zA-Z_0-9], that is, the ASCII letters, digits, and underscore. You must explicitly add letters with accents and diacritical marks to the \w set if such characters are letters in the text of your current page. For example, you might use "[\wáéíóúü]+?" to look for a word in a page of Spanish text. Conversely, the not-word (\W) metacharacter matches non-ASCII letters. However, you can not "subtract" such non-ASCII letters from the \W set.
- The word-boundary (\b) metacharacter detects a transition between \w and \W character sets. You should not use \b in text that uses non-ASCII letters (e.g., Spanish text that uses [áéíóúü]).
- The expressions "(X)?" and "(X?)", where X is a single character, are equivalent.
- Capture groups are numbered in the order of their appearance in an expression. They are not numbered in the order of their capture.
- Empty capture groups may be referenced in a replacement expression. Thus you could (but shouldn't) use the search string "(\n?)--(\n?)([^\s]*?)\s" with the replacement string "--$3$1$2" to raise an em-dash and the word that follows it.
- Case-folding spans (\U \L ... \E) are not supported in replacement strings. Thus, the S/R provides no way to use a regular expression to change the case of characters returned within a match.
Some tips & hints:
- For clarity within a search string, consider specifying a literal space character as "[ ]". For clarity within a replacement string, consider specifying a literal space character by a capture reference. For example, instead of using the search string " {2,}" with the replacement string " ", use the search string "([ ]){2,}" with the replacement string "$1".
Some constraints:
- You can not refer to the result of a previous Replace All action. In other words, you can not use one Replace All to select some text from your current page and then use another Replace All to make changes only in the selected text.
- The S/R accepts input only from your keyboard. You must separately type or paste each search string and each replacement string.
- You must provide each search string and each replacement string separately. The S/R does not provide a way to enter a search string and a replacement string together (e.g., /search/replace/mode).
- The S/R can only handle one search & replace at a time. You can not save a sequence of regexes, say, in a file, and replay that sequence for each page you load as your current page. You also can not concatenate a sequence of search strings to work with a corresponding concatenated sequence of replacement strings.
Additional information about regular expressions in DP tools
Regular expressions are also supported by Guiguts, a DP tool used for post-processing. Although Guiguts provides a different flavor of regular expressions, you may find information in these forums useful:
Technical notes
The S/R is written in JavaScript. The S/R uses the JavaScript string.replace method with the RegExp object for its regular expression engine (see Jan Goyvaerts' [http://www.regular-expressions.info/javascript.html discussion of the JavaScript ECMA implementation]). Literal searches are handled by escaping all characters that are meaningful to the regex engine. You can right-click on the S/R to view its source code.
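The approach described above might look roughly like the following sketch. It is not copied from the S/R source: the function names escapeRegex and searchReplace, and the exact set of escaped characters, are assumptions chosen for illustration.

```javascript
// Hypothetical helper: backslash-escape every character that has a
// special meaning in a JavaScript regular expression.
function escapeRegex(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical stand-in for the Replace All action.
// useRegex mirrors the "Regular Expression?" checkbox.
// Note: this sketch does not escape "$" sequences in the replacement
// string for literal mode; whether the S/R does so is not stated here.
function searchReplace(page, search, replace, useRegex) {
  const source = useRegex ? search : escapeRegex(search);
  const pattern = new RegExp(source, "g"); // global, case sensitive, single-line anchors
  return page.replace(pattern, replace);
}

// Literal mode: the dot matches only a real dot.
console.log(searchReplace("a.b axb", "a.b", "X", false)); // X axb
// Regex mode: the dot matches any character except a newline.
console.log(searchReplace("a.b axb", "a.b", "X", true)); // X X
```

Building the pattern once and always going through RegExp keeps the two modes on the same code path, which is consistent with the behavior noted earlier that literal and regex replacements share the same characteristics.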