User:Belastro/regex

From DPWiki

Search/Replace popup tool

You can use the Search/Replace (S/R) tool to find and replace a string of characters in the page you are currently proofing (your current page). The S/R replaces every occurrence of a Search string with the corresponding Replace string. The S/R interprets your Search string as a regular expression (regex) when you check the Regular Expression? box; otherwise, the S/R treats your Search string as a literal string.

When you close your current page, the S/R tool remains open but stops working. You can restart it by clicking the proofing interface control that opens the S/R tool. This clears and resets the S/R tool for your newly opened page; in particular, regular expressions are turned off. You will need to check the Regular Expression? box again if you are using regular expressions.

These characteristics hold for both literal and regex replacements:

  • You cannot use the S/R to just find your Search string. The S/R always replaces your Search string with your Replace string. The S/R deletes your Search string from your current page when you do not provide a Replace string.
  • You cannot step through your page to select specific matches to replace. The S/R always replaces your Search string wherever it appears in your current page.
  • The S/R always repositions your view to the top of your current page when you click either its Replace All button or its Undo button.
    • Your text cursor is relocated to the end of your current page when the S/R replaces any text.
    • Your text cursor remains where you left it when nothing is replaced.
  • The S/R enables its Undo button each time you click its Replace All button, whether or not the S/R makes any changes. In other words, the S/R enables its Undo button when it looks for your Search string, not when it finds your Search string.
  • The S/R does not tell you whether it has changed your current page and it does not highlight changes that it makes. You must inspect your current page from the top to determine whether any change has been made by the S/R.
  • The S/R is case sensitive. The S/R will not find "CASE" when it searches for "case".
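
The literal-mode behavior described in this list can be imitated outside the tool. The following is a minimal sketch, not the S/R's own code: the helper `replaceAllLiteral` and the sample text are hypothetical, and the sketch uses split/join only because it treats the Search string as plain text.

```javascript
// Hypothetical stand-in for Replace All in literal mode (not the S/R's source).
function replaceAllLiteral(page, search, replacement) {
  // split/join treats `search` as plain text and replaces every occurrence.
  return page.split(search).join(replacement);
}

const samplePage = "The case for CASE: case closed.";

// Every occurrence is replaced, and the search is case sensitive,
// so "CASE" is left alone.
const replaced = replaceAllLiteral(samplePage, "case", "matter");
console.log(replaced); // The matter for CASE: matter closed.

// An empty Replace string deletes every occurrence of the Search string.
const deleted = replaceAllLiteral(samplePage, "case ", "");
console.log(deleted); // The for CASE: closed.
```

Because there is no find-only mode, trying a pattern like this on a copy of the text first is one way to preview what Replace All will do.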

Regular expressions

The S/R provides the ECMA flavor of regular expressions. Jan Goyvaerts' comparison of regex flavors details (and compares) the ECMA flavor for search strings. Goyvaerts' replacement text reference details (and compares) the ECMA flavor for replacement strings. The S/R does not add any features or extensions to those specified by the ECMA standard.

The S/R sets these modes, which you cannot change:

  • Global mode is turned on. All matches within the current page are replaced.
  • Case insensitivity mode is turned off. Searches are case sensitive.
  • Multi-line mode is turned off. Caret (^) matches only at the beginning of your current page and dollar ($) matches only at the end of your current page.
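
The effect of these three modes can be sketched in plain JavaScript. This assumes the S/R builds its expression with only the global ("g") flag, which matches the description above but has not been checked against the tool's source; the sample text and variable names are made up for illustration.

```javascript
const modePage = "one\ntwo\none\nOne";

// Global on, case-insensitive off: both "one" lines change, "One" does not.
const globalOnly = modePage.replace(/one/g, "1");

// Multi-line off: ^ matches only at the very start of the page ...
const startOnly = modePage.replace(/^one/g, "1");

// ... and $ matches only at the very end of the page.
const endOnly = modePage.replace(/One$/g, "1");

// "two" ends a line but not the page, so "two$" finds nothing.
const unchanged = modePage.replace(/two$/g, "2");

console.log(JSON.stringify(globalOnly)); // "1\ntwo\n1\nOne"
console.log(JSON.stringify(startOnly));  // "1\ntwo\none\nOne"
console.log(JSON.stringify(endOnly));    // "one\ntwo\none\n1"
console.log(unchanged === modePage);     // true
```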

Some considerations:

  • You must use the newline (\n) metacharacter in an expression that looks for the beginning or end of a line within your current page.
  • The whitespace (\s) metacharacter matches a newline.
  • The dot (.) metacharacter does not match a newline.
  • The only way to get a newline into a replacement string is to capture a newline in the search string. For example, to insert a blank line after a paragraph you might search for "\.(\n)(\w)" and use ".$1$1$2" as the replacement.
  • The word (\w) metacharacter matches [a-zA-Z_0-9], that is, the ASCII letters, digits, and underscore. You must explicitly add letters with accents and diacritical marks to the \w set if such characters are letters in the text of your current page. For example, you might use "[\wáéíóúü]+?" to look for a word in a page of Spanish text. Conversely, the not-word (\W) metacharacter matches non-ASCII letters, because it matches every character outside the \w set. You cannot "subtract" such non-ASCII letters from the \W set.
  • The word-boundary (\b) metacharacter detects a transition between \w and \W character sets. You should not use \b in text that uses non-ASCII letters (e.g., Spanish text that uses [áéíóúü]).
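  • The expressions "(X)?" and "(X?)", where X is a single character, are equivalent in the S/R: both match the same text, and when X is absent a reference to the group inserts an empty string.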
  • Capture groups are numbered in the order of their appearance in an expression. They are not numbered in the order of their capture.
  • Empty capture groups may be referenced in a replacement expression. Thus you could (but shouldn't) use the search string "(\n?)--(\n?)([^\s]*?)\s" with the replacement string "--$3$1$2" to raise an em-dash and the word that follows it.
  • Case-folding spans (\U \L ... \E) are not supported in replacement strings. Thus, the S/R provides no way to use a regular expression to change the case of characters returned within a match.
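
Several of the considerations above can be checked in plain JavaScript. This is a sketch under assumptions: `replaceAllRegex` is a hypothetical helper that imitates Replace All with only the global flag set, and the sample strings are invented. Backslashes are doubled here only because the patterns sit inside JavaScript string literals; in the S/R box you type them singly.

```javascript
// Hypothetical helper imitating Replace All with the regex box checked
// (assumed flags: global only).
function replaceAllRegex(page, search, replacement) {
  return page.replace(new RegExp(search, "g"), replacement);
}

// \s matches a newline, but the dot does not.
const joined = replaceAllRegex("a\nb", "\\s", " "); // "a b"
const dotted = replaceAllRegex("a\nb", ".", "x");   // "x\nx"

// Capture a newline in order to reuse it: blank line after a paragraph.
const spaced = replaceAllRegex("End.\nNext", "\\.(\\n)(\\w)", ".$1$1$2");

// \w is ASCII only, so accented letters must be added to the class.
// (Greedy + is used here so the whole word is returned.)
const asciiOnly = "canción".match(/\w+/)[0];           // "canci"
const withAccents = "canción".match(/[\wáéíóúü]+/)[0]; // "canción"

// \b sees a boundary beside "ó", so it fires inside the word.
const falseBoundary = replaceAllRegex("canción", "\\bn\\b", "N"); // "cancióN"

// "(X)?" and "(X?)" give the same result; an absent X inserts nothing.
const optOuter = replaceAllRegex("ab b", "(a)?b", "[$1]"); // "[a] []"
const optInner = replaceAllRegex("ab b", "(a?)b", "[$1]"); // "[a] []"

// Empty groups in a replacement: raise "--" and the following word.
const raised = replaceAllRegex(
  "foo\n--bar baz",
  "(\\n?)--(\\n?)([^\\s]*?)\\s",
  "--$3$1$2"
); // "foo--bar\nbaz"

console.log(JSON.stringify([joined, dotted, spaced, raised]));
```

The last pattern also shows why it is discouraged: on a line such as "foo--bar baz", with no newline near the dashes, it still matches and silently removes the space after "bar".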

Some tips & hints:

  • For clarity within a search string, consider specifying a literal space character " " as a character class "[ ]". For clarity within a replacement string, consider specifying a literal space character by a capture reference. For example, instead of using the search string " {2,}" with the replacement string " " to remove extraneous blank characters, consider using the search string "([ ]){2,}" with the replacement string "$1".
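
A short sketch of this tip, with an invented sample string and assuming the global flag as elsewhere on this page. Both forms give the same result; the second only makes the spaces visible in the two boxes.

```javascript
const spacedPage = "too   many    spaces";

// Bare spaces: easy to miscount in the Search and Replace boxes.
const plainSpaces = spacedPage.replace(new RegExp(" {2,}", "g"), " ");

// Character class plus capture reference: the space is explicit in both strings.
// $1 holds the last space matched by the repeated group.
const explicitSpaces = spacedPage.replace(new RegExp("([ ]){2,}", "g"), "$1");

console.log(plainSpaces);    // too many spaces
console.log(explicitSpaces); // too many spaces
```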

Some constraints:

  • You cannot refer to the result of a previous Replace All action. In other words, you cannot use one Replace All to select some text from your current page and then use another Replace All to make changes only in the selected text.
  • The S/R accepts input only from your keyboard. You must separately type or paste each search string and each replacement string.
  • You must provide each search string and each replacement string separately. The S/R does not provide a way to enter a search string and a replacement string together (e.g., /search/replace/mode).
  • The S/R can only handle one search & replace at a time. You cannot save a sequence of regexes, say, in a file, and replay that sequence for each page you load as your current page. You also cannot concatenate a sequence of search strings to work with a corresponding concatenated sequence of replacement strings.

Additional information about regular expressions in DP tools

Regular expressions are also supported by Guiguts, a DP tool used for post-processing. Although Guiguts provides a different flavor of regular expressions, you may find information in these forums useful:

Technical notes

The S/R is written in JavaScript. The S/R uses the JavaScript string.replace method with the RegExp object for its regular expression engine (see Jan Goyvaerts' discussion of the JavaScript ECMA implementation, http://www.regular-expressions.info/javascript.html). Literal searches are handled by escaping all characters that are meaningful to the regex engine. You can right-click on the S/R to view its source code.
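
The approach described here can be sketched as follows. This is an illustration, not the S/R's source: the function names `escapeForRegex` and `searchReplace`, the exact set of escaped characters, and the use of only the global flag are all assumptions.

```javascript
// Assumed escaping step: put a backslash before every regex metacharacter
// so that the Search string is matched as plain text.
function escapeForRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical Replace All: one code path for both literal and regex searches.
function searchReplace(page, search, replacement, useRegex) {
  const pattern = useRegex ? search : escapeForRegex(search);
  return page.replace(new RegExp(pattern, "g"), replacement);
}

const techPage = "a.b axb";

// Literal search: the dot matches only a dot.
const literalResult = searchReplace(techPage, "a.b", "X", false);
// Regex search: the dot matches any character except a newline.
const regexResult = searchReplace(techPage, "a.b", "X", true);

console.log(literalResult); // X axb
console.log(regexResult);   // X X
```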