Lax Search (GNU Emacs Manual)

16.9 Lax Matching During Searching

Normally, you’d want search commands to disregard certain minor differences between the search string you type and the text being searched. For example, sequences of whitespace characters of different length are usually perceived as equivalent; letter-case differences usually don’t matter; etc. This is known as character equivalence.

This section describes the Emacs lax search features, and how to tailor them to your needs.

By default, search commands perform lax space matching: each space, or sequence of spaces, matches any sequence of one or more whitespace characters in the text. More precisely, Emacs matches each sequence of space characters in the search string to a regular expression specified by the user option search-whitespace-regexp. The default value of this option considers any sequence of spaces and tab characters as whitespace. Hence, ‘foo bar’ matches ‘foo bar’, ‘foo  bar’, ‘foo   bar’, and so on (but not ‘foobar’). If you want to make spaces match sequences of newlines as well as spaces and tabs, customize the option to make its value be the regular expression ‘[ \t\n]+’. (The default behavior of the incremental regexp search is different; see Regular Expression Search.)
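As a minimal init-file sketch of the customization described above, the following sets the option directly with setq; using the Customize interface instead is equally valid. Inside an Emacs Lisp string, \t and \n are read as literal tab and newline characters, which is what the character set needs.

```elisp
;; Let each run of spaces in the search string also match newlines,
;; in addition to spaces and tabs.
(setq search-whitespace-regexp "[ \t\n]+")
```

With this value, searching for ‘foo bar’ should also find ‘foo’ at the end of one line followed by ‘bar’ at the start of the next.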

If you want whitespace characters to match exactly, you can turn lax space matching off by typing M-s SPC (isearch-toggle-lax-whitespace) within an incremental search. Another M-s SPC turns lax space matching back on. To disable lax whitespace matching for all searches, change search-whitespace-regexp to nil; then each space in the search string matches exactly one space.
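The global setting mentioned above could be placed in an init file roughly as follows; this is a sketch, and the Customize interface achieves the same result.

```elisp
;; Disable lax whitespace matching for all searches:
;; each space in the search string then matches exactly one space.
(setq search-whitespace-regexp nil)
```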

Searches in Emacs by default ignore the case of the text they are searching through, if you specify the search string in lower case. Thus, if you specify searching for ‘foo’, then ‘Foo’ and ‘fOO’ also match. Regexps, and in particular character sets, behave likewise: ‘[ab]’ matches ‘a’ or ‘A’ or ‘b’ or ‘B’. This feature is known as case folding, and it is supported in both incremental and non-incremental search modes.

An upper-case letter anywhere in the search string makes the search case-sensitive. Thus, searching for ‘Foo’ does not find ‘foo’ or ‘FOO’. This applies to regular expression search as well as to literal string search. The effect ceases if you delete the upper-case letter from the search string. The variable search-upper-case controls this: if it is non-nil, an upper-case character in the search string makes the search case-sensitive; setting it to nil disables this effect of upper-case characters. The default value of this variable is not-yanks, which makes search case-sensitive if there are upper-case letters in the search string, and also causes text yanked into the search string (see Isearch Yanking) to be down-cased, so that such searches are case-insensitive by default.
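For illustration, the values of search-upper-case described above might be set in an init file like this. Only one of the forms would normally be used; the commented-out lines show the alternatives.

```elisp
;; Upper-case letters in the search string no longer force a
;; case-sensitive search.
(setq search-upper-case nil)

;; Alternatives (uncomment at most one):
;; (setq search-upper-case t)           ; upper case always makes the search case-sensitive
;; (setq search-upper-case 'not-yanks)  ; the default: yanked text is down-cased
```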

If you set the variable case-fold-search to nil, then all letters must match exactly, including case. This is a per-buffer variable; altering the variable normally affects only the current buffer, unless you change its default value. See Local Variables. This variable applies to non-incremental searches also, including those performed by the replace commands (see Replacement Commands) and the minibuffer history matching commands (see Minibuffer History).
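Because case-fold-search is per-buffer, there are two common ways to set it from an init file. The sketch below shows both; the choice of prog-mode-hook is only an illustrative assumption, and any major mode hook would work the same way.

```elisp
;; Option 1: change the default value, affecting every buffer that
;; has not set its own buffer-local value.
(setq-default case-fold-search nil)

;; Option 2: make searches case-sensitive only in certain buffers,
;; here (as an example) in buffers whose mode derives from prog-mode.
(add-hook 'prog-mode-hook
          (lambda ()
            (setq-local case-fold-search nil)))
```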

Typing M-c or M-s c (isearch-toggle-case-fold) within an incremental search toggles the case sensitivity of that search. The effect does not extend beyond the current incremental search, but it does override the effect of adding or removing an upper-case letter in the current search.

Several related variables control case-sensitivity of searching and matching for specific commands or activities. For instance, tags-case-fold-search controls case sensitivity for find-tag. To find these variables, do M-x apropos-variable RET case-fold-search RET.

Case folding disregards case distinctions among characters, making upper-case characters match lower-case variants, and vice versa. A generalization of case folding is character folding, which disregards wider classes of distinctions among similar characters. For instance, under character folding the letter a matches all of its accented cousins like ä and á, i.e., the match disregards the diacritics that distinguish these variants. In addition, a matches other characters that resemble it, or have it as part of their graphical representation, such as U+00AA FEMININE ORDINAL INDICATOR and U+24D0 CIRCLED LATIN SMALL LETTER A (which looks like a small a inside a circle). Similarly, the ASCII double-quote character " matches all the other variants of double quotes defined by the Unicode standard. Finally, character folding can make a sequence of one or more characters match another sequence of a different length: for example, the sequence of two characters ff matches U+FB00 LATIN SMALL LIGATURE FF and the sequence (a) matches U+249C PARENTHESIZED LATIN SMALL LETTER A. Character sequences that are not identical, but match under character folding are known as equivalent character sequences.

Generally, search commands in Emacs do not by default perform character folding in order to match equivalent character sequences. You can enable this behavior by customizing the variable search-default-mode to char-fold-to-regexp. See Tailoring Search to Your Needs. Within an incremental search, typing M-s ' (isearch-toggle-char-fold) toggles character folding, but only for that search. (Replace commands have a different default, controlled by a separate option; see Replace Commands and Lax Matches.)
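A minimal init-file sketch of the customization above, assuming you want character folding to be the default for search commands:

```elisp
;; Make search commands perform character folding by default.
;; M-s ' within an incremental search still toggles it for that search.
(setq search-default-mode #'char-fold-to-regexp)
```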

By default, typing an explicit variant of a character, such as ä, as part of the search string doesn’t match its base character, such as a. If you customize the variable char-fold-symmetric to t, search commands treat all equivalent characters alike: any member of a set of equivalent characters in the search string finds every member of that set in the text being searched. Typing the accented character ä then matches the letter a as well as the other variants like á.

You can add new foldings using the customizable variable char-fold-include, or remove the existing ones using the customizable variable char-fold-exclude. You can also customize char-fold-override to t to disable all the character equivalences except those you add yourself using char-fold-include.
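The character-folding options could be set from an init file along the following lines. This is a sketch under some assumptions: it uses customize-set-variable rather than setq so that the folding table is rebuilt through the options' own setters, it assumes each char-fold-include entry is a list of a character followed by the strings it should match, and the ß entry is purely a hypothetical example rather than a recommended folding.

```elisp
(require 'char-fold)

;; Make foldings symmetric: typing ä also matches a, á, and so on.
(customize-set-variable 'char-fold-symmetric t)

;; Add one extra folding on top of the existing ones.
;; The entry (?ß "ss") is a hypothetical illustration.
(customize-set-variable 'char-fold-include
                        (append char-fold-include
                                '((?ß "ss"))))
```

Appending to the current value keeps the foldings already listed in char-fold-include; assigning a fresh list would replace them.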