Escapes (sed, a stream editor)

5.8 Escape Sequences - specifying special characters

Until this chapter, we have only encountered escapes of the form ‘\^’, which tell sed not to interpret the circumflex as a special character, but rather to take it literally. For example, ‘\*’ matches a single asterisk rather than zero or more backslashes.

This chapter introduces another kind of escape⁶: one applied to a character or sequence of characters that is ordinarily taken literally, and that sed replaces with a special character. This provides a way of encoding non-printable characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters in a sed script, but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents.

The list of these escapes is:

\a: Produces or matches a BEL character, that is, an “alert” (ASCII 7).
\f: Produces or matches a form feed (ASCII 12).
\n: Produces or matches a newline (ASCII 10).
\r: Produces or matches a carriage return (ASCII 13).
\t: Produces or matches a horizontal tab (ASCII 9).
\v: Produces or matches a so-called “vertical tab” (ASCII 11).
\cx: Produces or matches CONTROL-x, where x is any character. The precise effect of ‘\cx’ is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B, while ‘\c;’ becomes hex 7B.
\dxxx: Produces or matches a character whose decimal ASCII value is xxx.
\oxxx: Produces or matches a character whose octal ASCII value is xxx.
\xHH: Produces or matches a character whose hexadecimal ASCII value is HH.
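
As a rough illustration of the list above, the following sketch exercises the tab escape and the three numeric forms, and reproduces the ‘\cx’ arithmetic in plain shell. It assumes GNU sed and a POSIX shell; the helper names (tab_to_comma, match_A_dec, match_A_oct, match_A_hex, ctrl_code) are made up for this example and are not part of sed.

```shell
# Assumption: GNU sed. All function names here are illustrative only.

# Match a tab with \t and replace it with a comma.
tab_to_comma() { printf '%s\n' "$1" | sed 's/\t/,/g'; }

# The letter 'A' written three ways: decimal 065, octal 101, hex 41.
match_A_dec() { printf 'A\n' | sed 's/\d065/dec/'; }
match_A_oct() { printf 'A\n' | sed 's/\o101/oct/'; }
match_A_hex() { printf 'A\n' | sed 's/\x41/hex/'; }

# Reproduce the \cx rule without sed: convert lower case to upper case,
# then invert bit 6 (hex 40) and print the resulting code in hex.
ctrl_code() {
  c=$(printf '%s' "$1" | tr 'a-z' 'A-Z')
  code=$(printf '%d' "'$c")
  printf '%02X\n' $(( code ^ 0x40 ))
}
```

With these definitions, ‘ctrl_code z’, ‘ctrl_code '{'’ and ‘ctrl_code ';'’ should print 1A, 3B and 7B, matching the values given for ‘\cz’, ‘\c{’ and ‘\c;’.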

‘\b’ (backspace) was omitted because of the conflict with the existing “word boundary” meaning.
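
A small sketch of what this means in practice, assuming GNU sed: ‘\b’ anchors a match at a word boundary, and a backspace character (ASCII 8) can still be written numerically as ‘\x08’. The helper names whole_word and strip_bs are hypothetical.

```shell
# Assumption: GNU sed. Function names are illustrative only.

# \b is a word boundary, so only the standalone word "cat" is replaced.
whole_word() { printf '%s\n' "$1" | sed 's/\bcat\b/dog/g'; }

# A literal backspace is written by its hexadecimal value; here it is deleted.
strip_bs() { printf '%s\n' "$1" | sed 's/\x08//g'; }
```

For instance, ‘whole_word 'cat concat'’ leaves ‘concat’ untouched because no word boundary precedes its final ‘cat’.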

5.8.1 Escaping Precedence

GNU sed processes escape sequences before passing the text on to the regular-expression matching of the s/// command and address matching. Thus the following two commands are equivalent (‘0x5e’ is the hexadecimal ASCII value of the character ‘^’):

$ echo 'a^c' | sed 's/^/b/'
ba^c

$ echo 'a^c' | sed 's/\x5e/b/'
ba^c

So are the following (‘0x5b’ and ‘0x5d’ are the hexadecimal ASCII values of ‘[’ and ‘]’, respectively):

$ echo abc | sed 's/[a]/x/'
xbc
$ echo abc | sed 's/\x5ba\x5d/x/'
xbc

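However, it is recommended to avoid writing special characters this way because of unexpected edge cases. For example, the following are not equivalent: the first matches a literal circumflex, while in the second ‘\\’ is read as an escaped backslash, so the pattern requires a backslash that the input does not contain: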

$ echo 'a^c' | sed 's/\^/b/'
abc

$ echo 'a^c' | sed 's/\\\x5e/b/'
a^c

Footnotes

(6)

All the escapes introduced here are GNU extensions, with the exception of \n. In basic regular expression mode, setting POSIXLY_CORRECT disables them inside bracket expressions.