Until this chapter, we have only encountered escapes of the form
‘\^’, which tell sed not to interpret the circumflex
as a special character, but rather to take it literally.  For
example, ‘\*’ matches a single asterisk rather than zero
or more backslashes.
This chapter introduces another kind of escape: one that is applied
to a character or sequence of characters that is ordinarily taken
literally, and that sed replaces with a special character.  This
provides a way of encoding non-printable characters in patterns in a
visible manner.
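As an illustration, here is a minimal sketch that assumes GNU sed (the ‘\r’ escape is a GNU extension). A carriage return is invisible in most editors and awkward to type in the shell, but written as ‘\r’ it can be removed from CRLF line endings in plain sight:

```shell
#!/bin/sh
# Assumes GNU sed, which recognizes \r (a GNU extension).
# Strip the carriage return that precedes each newline in CRLF text;
# the CR byte is written visibly as \r instead of being typed literally.
printf 'one\r\ntwo\r\n' | sed 's/\r$//'
# prints "one" and "two" on separate lines, with no trailing CR
```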
There is no restriction on the appearance of non-printing characters
in a sed script, but when a script is being prepared in the shell or
in a text editor, it is usually easier to use one of the following
escape sequences than the binary character it represents:
‘\a’
    Produces or matches a BEL character, that is, an “alert” (ASCII 7).
‘\f’
    Produces or matches a form feed (ASCII 12).
‘\n’
    Produces or matches a newline (ASCII 10).
‘\r’
    Produces or matches a carriage return (ASCII 13).
‘\t’
    Produces or matches a horizontal tab (ASCII 9).
‘\v’
    Produces or matches a so-called “vertical tab” (ASCII 11).
‘\cx’
    Produces or matches CONTROL-x, where x is any character.  The
    precise effect of ‘\cx’ is as follows: if x is a lower case letter,
    it is converted to upper case.  Then bit 6 of the character (hex 40)
    is inverted.  Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B,
    while ‘\c;’ becomes hex 7B.
‘\dxxx’
    Produces or matches a character whose decimal ASCII value is xxx.
‘\oxxx’
    Produces or matches a character whose octal ASCII value is xxx.
‘\xxx’
    Produces or matches a character whose hexadecimal ASCII value is xx.
‘\b’ (backspace) was omitted because of the conflict with the existing “word boundary” meaning.
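The following sketch exercises several of the escapes listed above, assuming GNU sed (all of them except ‘\n’ are GNU extensions). The ‘\ci’ line applies the CONTROL-x rule: ‘i’ becomes ‘I’ (hex 49), and inverting bit 6 gives hex 09, a horizontal tab. The decimal, octal and hexadecimal forms all denote ‘A’ (decimal 65):

```shell
#!/bin/sh
# Assumes GNU sed; every escape used here except \n is a GNU extension.

# \t matches a horizontal tab in the regex.
printf 'a\tb\n' | sed 's/\t/,/'      # a,b

# \ci: 'i' -> 'I' (hex 49), bit 6 inverted -> hex 09, i.e. a tab.
printf 'a\tb\n' | sed 's/\ci/,/'     # a,b

# \cz: 'z' -> 'Z' (hex 5A), bit 6 inverted -> hex 1A (octal 032).
printf 'a\032b\n' | sed 's/\cz/-/'   # a-b

# Decimal 65, octal 101 and hex 41 all denote the character 'A'.
echo A | sed 's/\d65/d/'             # d
echo A | sed 's/\o101/o/'            # o
echo A | sed 's/\x41/x/'             # x

# In the replacement, \n produces a newline.
echo 'a,b' | sed 's/,/\n/'           # "a" and "b" on separate lines
```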
GNU sed processes escape sequences before passing the text on to
the regular-expression matching of the s/// command and of address
matching.  Thus the following two commands are equivalent
(‘0x5e’ is the hexadecimal ASCII value of the character ‘^’):
    $ echo 'a^c' | sed 's/^/b/'
    ba^c
    $ echo 'a^c' | sed 's/\x5e/b/'
    ba^c
So are the following two (‘0x5b’ and ‘0x5d’ are the hexadecimal ASCII values of ‘[’ and ‘]’, respectively):
    $ echo abc | sed 's/[a]/x/'
    xbc
    $ echo abc | sed 's/\x5ba\x5d/x/'
    xbc
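Because the same preprocessing applies to addresses, an escape can also select lines.  A minimal sketch, assuming GNU sed: ‘\t’ in the address below matches a tab, so only the first input line, which contains one, is printed:

```shell
#!/bin/sh
# Assumes GNU sed. The address /\t/ selects lines containing a tab;
# -n suppresses automatic printing, so only the matching line is output.
printf 'a\tb\nc d\n' | sed -n '/\t/p'
# prints the single line "a<TAB>b"
```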
However, it is recommended to avoid producing such special characters with escapes, because of unexpected edge cases.  For example, the following two commands are not equivalent:
    $ echo 'a^c' | sed 's/\^/b/'
    abc
    $ echo 'a^c' | sed 's/\\\x5e/b/'
    a^c
All the escapes introduced here are GNU extensions, with the exception
of \n.  In basic regular expression mode, setting POSIXLY_CORRECT
disables them inside bracket expressions.
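A short sketch of the bracket-expression case, assuming GNU sed in its default (basic regular expression) mode.  Without POSIXLY_CORRECT, ‘\t’ inside ‘[...]’ is recognized as a tab.  The second command sets POSIXLY_CORRECT; per the paragraph above, the escape is then disabled inside the brackets, so its output is left unstated here rather than guessed:

```shell
#!/bin/sh
# Assumes GNU sed, basic regular expression mode (no -E).

# Default: \t inside a bracket expression denotes a tab.
printf 'a\tb\n' | sed 's/[\t]/,/'    # a,b

# With POSIXLY_CORRECT set, escapes are disabled inside bracket
# expressions, so [\t] is no longer expected to match the tab.
printf 'a\tb\n' | POSIXLY_CORRECT=1 sed 's/[\t]/,/'
```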