Some characters cannot be included literally in string constants
("foo") or regexp constants (/foo/).
Instead, they should be represented with escape sequences,
which are character sequences beginning with a backslash (‘\’).
One use of an escape sequence is to include a double quote character in
a string constant. Because a plain double quote ends the string, you
must use ‘\"’ to represent an actual double quote character as a
part of the string. For example:
$ awk 'BEGIN { print "He said \"hi!\" to her." }'
-| He said "hi!" to her.
The backslash character itself is another character that cannot be
included normally; you must write ‘\\’ to put one backslash in the
string or regexp. Thus, the string whose contents are the two characters
‘"’ and ‘\’ must be written "\"\\".
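Both escapes can be checked by printing that two-character string and asking awk for its length. This is a small sketch that assumes any POSIX-compatible awk on the PATH; the shell variable names str and len are only illustrative.

```shell
# Inside the awk program, \" yields a double quote and \\ yields one backslash,
# so the string constant "\"\\" holds exactly two characters.
str=$(awk 'BEGIN { print "\"\\" }')
len=$(awk 'BEGIN { print length("\"\\") }')
printf 'string: %s\nlength: %s\n' "$str" "$len"
```

The length check is the useful part: it confirms that each escape sequence collapses to a single character once awk has read the program.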
Other escape sequences represent unprintable characters such as TAB or newline. There is nothing to stop you from entering most unprintable characters directly in a string constant or regexp constant, but they may look ugly.
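To see that \t and \n really produce unprintable characters, the sketch below prints a string containing both and then feeds the result back to awk, splitting the first line on TAB. It assumes a POSIX-compatible awk; the sample words and variable names are made up for illustration.

```shell
# "\t" becomes a real TAB and "\n" a real newline in the output.
out=$(awk 'BEGIN { printf "left\tright\nnext line\n" }')
printf '%s\n' "$out"
# Using TAB as the field separator, the first line splits into two fields.
fields=$(printf '%s\n' "$out" | awk -F '\t' 'NR == 1 { print NF }')
# The single "\n" inside the string produced a second line.
lines=$(printf '%s\n' "$out" | awk 'END { print NR }')
```

Counting fields and records with awk itself avoids depending on the output layout of tools such as od or wc, which differs between systems.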
The following list presents all the escape sequences used in awk and
what they represent. Unless noted otherwise, all these escape
sequences apply to both string constants and regexp constants:
\\
A literal backslash, ‘\’.
\a
The “alert” character, Ctrl-g, ASCII code 7 (BEL). (This often makes some sort of audible noise.)
\b
Backspace, Ctrl-h, ASCII code 8 (BS).
\f
Formfeed, Ctrl-l, ASCII code 12 (FF).
\n
Newline, Ctrl-j, ASCII code 10 (LF).
\r
Carriage return, Ctrl-m, ASCII code 13 (CR).
\t
Horizontal TAB, Ctrl-i, ASCII code 9 (HT).
\v
Vertical TAB, Ctrl-k, ASCII code 11 (VT).
\nnn
The octal value nnn, where nnn stands for 1 to 3 digits between ‘0’ and ‘7’. For example, the code for the ASCII ESC (escape) character is ‘\033’.
\xhh…
The hexadecimal value hh, where hh stands for a sequence of hexadecimal digits (‘0’–‘9’, and either ‘A’–‘F’ or ‘a’–‘f’). At most two digits are allowed after the ‘\x’; any further hexadecimal digits are treated as simple letters or numbers. (c.e.) (The ‘\x’ escape sequence is not allowed in POSIX awk.)
CAUTION: In ISO C, the escape sequence continues until the first non-hexadecimal digit is seen. For many years, gawk would continue incorporating hexadecimal digits into the value until a non-hexadecimal digit or the end of the string was encountered. However, using more than two hexadecimal digits produced undefined results. As of version 4.2, only two digits are processed.
\uhh…
The hexadecimal value hh, where hh stands for a sequence of one or more hexadecimal digits (‘0’–‘9’, and either ‘A’–‘F’ or ‘a’–‘f’). At most eight digits are allowed after the ‘\u’; any further hexadecimal digits are treated as simple letters or numbers. (c.e.) (The ‘\u’ escape sequence is not allowed in POSIX awk.)
This escape sequence is intended for designating a character in
the current locale’s character set. gawk first converts the given
digits into an integer and then translates the given “wide character”
value into the current locale’s multibyte encoding. If the wide character
value does not represent a valid character, or if the character is valid
but cannot be encoded into the current locale’s multibyte encoding, the
value becomes "?". gawk issues a warning message when this happens.
\/
A literal slash (should be used for regexp constants only).
This sequence is used when you want to write a regexp constant that
contains a slash (such as /.*:\/home\/[[:alnum:]]+:.*/; the ‘[[:alnum:]]’
notation is discussed in Using Bracket Expressions). Because the regexp
is delimited by slashes, you need to escape any slash that is part of the
pattern, in order to tell awk to keep processing the rest of the regexp.
\"
A literal double quote (should be used for string constants only).
This sequence is used when you want to write a string constant that
contains a double quote (such as "He said \"hi!\" to her.").
Because the string is delimited by double quotes, you need to escape
any double quote that is part of the string, in order to tell awk to
keep processing the rest of the string.
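Two more entries from the list can be tried together: the octal escape \101 (octal 101 is decimal 65, the ASCII code for ‘A’) in a string constant, and \/ inside a regexp constant. This sketch sticks to escapes that POSIX awk defines, so it should behave the same in gawk and in other awks; the passwd-style input line and the variable names are invented for illustration.

```shell
# "\101" is octal 101 = decimal 65, which is ASCII 'A'.
letter=$(awk 'BEGIN { print "\101" }')
# In the regexp constant, each "\/" is a literal slash, so the slashes
# in the path do not end the regexp early.
match=$(echo 'alice:x:1000:1000::/home/alice:/bin/sh' |
        awk '/:\/home\/[a-z]+:/ { print "home directory found" }')
printf '%s\n%s\n' "$letter" "$match"
```

A plain [a-z] range is used here instead of [[:alnum:]] only to keep the sketch minimal; either form works in a POSIX awk.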
In gawk, a number of additional two-character sequences that begin
with a backslash have special meaning in regexps.
See gawk-Specific Regexp Operators.
In a regexp, a backslash before any character that is not in the previous list
and not listed in gawk-Specific Regexp Operators
means that the next character should be taken literally, even if it would
normally be a regexp operator. For example, /a\+b/ matches the three
characters ‘a+b’.
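The difference between an escaped and an unescaped operator shows up when the same two fields are tested against /a\+b/ and /a+b/. This is a minimal sketch assuming a POSIX-compatible awk; the input words and the names x, y, and z are illustrative only.

```shell
# /a\+b/ needs the literal three characters "a+b";
# /a+b/ means "one or more a's followed by b".
result=$(echo 'a+b aab' |
    awk '{ x = ($1 ~ /a\+b/); y = ($2 ~ /a\+b/); z = ($2 ~ /a+b/); print x, y, z }')
printf '%s\n' "$result"
```

Only the first field contains a real plus sign, so the escaped regexp matches it and rejects "aab", while the unescaped one treats + as repetition and matches "aab".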
For complete portability, do not use a backslash before any character not shown in the previous list or that is not an operator.
Backslash Before Regular Characters
If you place a backslash in a string constant before something that is
not one of the characters previously listed, POSIX
To summarize:
awk reads your program.
gawk processes both regexp constants and dynamic regexps
(see Using Dynamic Regexps), for the special operators listed in
gawk-Specific Regexp Operators.
Escape Sequences for Metacharacters
Suppose you use an octal or hexadecimal (
Historically, such characters were taken literally. (d.c.)
However, the POSIX standard indicates that they should be treated
as real metacharacters, which is what