getline

Here are some miscellaneous points about getline that
you should bear in mind:
When getline changes the value of $0 and NF,
awk does not automatically jump to the start of the
program and start testing the new record against every pattern.
However, the new record is tested against any subsequent rules.
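
The following is a minimal sketch of that behavior, wrapped in a shell command so it can be run directly. The three-line input and the "rule 1"/"rule 2" labels are made up for illustration. The first rule calls getline while processing the record "a"; the replacement record "b" is never shown to rule 1, but rule 2 does see it:

```shell
out=$(printf 'a\nb\nc\n' | awk '
{ print "rule 1: " $0; if ($0 == "a") getline }   # may replace $0 mid-rule
{ print "rule 2: " $0 }                           # sees whatever $0 is now
')
printf '%s\n' "$out"
```

Note that no "rule 1: b" line appears in the output: awk continues with the next rule instead of restarting from the top.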
Some awk implementations limit the number of pipelines that an awk
program may have open to just one. In gawk, there is no such limit.
You can open as many pipelines (and coprocesses) as the underlying operating
system permits.
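
As a rough illustration, the sketch below keeps two pipelines open at the same time and reads from them alternately. The two echo commands and the variable names are placeholders chosen for this example; an awk that allows only one open pipeline would be expected to fail here:

```shell
out=$(awk 'BEGIN {
    first  = "echo one; echo two"     # hypothetical command 1
    second = "echo uno; echo dos"     # hypothetical command 2
    # Both pipelines stay open while the loop alternates between them.
    while ((first | getline a) > 0 && (second | getline b) > 0)
        print a, b
    close(first)
    close(second)
}')
printf '%s\n' "$out"
```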
An interesting side effect occurs if you use getline without a
redirection inside a BEGIN rule. Because an unredirected getline
reads from the command-line data files, the first getline function
causes awk to set the value of FILENAME. Normally,
FILENAME does not have a value inside BEGIN rules, because you
have not yet started to process the command-line data files.
(d.c.)
(See The BEGIN and END Special Patterns;
also see Built-in Variables That Convey Information.)
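
A small sketch of this side effect follows. The scratch file created with mktemp is an assumption of the example, not part of the original text. The value printed before the getline is the one that may vary between awk versions, so only the value after it should be relied upon:

```shell
tmp=$(mktemp)                      # hypothetical scratch data file
echo "first record" > "$tmp"

out=$(awk 'BEGIN {
    printf "before getline: [%s]\n", FILENAME   # normally no value yet
    getline                                     # reads from the command-line file
    printf "after getline: [%s]\n", FILENAME    # now names the data file
}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```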
Using FILENAME with getline
(‘getline < FILENAME’)
is likely to be a source of
confusion. awk opens a separate input stream from the
current input file. However, by not using a variable, $0
and NF are still updated. If you’re doing this, it’s
probably by accident, and you should reconsider what it is you’re
trying to accomplish.
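
To make the confusion concrete, here is a hedged sketch using a made-up three-line file. While the main input is positioned on the second record, ‘getline < FILENAME’ opens the same file as a separate stream, reads its first line, and overwrites $0 and NF:

```shell
tmp=$(mktemp)                      # hypothetical scratch data file
printf 'a b\nc d e\nf\n' > "$tmp"

out=$(awk 'NR == 2 {
    printf "before: %s (NF=%d)\n", $0, NF
    getline < FILENAME       # separate stream: starts again at line 1
    printf "after: %s (NF=%d)\n", $0, NF
}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

The record that the main loop was processing is lost, which is rarely what the program intended.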
The getline Variants section
presents a table summarizing the
getline variants and which variables they can affect.
It is worth noting that those variants that do not use redirection
can cause FILENAME to be updated if they cause
awk to start reading a new input file.
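
The FILENAME update can be observed with a sketch like the one below, which assumes two one-line files (the names one.txt and two.txt are invented for the example). The plain getline issued on the only record of the first file has to move on to the second file to find more data:

```shell
dir=$(mktemp -d)                   # hypothetical scratch directory
echo "only line of first file"  > "$dir/one.txt"
echo "only line of second file" > "$dir/two.txt"

out=$(cd "$dir" && awk 'NR == 1 {
    print "before:", FILENAME
    getline                   # crosses into the next input file
    print "after:", FILENAME
}' one.txt two.txt)
printf '%s\n' "$out"
rm -rf "$dir"
```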
Note that getline is not a statement (unlike print); it is an
expression. It has a result value and can be used as part of a
larger expression, in control statements, and so on.
Here are examples of the “read until EOF/error” idiom:
while ("sort FILE" | getline line > 0)
    print line

while (getline line < "file.txt" > 0)
    print line
To test whether the return value is less than zero,
enclose the getline call in parentheses; otherwise the ‘<’
is interpreted as an input redirection:
if ((getline VAR) < 0)
    print "Read error"
It is, in fact, best to parenthesize calls to getline
in all control expressions, as some versions of awk
require this. Thus, the previous examples are best written
this way:
while (("sort FILE" | getline line) > 0)
    print line

while ((getline line < "file.txt") > 0)
    print line
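
The explicit comparison with zero matters because getline returns -1 on error, and -1 counts as true in a bare loop condition. The sketch below illustrates this; the path is an assumption and is simply expected not to exist on the system running the example:

```shell
out=$(awk 'BEGIN {
    missing = "/nonexistent/getline-demo-file"   # assumed not to exist
    rc = (getline line < missing)                # parenthesized call
    print "return value:", rc
    if (rc < 0)
        print "read error"
}')
printf '%s\n' "$out"
```

A loop written as ‘while (getline line < missing)’ would therefore never terminate, whereas the ‘> 0’ form stops on both end-of-file and error.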
If the variable being assigned is an expression with side effects,
different versions of awk behave differently upon encountering
end-of-file. Some versions don’t evaluate the expression; many versions
(including gawk) do. Here is an example, courtesy of Duncan Moore:
BEGIN {
    system("echo 1 > f")
    while ((getline a[++c] < "f") > 0) { }
    print c
}
Here, the side effect is the ‘++c’. Is c incremented if
end-of-file is encountered before the element in a is assigned?
Although getline is not called with function-style parentheses
around its argument, gawk evaluates
the expression ‘a[++c]’ before attempting to read from f.
However, some versions of awk only evaluate the expression once they
know that there is a string value to be assigned.
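
One possible way to sidestep the difference, offered here as a suggestion rather than as something the text above prescribes, is to read into a plain variable and perform the side effect only after a successful read. The sketch below assumes a one-line scratch file and passes its name in through a variable f:

```shell
tmp=$(mktemp)                      # hypothetical scratch data file
echo 1 > "$tmp"

out=$(awk -v f="$tmp" 'BEGIN {
    while ((getline line < f) > 0)
        a[++c] = line        # runs only after a successful read
    close(f)
    print c
}')
printf '%s\n' "$out"
rm -f "$tmp"
```

Written this way, c counts exactly the records that were read, regardless of when a given awk evaluates the target of getline.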