It’s entirely fair to say that the awk syntax for local variable definitions is appallingly awful.
Definitions of functions can appear anywhere between the rules of an awk program. Thus, the general form of an awk program is extended to include sequences of rules and user-defined function definitions.
There is no need to put the definition of a function before all uses of the function. This is because awk reads the entire program before starting to execute any of it.
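The following is a minimal sketch that assumes a POSIX-compatible awk is on the PATH. It runs a program whose BEGIN rule calls a function that is defined only later in the program text. The function greet() and the shell wrapper forward_call_demo are hypothetical names chosen for illustration.

```shell
# Assumes a POSIX-compatible awk on PATH; greet and forward_call_demo
# are hypothetical names used only for this illustration.
forward_call_demo() {
    awk '
    # The call appears before the definition in the program text.
    BEGIN { print greet("world") }

    # awk has already read this definition before BEGIN runs.
    function greet(who) {
        return "hello, " who
    }
    '
}

forward_call_demo   # prints: hello, world
```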
The definition of a function named name looks like this:
    function name([parameter-list])
    {
         body-of-function
    }
Here, name is the name of the function to define. A valid function name is like a valid variable name: a sequence of letters, digits, and underscores that doesn’t start with a digit. Here too, only the 52 upper- and lowercase English letters may be used in a function name.
Within a single awk program, any particular name can be used for only one of these: a variable, an array, or a function.
parameter-list is an optional list of the function’s arguments and local variable names, separated by commas. When the function is called, the argument names are used to hold the argument values given in the call.
A function cannot have two parameters with the same name, nor may it have a parameter with the same name as the function itself.
CAUTION: According to the POSIX standard, function parameters cannot have the same name as one of the special predefined variables (see Predefined Variables), nor may a function parameter have the same name as another function.
Not all versions of awk enforce these restrictions. (d.c.) gawk always enforces the first restriction. With --posix (see Command-Line Options), it also enforces the second restriction.
Local variables act like the empty string if referenced where a string value is required, and like zero if referenced where a numeric value is required. This is the same as the behavior of regular variables that have never been assigned a value. (There is more to understand about local variables; see Functions and Their Effects on Variable Typing.)
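Below is a small sketch of this behavior, again assuming a POSIX-compatible awk on the PATH. The hypothetical function show() is called with no arguments, so tmp is a local variable that has never been assigned. It prints as an empty string and behaves as zero in arithmetic.

```shell
# Assumes a POSIX-compatible awk on PATH; show and uninit_local_demo
# are hypothetical names.
uninit_local_demo() {
    awk '
    function show(    tmp) {
        # tmp was never assigned: empty in string context...
        printf "string: [%s]\n", tmp
        # ...and zero in numeric context.
        printf "number: %d\n", tmp + 0
    }
    BEGIN { show() }
    '
}

uninit_local_demo   # prints "string: []" then "number: 0"
```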
The body-of-function consists of awk statements. It is the most important part of the definition, because it says what the function should actually do. The argument names exist to give the body a way to talk about the arguments; local variables exist to give the body places to keep temporary values.
Argument names are not distinguished syntactically from local variable names. Instead, the number of arguments supplied when the function is called determines how many argument variables there are. Thus, if three argument values are given, the first three names in parameter-list are arguments and the rest are local variables.
It follows that if the number of arguments is not the same in all calls to the function, some of the names in parameter-list may be arguments on some occasions and local variables on others. Another way to think of this is that omitted arguments default to the null string.
Usually when you write a function, you know how many names you intend to use for arguments and how many you intend to use as local variables. It is conventional to place some extra space between the arguments and the local variables, in order to document how your function is supposed to be used.
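As an illustrative sketch, assuming a POSIX-compatible awk on the PATH, the hypothetical function describe() is intended to take two arguments, a and b. Its local variable count is set off by extra spaces, following the convention above. In the second call only one argument is supplied, so b acts as a local variable and starts out as the null string.

```shell
# Assumes a POSIX-compatible awk on PATH; describe and
# args_vs_locals_demo are hypothetical names.
args_vs_locals_demo() {
    awk '
    # a and b are meant as arguments; count is a local variable.
    function describe(a, b,    count) {
        count = 0
        if (a != "") count++
        if (b != "") count++
        return count " non-empty: [" a "] [" b "]"
    }
    BEGIN {
        print describe("x", "y")   # a and b are both arguments
        print describe("x")        # b is a local here, so it is ""
    }
    '
}

args_vs_locals_demo
```

Under these assumptions the snippet prints `2 non-empty: [x] [y]` followed by `1 non-empty: [x] []`.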
During execution of the function body, the arguments and local variable values hide, or shadow, any variables of the same names used in the rest of the program. The shadowed variables are not accessible in the function definition, because there is no way to name them while their names have been taken away for the arguments and local variables. All other variables used in the awk program can be referenced or set normally in the function’s body.
The arguments and local variables last only as long as the function body is executing. Once the body finishes, you can once again access the variables that were shadowed while the function was running.
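Here is a minimal sketch of shadowing, assuming a POSIX-compatible awk on the PATH. Inside the hypothetical function f(), the parameter x hides the global x. The variable y is not a parameter, so f() updates the global y directly. After f() returns, the global x is visible again and unchanged.

```shell
# Assumes a POSIX-compatible awk on PATH; f and shadow_demo are
# hypothetical names.
shadow_demo() {
    awk '
    function f(x) {
        x = x * 2      # changes the parameter x, not the global x
        y = y + 1      # y is not a parameter, so this is the global y
        return x
    }
    BEGIN {
        x = 10
        y = 0
        print f(3)     # prints 6
        print x, y     # prints "10 1": global x untouched, global y updated
    }
    '
}

shadow_demo
```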
The function body can contain expressions that call functions. They can even call this function, either directly or by way of another function. When this happens, we say the function is recursive. The act of a function calling itself is called recursion.
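The sketch below shows both kinds of recursion, assuming a POSIX-compatible awk on the PATH. The hypothetical fact() calls itself directly. The hypothetical is_even() and is_odd() recurse by way of each other. These functions are meant only for non-negative integers. They also use the return statement, which is described next.

```shell
# Assumes a POSIX-compatible awk on PATH; fact, is_even, is_odd and
# recursion_demo are hypothetical names. Inputs must be non-negative integers.
recursion_demo() {
    awk '
    # Direct recursion: fact() calls itself.
    function fact(n) {
        if (n <= 1)
            return 1
        return n * fact(n - 1)
    }

    # Indirect recursion: each function calls the other.
    function is_even(n) { return (n == 0) ? 1 : is_odd(n - 1) }
    function is_odd(n)  { return (n == 0) ? 0 : is_even(n - 1) }

    BEGIN {
        print fact(5)               # prints 120
        print is_even(4), is_odd(4) # prints "1 0"
    }
    '
}

recursion_demo
```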
All the built-in functions return a value to their caller. User-defined functions can do so also, using the return statement, which is described in detail in The return Statement. Many of the subsequent examples in this section use the return statement.
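As a brief sketch, assuming a POSIX-compatible awk on the PATH, the hypothetical function max() uses return to hand a value back to its caller. A rule then uses that value in a print statement for each input record. The input lines here are made up for illustration.

```shell
# Assumes a POSIX-compatible awk on PATH; max and return_demo are
# hypothetical names, and the input lines are invented sample data.
return_demo() {
    printf '3 7\n10 2\n' | awk '
    # The value of the return expression becomes the value of the call.
    function max(a, b) {
        return (a > b) ? a : b
    }
    { print "max:", max($1, $2) }
    '
}

return_demo   # prints "max: 7" then "max: 10"
```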
In many awk implementations, including gawk, the keyword function may be abbreviated func. (c.e.) However, POSIX only specifies the use of the keyword function. This actually has some practical implications.
If gawk is in POSIX-compatibility mode (see Command-Line Options), then the following statement does not define a function:
    func foo() { a = sqrt($1) ; print a }
Instead, it defines a rule that, for each record, concatenates the value of the variable ‘func’ with the return value of the function ‘foo’. If the resulting string is non-null, the action is executed. This is probably not what is desired. (awk accepts this input as syntactically valid, because functions may be used before they are defined in awk programs.)
To ensure that your awk programs are portable, always use the keyword function when defining a function.
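For comparison, here is a sketch of the portable spelling of the foo() definition discussed above. It assumes a POSIX-compatible awk on the PATH. The rule that calls foo() once per record and the one-line input are hypothetical additions for illustration.

```shell
# Assumes a POSIX-compatible awk on PATH; portable_func_demo, the
# calling rule, and the input "16" are hypothetical.
portable_func_demo() {
    printf '16\n' | awk '
    # Spelled with the full keyword, this is a function definition
    # in every awk, with or without POSIX mode.
    function foo() { a = sqrt($1) ; print a }

    # Call foo() once for each input record.
    { foo() }
    '
}

portable_func_demo   # prints 4
```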