gawk
This subsection describes a feature that is specific to gawk.
By default, when a for loop traverses an array, the order is undefined, meaning that the awk implementation determines the order in which the array is traversed. This order is usually based on the internal implementation of arrays and will vary from one version of awk to the next.
Often, though, you may wish to do something simple, such as
“traverse the array by comparing the indices in ascending order,”
or “traverse the array by comparing the values in descending order.”
gawk provides two mechanisms that give you this control:
Set PROCINFO["sorted_in"] to one of a set of predefined values. We describe this now.
Set PROCINFO["sorted_in"] to the name of a user-defined function to use for comparison of array elements. This advanced feature is described later in Controlling Array Traversal and Array Sorting.
The following special values for PROCINFO["sorted_in"] are available:
"@unsorted"
Array elements are processed in arbitrary order, which is the default awk behavior.
"@ind_str_asc"
Order by indices in ascending order compared as strings; this is the most basic sort. (Internally, array indices are always strings, so with ‘a[2*5] = 1’ the index is "10" rather than numeric 10.)
"@ind_num_asc"
Order by indices in ascending order but force them to be treated as numbers in the process. Any index with a non-numeric value will end up positioned as if it were zero.
"@val_type_asc"
Order by element values in ascending order (rather than by indices). Ordering is by the type assigned to the element (see Variable Typing and Comparison Expressions). All numeric values come before all string values, which in turn come before all subarrays. (Subarrays have not been described yet; see Arrays of Arrays.)
If you choose to use this feature in traversing FUNCTAB (see Built-in Variables That Convey Information), then the order is built-in functions first (see Built-in Functions), then user-defined functions (see User-Defined Functions) next, and finally functions loaded from an extension (see Writing Extensions for gawk).
"@val_str_asc"
Order by element values in ascending order (rather than by indices). Scalar values are compared as strings. If the string values are identical, the index string values are compared instead. When comparing non-scalar values, "@val_type_asc" sort ordering is used, so subarrays, if present, come out last.
"@val_num_asc"
Order by element values in ascending order (rather than by indices). Scalar values are compared as numbers. Non-scalar values are compared using "@val_type_asc" sort ordering, so subarrays, if present, come out last. When numeric values are equal, the string values are used to provide an ordering: this guarantees consistent results across different versions of the C qsort() function,[45] which gawk uses internally to perform the sorting. If the string values are also identical, the index string values are compared instead.
"@ind_str_desc"
Like "@ind_str_asc", but the string indices are ordered from high to low.
"@ind_num_desc"
Like "@ind_num_asc", but the numeric indices are ordered from high to low.
"@val_type_desc"
Like "@val_type_asc", but the element values, based on type, are ordered from high to low. Subarrays, if present, come out first.
"@val_str_desc"
Like "@val_str_asc", but the element values, treated as strings, are ordered from high to low. If the string values are identical, the index string values are compared instead. When comparing non-scalar values, "@val_type_desc" sort ordering is used, so subarrays, if present, come out first.
"@val_num_desc"
Like "@val_num_asc", but the element values, treated as numbers, are ordered from high to low. If the numeric values are equal, the string values are compared instead. If they are also identical, the index string values are compared instead. Non-scalar values are compared using "@val_type_desc" sort ordering, so subarrays, if present, come out first.
The array traversal order is determined before the for loop starts to run. Changing PROCINFO["sorted_in"] in the loop body does not affect the loop. For example:
    $ gawk '
    > BEGIN {
    >    a[4] = 4
    >    a[3] = 3
    >    for (i in a)
    >        print i, a[i]
    > }'
    -| 4 4
    -| 3 3
    $ gawk '
    > BEGIN {
    >    PROCINFO["sorted_in"] = "@ind_str_asc"
    >    a[4] = 4
    >    a[3] = 3
    >    for (i in a)
    >        print i, a[i]
    > }'
    -| 3 3
    -| 4 4
When sorting an array by element values, if a value happens to be a subarray then it is considered to be greater than any string or numeric value, regardless of what the subarray itself contains, and all subarrays are treated as being equal to each other. Their order relative to each other is determined by their index strings.
Here are some additional things to bear in mind about sorted array traversal:
The value of PROCINFO["sorted_in"] is global. That is, it affects all array traversal for loops. If you need to change it within your own code, you should see if it’s defined and save and restore the value:
    ...
    if ("sorted_in" in PROCINFO)
        save_sorted = PROCINFO["sorted_in"]

    PROCINFO["sorted_in"] = "@val_str_desc" # or whatever
    ...
    if (save_sorted)
        PROCINFO["sorted_in"] = save_sorted
The default array traversal order is represented by "@unsorted". You can also get the default behavior by assigning the null string to PROCINFO["sorted_in"] or by just deleting the "sorted_in" element from the PROCINFO array with the delete statement. (The delete statement hasn’t been described yet; see The delete Statement.)
In addition, gawk provides built-in functions for sorting arrays; see Sorting Array Values and Indices with gawk.
[45] When two elements compare as equal, the C qsort() function does not guarantee that they will maintain their original relative order after sorting. Using the string value to provide a unique ordering when the numeric values are equal ensures that gawk behaves consistently across different environments.