8.1.6 Using Predefined Array Scanning Orders

By default, when a for loop traverses an array, the order is undefined, meaning that the awk implementation determines the order in which the array is traversed. This order is usually based on the internal implementation of arrays and will vary from one version of awk to the next.

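Often, though, you may wish to do something simple, such as “traverse the array by comparing the indices in ascending order,” or “traverse the array by comparing the values in descending order.” gawk provides two mechanisms that give you this control: you can set PROCINFO["sorted_in"] to one of a set of predefined values, as described here, or you can set it to the name of a user-defined comparison function, an advanced feature described later (see Controlling Array Traversal).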

The following special values for PROCINFO["sorted_in"] are available:

"@unsorted"
Array elements are processed in arbitrary order, which is the default awk behavior.
"@ind_str_asc"
Order by indices compared as strings; this is the most basic sort. (Internally, array indices are always strings, so with ‘a[2*5] = 1’ the index is "10" rather than numeric 10.)
"@ind_num_asc"
Order by indices but force them to be treated as numbers in the process. Any index with a non-numeric value will end up positioned as if it were zero.
"@val_type_asc"
Order by element values rather than indices. Ordering is by the type assigned to the element (see Typing and Comparison). All numeric values come before all string values, which in turn come before all subarrays. (Subarrays have not been described yet; see Arrays of Arrays.)
"@val_str_asc"
Order by element values rather than by indices. Scalar values are compared as strings. Subarrays, if present, come out last.
"@val_num_asc"
Order by element values rather than by indices. Scalar values are compared as numbers. Subarrays, if present, come out last. When numeric values are equal, the string values are used to provide an ordering: this guarantees consistent results across different versions of the C qsort() function,[1] which gawk uses internally to perform the sorting.
"@ind_str_desc"
Indices compared as strings, ordered from high to low; this is the reverse of "@ind_str_asc".
"@ind_num_desc"
Numeric indices ordered from high to low.
"@val_type_desc"
Element values, based on type, in descending order.
"@val_str_desc"
Element values, treated as strings, ordered from high to low. Subarrays, if present, come out first.
"@val_num_desc"
Element values, treated as numbers, ordered from high to low. Subarrays, if present, come out first.

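The array traversal order is determined before the for loop starts to run. Changing PROCINFO["sorted_in"] in the loop body does not affect the loop. The following example contrasts the default traversal (whose order may differ on your system) with a traversal sorted by index: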

     $ gawk 'BEGIN {
     >    a[4] = 4
     >    a[3] = 3
     >    for (i in a)
     >        print i, a[i]
     > }'
     -| 4 4
     -| 3 3
     $ gawk 'BEGIN {
     >    PROCINFO["sorted_in"] = "@ind_str_asc"
     >    a[4] = 4
     >    a[3] = 3
     >    for (i in a)
     >        print i, a[i]
     > }'
     -| 3 3
     -| 4 4

When sorting an array by element values, if a value happens to be a subarray then it is considered to be greater than any string or numeric value, regardless of what the subarray itself contains, and all subarrays are treated as being equal to each other. Their order relative to each other is determined by their index strings.

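Here are some additional things to bear in mind about sorted array traversal. The value of PROCINFO["sorted_in"] is global: it affects every array-traversal for loop in the program, so code that changes it should first check whether it is set and then save and restore the previous value. The default traversal order is represented by "@unsorted"; you can also get the default behavior by assigning the null string to PROCINFO["sorted_in"] or by deleting the "sorted_in" element from PROCINFO with the delete statement.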

In addition, gawk provides built-in functions for sorting arrays; see Array Sorting Functions.


Footnotes

[1] When two elements compare as equal, the C qsort() function does not guarantee that they will maintain their original relative order after sorting. Using the string value to provide a unique ordering when the numeric values are equal ensures that gawk behaves consistently across different environments.