8.6 Arrays of Arrays

gawk goes beyond standard awk's multidimensional array access and provides true arrays of arrays. Elements of a subarray are referred to by their own indices enclosed in square brackets, just like the elements of the main array. For example, the following creates a two-element subarray at index ‘1’ of the main array a:

     a[1][1] = 1
     a[1][2] = 2

This simulates a true two-dimensional array. Each subarray element can contain another subarray as a value, which in turn can hold other arrays as well. In this way, you can create arrays of three or more dimensions. The indices can be any awk expression, including scalars separated by commas (that is, a regular awk simulated multidimensional subscript). So the following is valid in gawk:

     a[1][3][1, "name"] = "barney"

The main array and each of its subarrays can have different lengths. In fact, the elements of an array or its subarray do not all have to have the same type. This means that the main array and any of its subarrays can be non-rectangular, or jagged in structure. One can assign a scalar value to the index ‘4’ of the main array a:

     a[4] = "An element in a jagged array"

The terms dimension, row, and column are meaningless when applied to such an array, but we will use “dimension” henceforth to mean the maximum number of indices needed to refer to an existing element. The type of any element that has already been assigned cannot be changed by assigning a value of a different type. You have to first delete the current element, which effectively makes gawk forget about the element at that index:

     delete a[4]
     a[4][5][6][7] = "An element in a four-dimensional array"

This removes the scalar value from index ‘4’ and then inserts three levels of nested subarrays, the innermost of which contains a scalar. You can also delete an entire subarray or subarray of subarrays:

     delete a[4][5]
     a[4][5] = "An element in subarray a[4]"

But recall that you cannot delete the main array a and then use it as a scalar.

The built-in functions which take array arguments can also be used with subarrays. For example, the following code fragment uses length() (see String Functions) to determine the number of elements in the main array a and its subarrays:

     print length(a), length(a[1]), length(a[1][3])

This results in the following output for our main array a:

     2 3 1

The ‘subscript in array’ expression (see Reference to Elements) works similarly for both regular awk-style arrays and arrays of arrays. For example, the tests ‘1 in a’, ‘3 in a[1]’, and ‘(1, "name") in a[1][3]’ all evaluate to one (true) for our array a.

The ‘for (item in array)’ statement (see Scanning an Array) can be nested to scan all the elements of an array of arrays if it is rectangular in structure. In order to print the contents (scalar values) of a two-dimensional array of arrays (i.e., in which each first-level element is itself an array, not necessarily of the same length) you could use the following code:

     for (i in array)
         for (j in array[i])
             print array[i][j]

The isarray() function (see Type Functions) lets you test if an array element is itself an array:

     for (i in array) {
         if (isarray(array[i])) {
             for (j in array[i]) {
                 print array[i][j]
             }
         }
     }

If the structure of a jagged array of arrays is known in advance, you can often devise workarounds using control statements. For example, the following code prints the elements of our main array a:

     for (i in a) {
         for (j in a[i]) {
             if (j == 3) {
                 for (k in a[i][j])
                     print a[i][j][k]
             } else
                 print a[i][j]
         }
     }

See Walking Arrays, for a user-defined function that “walks” an arbitrarily-dimensioned array of arrays.

Recall that a reference to an uninitialized array element yields a value of "", the null string. This has one important implication when you intend to use a subarray as an argument to a function, as illustrated by the following example:

     $ gawk 'BEGIN { split("a b c d", b[1]); print b[1][1] }'
     error--> gawk: cmd. line:1: fatal: split: second argument is not an array

The way to work around this is to first force b[1] to be an array by creating an arbitrary index:

     $ gawk 'BEGIN { b[1][1] = ""; split("a b c d", b[1]); print b[1][1] }'
     -| a