Arrays of Arrays (The GNU Awk User’s Guide)

8.6 Arrays of Arrays

gawk goes beyond standard awk’s multidimensional array access and provides true arrays of arrays. Elements of a subarray are referred to by their own indices enclosed in square brackets, just like the elements of the main array. For example, the following creates a two-element subarray at index 1 of the main array a:

a[1][1] = 1
a[1][2] = 2

This simulates a true two-dimensional array. Each subarray element can contain another subarray as a value, which in turn can hold other arrays as well. In this way, you can create arrays of three or more dimensions. The indices can be any awk expressions, including scalars separated by commas (i.e., a regular awk simulated multidimensional subscript). So the following is valid in gawk:

a[1][3][1, "name"] = "barney"
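
As a minimal sketch of how such a comma-separated subscript behaves inside an array of arrays, the following program loops over a[1][3] and uses split() with SUBSEP to recover the two components of the combined index. The variable names key, parts, and n are only illustrative:

```awk
BEGIN {
    a[1][3][1, "name"] = "barney"

    for (key in a[1][3]) {
        # The subscript (1, "name") is stored as the single string
        # "1" SUBSEP "name", so split() on SUBSEP yields its two parts.
        n = split(key, parts, SUBSEP)
        print n, parts[1], parts[2], a[1][3][key]
    }
}
```

Run with gawk, this should print ‘2 1 name barney’.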

The main array and each of its subarrays can have different lengths. In fact, the elements of an array or its subarray do not all have to have the same type. This means that the main array and any of its subarrays can be nonrectangular, or jagged in structure. You can assign a scalar value to index 4 of the main array a, even though a[1] is itself an array and not a scalar:

a[4] = "An element in a jagged array"

The terms dimension, row, and column are meaningless when applied to such an array, but we will use “dimension” henceforth to mean the maximum number of indices needed to refer to an existing element. The type of any element that has already been assigned cannot be changed by assigning a value of a different type. You must first delete the current element, which effectively makes gawk forget about the element at that index:

delete a[4]
a[4][5][6][7] = "An element in a four-dimensional array"

This removes the scalar value from index 4 and then inserts a three-level nested subarray containing a scalar. You can also delete an entire subarray or subarray of subarrays:

delete a[4][5]
a[4][5] = "An element in subarray a[4]"
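
The following sketch runs the same delete-then-reassign sequence in a single program, using isarray() (see Getting Type Information) to show the type of each element along the way. The string values are placeholders chosen for brevity:

```awk
BEGIN {
    a[4] = "scalar"
    print isarray(a[4])                        # prints 0: a[4] is a scalar

    delete a[4]                                # forget the scalar first
    a[4][5][6][7] = "deep"
    print isarray(a[4]), isarray(a[4][5][6])   # prints 1 1: both are subarrays

    delete a[4][5]                             # removes a[4][5] and all below it
    a[4][5] = "now a scalar"
    print isarray(a[4][5])                     # prints 0: a[4][5] is a scalar again
}
```

Omitting either delete statement would leave the element with its old type, and the following assignment would be rejected.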

But recall that you cannot delete the main array a and then use it as a scalar.

The built-in functions that take array arguments can also be used with subarrays. For example, the following code fragment uses length() (see String-Manipulation Functions) to determine the number of elements in the main array a and its subarrays:

print length(a), length(a[1]), length(a[1][3])

This results in the following output for our main array a:

2 3 1

The ‘subscript in array’ expression (see Referring to an Array Element) works similarly for both regular awk-style arrays and arrays of arrays. For example, the tests ‘1 in a’, ‘3 in a[1]’, and ‘(1, "name") in a[1][3]’ all evaluate to one (true) for our array a.
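
As a small illustration, the program below rebuilds the array a in the state described so far and stores the result of each membership test in a variable before printing. The names t1 through t3, m1, and m2 are hypothetical and used only for this sketch:

```awk
BEGIN {
    # The array a as it stands after the preceding examples
    a[1][1] = 1
    a[1][2] = 2
    a[1][3][1, "name"] = "barney"
    a[4][5] = "An element in subarray a[4]"

    # Tests that succeed
    t1 = (1 in a)
    t2 = (3 in a[1])
    t3 = ((1, "name") in a[1][3])
    print t1, t2, t3        # prints 1 1 1

    # Tests that fail: no such index at that level
    m1 = (2 in a)
    m2 = (5 in a[1])
    print m1, m2            # prints 0 0
}
```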

The ‘for (item in array)’ statement (see Scanning All Elements of an Array) can be nested to scan all the elements of an array of arrays, provided the nesting depth is uniform. To print the contents (scalar values) of a two-dimensional array of arrays (i.e., one in which each first-level element is itself an array, though not necessarily of the same length), you could use the following code:

for (i in array)
    for (j in array[i])
        print array[i][j]

The isarray() function (see Getting Type Information) lets you test if an array element is itself an array:

for (i in array) {
    if (isarray(array[i])) {
        for (j in array[i]) {
            print array[i][j]
        }
    }
    else
        print array[i]
}

If the structure of a jagged array of arrays is known in advance, you can often devise workarounds using control statements. For example, the following code prints the elements of our main array a:

for (i in a) {
    for (j in a[i]) {
        if (j == 3) {
            for (k in a[i][j])
                print a[i][j][k]
        } else
            print a[i][j]
    }
}

See Traversing Arrays of Arrays for a user-defined function that “walks” an arbitrarily dimensioned array of arrays.
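
As a rough sketch of what such a function might look like, the following combines recursion with isarray(). The name walk, its prefix parameter, and the returned count are chosen here for illustration and are not necessarily those used in the referenced section:

```awk
# walk --- print every scalar element of arr, descending into subarrays.
# prefix is the text printed before each index, e.g. "a" or "a[1]".
# Returns the number of scalar elements found (an addition for this sketch).
function walk(arr, prefix,    i, n)
{
    n = 0
    for (i in arr) {
        if (isarray(arr[i]))
            n += walk(arr[i], prefix "[" i "]")
        else {
            printf("%s[%s] = %s\n", prefix, i, arr[i])
            n++
        }
    }
    return n
}

BEGIN {
    a[1][1] = 1
    a[1][2] = 2
    a[1][3][1, "name"] = "barney"
    a[4][5] = "An element in subarray a[4]"

    total = walk(a, "a")
    print total, "scalar elements"
}
```

The order in which the elements are printed is not fixed, because ‘for (item in array)’ visits indices in an unspecified order. A combined subscript such as (1, "name") is printed with the SUBSEP character embedded in it.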