10.2.11 Producing CSV Data

gawk’s --csv option causes gawk to process CSV data (see Working With Comma Separated Value Files).

But what if you have regular data that you want to output in CSV format? This section provides functions for doing that.

The first function, tocsv(), takes an array of data fields as input. The array should be indexed with consecutive integers starting from one; the function stops at the first missing index. The optional second parameter is the separator to use. If none is supplied, the default is a comma.

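The function quotes any field that contains a double quote, a newline, or the separator: it doubles each embedded double quote and then encloses the whole field in double quotes. It then joins the fields with the separator and returns the resulting CSV record.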

# tocsv.awk --- convert data to CSV format

function tocsv(fields, sep,     i, j, nfields, result)
{
    if (length(fields) == 0)
        return ""

    if (sep == "")
        sep = ","
    delete nfields
    for (i = 1; i in fields; i++) {
        nfields[i] = fields[i]
        if (nfields[i] ~ /["\n]/ || index(nfields[i], sep) != 0) {
            gsub(/"/, "\"\"", nfields[i])       # double up quotes
            nfields[i] = "\"" nfields[i] "\""   # wrap in quotes
        }
    }

    result = nfields[1]
    j = length(nfields)
    for (i = 2; i <= j; i++)
        result = result sep nfields[i]

    return result
}
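
As a brief illustration of the quoting rules, the following sketch calls tocsv() on a small array. It assumes that the function above has been saved in tocsv.awk and that the program is run as ‘gawk -f tocsv.awk -f example.awk’; the file name example.awk and the sample data are hypothetical.

```awk
# example.awk --- hypothetical driver for tocsv()
# Assumes tocsv() is loaded first: gawk -f tocsv.awk -f example.awk

BEGIN {
    row[1] = "Doe, Jane"        # contains a comma
    row[2] = "said \"hello\""   # contains double quotes
    row[3] = "plain"            # needs no quoting

    print tocsv(row)            # default separator: comma
    print tocsv(row, ";")       # semicolon as the separator
}
```

With a comma as the separator, the first two fields are quoted and the result is ‘"Doe, Jane","said ""hello""",plain’. With a semicolon, the comma in the first field is no longer special, so only the second field is quoted: ‘Doe, Jane;"said ""hello""";plain’.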

The next function, tocsv_rec(), is a wrapper around tocsv(). Use it when you want to convert the current input record to CSV format. The function simply copies the fields into an array and passes it to tocsv(), which does the work. It accepts an optional separator character as its first parameter, which it passes on to tocsv().

function tocsv_rec(sep,     i, fields)
{
    delete fields
    for (i = 1; i <= NF; i++)
        fields[i] = $i

    return tocsv(fields, sep)
}
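
One possible use of tocsv_rec() is converting tab-separated input into CSV. The sketch below assumes that both functions are saved in tocsv.awk and that the program is run as ‘gawk -f tocsv.awk -f tsv2csv.awk data.tsv’; the names tsv2csv.awk and data.tsv are hypothetical.

```awk
# tsv2csv.awk --- hypothetical driver for tocsv_rec()
# Assumes tocsv() and tocsv_rec() are loaded first:
#     gawk -f tocsv.awk -f tsv2csv.awk data.tsv

BEGIN { FS = "\t" }     # input fields are separated by tabs

{ print tocsv_rec() }   # print each record as one CSV line
```

Given an input line whose three tab-separated fields are ‘Doe, Jane’, ‘42’, and ‘likes "awk"’, this prints ‘"Doe, Jane",42,"likes ""awk"""’. Passing an argument, as in tocsv_rec(";"), would select a different output separator.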