GNU Recutils: CSV Files

15.1 CSV Files

Many applications are able to read and write files containing so-called “comma separated values”. Such files generally contain tabular data where the columns are separated by commas and the rows by line feed and/or carriage return characters. Although record sets are not tables, tables can be easily emulated using records having the same fields in the same order. For example:

a: value
b: value
c: value

a: value
b: value
c: value

…
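As an illustration of the emulation described above, the following Python sketch turns a table into records that all carry the same fields in the same order. The function name `rows_to_records` and the sample data are hypothetical and not part of recutils. The sketch assumes that values contain no newlines, so no continuation lines are needed.

```python
def rows_to_records(header, rows):
    """Render each table row as a record with one field per column.

    Assumes values contain no newline characters (illustrative only).
    """
    blocks = []
    for row in rows:
        # One "name: value" line per column, in header order.
        fields = [f"{name}: {value}" for name, value in zip(header, row)]
        blocks.append("\n".join(fields))
    # Records are separated by a blank line.
    return "\n\n".join(blocks)


header = ["a", "b", "c"]
rows = [["1", "2", "3"], ["4", "5", "6"]]
print(rows_to_records(header, rows))
```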

In several respects records are more flexible than tables:

- Fields can appear in a different order in different records.
- There can be several fields with the same name in a single record.
- Records can differ in the number of fields.

It is evident that records, such as those in recfiles, are a more general structure than comma separated values. This means that when converting from recfiles to csv files, certain decisions need to be made. The rec2csv utility (see Invoking rec2csv) implements an algorithm to deal with this problem and generate a table that the user expects.

The algorithm works as follows:

The utility first scans the specified record set, building a list with the names that will become the table header.
For each field, a header is added with the form:
```
FIELDNAME[_n]
```
where n is a number in the range 2..inf: the zero-based “index” of the field among the fields with the same name in its containing record, plus one. The first field with a given name gets no suffix. For example, consider the following record set:
```
a: a1
b: b11
b: b12
c: c1

a: a2
b: b2
d: d2
```
The corresponding list of headers is:
```
a b b_2 c a b d
```
Then duplicates are removed:
```
a b b_2 c d
```
The resulting list of headers is then used to build the table in the generated csv file.
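The header-building steps can be sketched in Python as follows. This is an illustrative model of the description above, not the rec2csv implementation: the function name `build_headers` and the representation of a record as a list of (name, value) pairs are assumptions made for the example.

```python
from collections import Counter


def build_headers(records):
    """Return the de-duplicated header list for a record set.

    Each record is a list of (name, value) pairs (an assumed,
    illustrative representation).
    """
    headers = []
    seen = set()
    for record in records:
        occurrences = Counter()  # per-record count of each field name
        for name, _value in record:
            occurrences[name] += 1
            n = occurrences[name]
            # First occurrence keeps the bare name; later ones get _2, _3, ...
            header = name if n == 1 else f"{name}_{n}"
            # Keep only the first appearance of each header.
            if header not in seen:
                seen.add(header)
                headers.append(header)
    return headers


records = [
    [("a", "a1"), ("b", "b11"), ("b", "b12"), ("c", "c1")],
    [("a", "a2"), ("b", "b2"), ("d", "d2")],
]
print(build_headers(records))  # ['a', 'b', 'b_2', 'c', 'd']
```

Removing duplicates while scanning gives the same list as collecting every header first and de-duplicating afterwards, since in both cases the first appearance of each header fixes its position.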

In the above example the result would be:

"a","b","b_2","c","d"
"a1","b11","b12","c1",
"a2","b2",,,"d2"

As shown, missing fields are represented as empty cells in the generated csv.
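The table-building step can be sketched in the same way. The Python code below is a hypothetical model, not the code of rec2csv: the function name `to_csv` and the (name, value) pair representation are assumptions. It quotes every value that is present and leaves missing cells empty, matching the output shown above. Doubling embedded double quotes is a common csv convention assumed here; the text above does not specify it.

```python
def to_csv(records, headers):
    """Render records as csv text using a precomputed header list.

    Each record is a list of (name, value) pairs (illustrative
    representation). Missing fields become empty, unquoted cells.
    """
    def quote(text):
        # Assumed convention: embedded double quotes are doubled.
        return '"' + text.replace('"', '""') + '"'

    lines = [",".join(quote(h) for h in headers)]
    for record in records:
        occurrences = {}
        cells = {}
        for name, value in record:
            occurrences[name] = occurrences.get(name, 0) + 1
            n = occurrences[name]
            # Map each field to its column: name, name_2, name_3, ...
            cells[name if n == 1 else f"{name}_{n}"] = value
        lines.append(
            ",".join(quote(cells[h]) if h in cells else "" for h in headers)
        )
    return "\n".join(lines)


headers = ["a", "b", "b_2", "c", "d"]
records = [
    [("a", "a1"), ("b", "b11"), ("b", "b12"), ("c", "c1")],
    [("a", "a2"), ("b", "b2"), ("d", "d2")],
]
print(to_csv(records, headers))
```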