Header Lines and Column Names (GNU Datamash 1.8)

5.2 Header Lines and Column Names

Output Header Lines
Skipping Input Header Lines
Using Header Lines
Column Names

Output Header Lines

If the input does not have a header line, use --header-out to add a header in the first line of the output, indicating which operation was performed:

$ datamash --sort --header-out groupby 2 min 3 max 3 < scores.txt
GroupBy(field-2)  min(field-3)  max(field-3)
Arts              46            88
Business          79            94
Engineering       39            99
Health-Medicine   72           100
Life-Sciences     14            91
Social-Sciences   27            90

Skipping Input Header Lines

If the input has a header line (first line containing column names), use --header-in to skip the line:

$ cat scores_h.txt
Name      Major   Score
Shawn     Arts    65
Marques   Arts    58
Fernando  Arts    78
Paul      Arts    63
...


$ datamash --sort --header-in groupby 2 mean 3 < scores_h.txt
Arts             68.947
Business         87.363
Engineering      66.538
Health-Medicine  90.615
Life-Sciences    55.333
Social-Sciences  60.266

If the header line is not skipped, datamash will show an error (due to strict input validation):

$ datamash groupby 2 mean 3 < scores_h.txt
datamash: invalid numeric value in line 1 field 3: 'Score'
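
When it is not known in advance whether a file starts with a header, a script can inspect the first line before deciding to pass --header-in. The following is only a sketch under stated assumptions: the sample file and the header_opt variable are made up for illustration, the check is a plain shell heuristic (not something datamash does itself), and it only recognizes unsigned decimal numbers:

```shell
#!/bin/sh
# Hypothetical TAB-separated sample with a header line.
sample=$(mktemp)
printf 'Name\tMajor\tScore\n' >  "$sample"
printf 'Shawn\tArts\t65\n'    >> "$sample"

# Heuristic (an assumption, not datamash behavior): if field 3 of the
# first line is empty or contains anything other than digits and dots,
# treat that line as a header.
first=$(head -n 1 "$sample" | cut -f 3)
case $first in
    ''|*[!0-9.]*) header_opt=--header-in ;;
    *)            header_opt= ;;
esac

# Show the command that would be run, without running it.
printf 'would run: datamash --sort %s groupby 2 mean 3\n' "$header_opt"
rm -f "$sample"
```

For this sample, field 3 of the first line is 'Score', so the sketch selects --header-in.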

Using Header Lines

Column names in the input header line can be printed in the output header line by using --headers (or -H; both are equivalent to --header-in --header-out):

$ datamash --sort --headers groupby 2 mean 3 < scores_h.txt
GroupBy(Major)    mean(Score)
Arts              68.947
Business          87.363
Engineering       66.538
Health-Medicine   90.615
Life-Sciences     55.333
Social-Sciences   60.266

Or in short form (-sH instead of --sort --headers), equivalent to the above command:

$ datamash -sH groupby 2 mean 3 < scores_h.txt

Column Names

When the input file has a header line, column names can be used instead of column numbers. In the example below, Major is used instead of the value 2, and Score is used instead of the value 3:

$ datamash --sort --headers groupby Major mean Score < scores_h.txt
GroupBy(Major)    mean(Score)
Arts              68.947
Business          87.363
Engineering       66.538
Health-Medicine   90.615
Life-Sciences     55.333
Social-Sciences   60.266

datamash will read the first line of the input, and deduce the correct column number based on the given name. If the column name is not found, an error will be printed:

$ datamash --sort --headers groupby 2 mean Foo < scores_h.txt
datamash: column name 'Foo' not found in input file
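
When a name is reported as not found, listing the header fields next to their positions can help spot a misspelled name. A minimal sketch using standard tools rather than datamash itself, assuming a TAB-separated file (TAB is datamash's default field delimiter); the sample file and its contents are made up for illustration:

```shell
#!/bin/sh
# Build a small TAB-separated sample (hypothetical data).
sample=$(mktemp)
printf 'Name\tMajor\tScore\n' >  "$sample"
printf 'Shawn\tArts\t65\n'    >> "$sample"

# Take the header line, put each field on its own line,
# and prefix it with its column number.
columns=$(head -n 1 "$sample" | tr '\t' '\n' | awk '{ print NR ": " $0 }')
printf '%s\n' "$columns"

rm -f "$sample"
```

Each output line pairs a column number with the name found in the header, so either form can then be passed to an operation such as groupby or mean.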

Field names must be escaped with a backslash if they start with a digit or contain special characters (dash/minus, colon, comma). Note the interplay between backslash escaping and shell quoting. The following equivalent commands sum the values of a field named ‘FOO-BAR’:

$ datamash -H sum FOO\\-BAR < input.txt
$ datamash -H sum 'FOO\-BAR' < input.txt
$ datamash -H sum "FOO\\-BAR" < input.txt
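
To see why the three spellings are equivalent, it can help to print the argument exactly as the shell hands it to a program. The sketch below uses printf in place of datamash, so it runs without an input file; show_arg is a hypothetical helper introduced only for this illustration:

```shell
#!/bin/sh
# show_arg is a hypothetical helper: it prints its first argument verbatim,
# standing in for what datamash would receive as the field name.
show_arg() {
    printf '%s\n' "$1"
}

# Unquoted: the shell turns \\ into a single backslash.
unquoted=$(show_arg FOO\\-BAR)
# Single quotes: nothing is interpreted, so one backslash is enough.
single=$(show_arg 'FOO\-BAR')
# Double quotes: \\ again collapses to a single backslash.
double=$(show_arg "FOO\\-BAR")

printf '%s\n' "$unquoted" "$single" "$double"
```

In a POSIX shell all three lines of output read FOO\-BAR: the program always receives the name with a single backslash before the dash, which is the escaped form described above.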