Field Delimiters (GNU Datamash 1.8)

5.3 Field Delimiters

By default, datamash uses a tab (ASCII character 0x09) as the field delimiter. Use -W, --whitespace to treat one or more consecutive whitespace characters as a single field delimiter. Use -t, --field-separator to set a custom field delimiter.

The following examples illustrate the various options.

By default, fields are separated by a single tab. Multiple consecutive tabs denote multiple fields, some of them empty (this is consistent with GNU coreutils’ cut):

$ printf '1\t\t2\n' | datamash sum 3
2
$ printf '1\t\t2\n' | cut -f3
2

Every tab separates two fields. A line starting with a tab thus starts with an empty field, and a line ending with a tab ends with an empty field.
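
The empty leading and trailing fields can be checked with a minimal sketch like the following. The helper name count_tab_fields is hypothetical, and awk is used only as a stand-in field counter, on the assumption that awk -F'\t' splits on every single tab the way cut does. The final datamash call is guarded because datamash may not be installed:

```shell
# Hypothetical helper: print the number of tab-separated fields on each input line.
# awk -F'\t' splits on every single tab, so empty fields are counted too.
count_tab_fields() {
    awk -F'\t' '{ print NF }'
}

# The line is <TAB>a<TAB>b<TAB>: an empty field, "a", "b", and another empty field.
printf '\ta\tb\t\n' | count_tab_fields    # prints 4

# cut uses the same rule, so field 2 is "a".
printf '\ta\tb\t\n' | cut -f2             # prints a

# If datamash is available, its "check" operation reports the field count it sees.
if command -v datamash >/dev/null 2>&1; then
    printf '\ta\tb\t\n' | datamash check
fi
```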

Using -W, one or more consecutive whitespace characters are treated as a single field delimiter:

$ printf '1  \t  2\n' | datamash -W sum 2
2
$ printf '1  \t  2\n' | datamash -W sum 3
datamash: invalid input: field 3 requested, line 1 has only 2 fields

With -W, leading whitespace is ignored, but trailing whitespace is significant. A line starting with one or more consecutive whitespace characters followed by a non-whitespace character starts with a non-empty field. A line ending with one or more consecutive whitespace characters ends with an empty field.
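
The -W rule stated above can be modeled with a short awk sketch. This is only an illustration of the rule as described here, not datamash's implementation; the helper name count_ws_fields is hypothetical, and "whitespace" is assumed to mean spaces and tabs. The guarded datamash call at the end lets the model be compared with the real tool where it is installed:

```shell
# Hypothetical model of the -W rule described above (not datamash's own code):
# drop leading whitespace, then split on runs of spaces and tabs.
count_ws_fields() {
    awk '{ sub(/^[ \t]+/, ""); print split($0, fields, /[ \t]+/) }'
}

printf '  1 2\n'   | count_ws_fields    # prints 2: leading whitespace adds no field
printf '  1 2  \n' | count_ws_fields    # prints 3: trailing whitespace adds an empty field

# Compare against datamash itself when it is installed.
if command -v datamash >/dev/null 2>&1; then
    printf '  1 2  \n' | datamash -W check
fi
```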

Using -t, a custom field delimiter character can be specified. Consecutive delimiters are not merged: each one separates two fields, so empty fields are preserved:

$ printf '1,10,,100\n' | datamash -t, sum 4
100
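
To see where the empty field falls in that line, a sketch along these lines may help. It relies on cut -d, and awk -F, applying the same one-delimiter-per-field rule, and it uses awk only as a field counter; the variable name line is illustrative, and the datamash call is guarded in case the tool is not installed:

```shell
# The same line as above: four comma-separated fields, the third one empty.
line='1,10,,100'

# cut -d, also keeps empty fields, so field 4 is 100 and field 3 is empty.
printf '%s\n' "$line" | cut -d, -f4                # prints 100
printf '%s\n' "$line" | cut -d, -f3                # prints an empty line

# Count the fields with awk (used here only as a field counter).
printf '%s\n' "$line" | awk -F, '{ print NF }'     # prints 4

# With datamash installed, summing field 2 should give 10.
if command -v datamash >/dev/null 2>&1; then
    printf '%s\n' "$line" | datamash -t, sum 2
fi
```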