
2 Invoking datamash

The format for running the datamash program is:

datamash [option]… op1 column1 [op2 column2 …]

Where op1 is the operation to perform on the values in column1. datamash reads input from stdin and performs one or more operations on the input data. If --group is used, each operation is performed on every group. If --group is not used, each operation is performed on all the values in the input file.

The LC_NUMERIC locale specifies the decimal-point character and the thousands separator.

datamash supports the following operations:

Primary operations:

groupby, crosstab, transpose, reverse, check

Line-Filtering operations:

rmdup

Per-Line operations:

base64, debase64, md5, sha1, sha224, sha256, sha384, sha512, bin, strbin, round, floor, ceil, trunc, frac, dirname, basename, extname, barename, getnum, cut, echo

Group-by Numeric operations:

sum, min, max, absmin, absmax, range

Group-by Textual/Numeric operations:

count, first, last, rand, unique, uniq, collapse, countunique

Group-by Statistical operations:

mean, geomean, harmmean, mode, median, q1, q3, iqr, perc, antimode, pstdev, sstdev, pvar, svar, ms, rms, mad, madraw, sskew, pskew, skurt, pkurt, jarque, dpo, scov, pcov, spearson, ppearson

Grouping options:


-C
--skip-comments

Skip comment lines (starting with ‘#’ or ‘;’ and optional whitespace).


-f
--full

Print entire input line before op results (default: print only the grouped keys). Using this option with non-linewise operations was historically permitted, but it never produced sensible output. Such usage is deprecated, and in a future release it will result in an error.

-g X[,Y,Z]
--group=X[,Y,Z]

Group input via fields X[,Y,Z]. By default, fields are separated by TABs. Use --field-separator to change the delimiter character. The input file must be sorted by the same fields X[,Y,Z]. Use --sort to automatically sort the input. If --group is not specified, each operation is performed on the entire input file.


--header-in

Indicates that the first input line contains column headers and should not be used for any calculations.


--header-out

Print column headers as the first line. If the column header names are known (i.e. the input file had a header line, and the command was invoked with --header-in, -H or --headers), prints the operation and the name of the field (e.g. ‘mean(X)’). Otherwise, prints the operation and the field number (e.g. ‘mean(field-3)’).


-H
--headers

Same as ‘--header-in --header-out’. A short option indicating the input file has a header line, and the output should contain a header line as well.


-i
--ignore-case

Ignore upper/lower case when comparing text for grouping, sorting, and comparing unique values in the ‘countunique’ and ‘unique’ (or ‘uniq’) operations.


-s
--sort

Sort the input before grouping. datamash requires sorted input. If the input is not sorted, using --sort will automatically sort the input before processing it further. Sorting is performed on the fields given in the --group parameter and respects the --ignore-case option (if used). The following commands are equivalent:

$ cat FILE | sort -k1,1 | datamash --group 1 sum 1
$ cat FILE | datamash --sort --group 1 sum 1

--sort-cmd=/path/to/sort

Use the given program to sort instead of the system sort.

File Operation options:


--no-strict

Allow lines with a varying number of fields. By default, transpose and reverse will fail with an error message unless all input lines have the same number of fields.


--filler=X

When the --no-strict option is used, missing fields are filled with this value.

General options:


--format=FORMAT

Print numeric values with printf-style floating-point FORMAT.

-t X
--field-separator=X

Use character X instead of TAB as input and output field delimiter. If --output-delimiter is also used, it will override the output field delimiter.


--narm

Skip NA or NaN values.


--output-delimiter=X

Use character X as the output field delimiter. This option overrides --field-separator/-t and --whitespace/-W.

-c X
--collapse-delimiter=X

Use character X instead of comma to delimit items in a ‘collapse’ or ‘unique’ (aka ‘uniq’) list.

-R N
--round=N

Round numeric output to N decimal places.


-W
--whitespace

Use whitespace (one or more spaces and/or tabs) for field delimiters. Leading whitespace is ignored; trailing whitespace results in an empty field. A TAB character is used as the output field separator. If --output-delimiter is also used, it will override the output field delimiter.


-z
--zero-terminated

End lines with a 0 byte, not a newline.


--help

Print an informative help message on standard output and exit successfully.


--version

Print the version number and licensing information of Datamash on standard output and then exit successfully.
