
2 Invoking datamash

The format for running the datamash program is:

datamash [option]… op1 column1  [op2 column2 …]

Here op1 is the operation to perform on the values in column1. datamash reads input from standard input and performs one or more operations on it. If --group is used, each operation is performed separately on every group. If --group is not used, each operation is performed on all the values in the input file.
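
As a minimal sketch of this format, the following runs two operations on a single column without --group, so both are computed over the whole input. The sample input (the numbers 1 to 10 from seq) and the result variable are illustrative assumptions, not part of the manual.

```shell
# Sum and mean of field 1 over the entire input (no --group given).
# "result" is a hypothetical variable used only to capture the output.
result=$(seq 10 | datamash sum 1 mean 1)
printf '%s\n' "$result"
```

The two results are printed on one line, separated by a TAB, in the order the operations were given.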

datamash supports the following operations:

Primary operations:

groupby, crosstab, transpose, reverse, check

Line-Filtering operations:

rmdup

Per-Line operations:

base64, debase64, md5, sha1, sha256, sha512, bin, strbin, round, floor, ceil, trunc, frac

Group-by Numeric operations:

sum, min, max, absmin, absmax, range

Group-by Textual/Numeric operations:

count, first, last, rand, unique, collapse, countunique

Group-by Statistical operations:

mean, mode, median, q1, q3, iqr, perc, antimode, pstdev, sstdev, pvar, svar, mad, madraw, sskew, pskew, skurt, pkurt, jarque, dpo, scov, pcov, spearson, ppearson

Grouping options:


-f, --full

Print the entire input line before the operation results (default: print only the grouped keys).

-g X[,Y,Z], --group=X[,Y,Z]

Group input via fields X[,Y,Z]. By default, fields are separated by TABs. Use --field-separator to change the delimiter character. The input must be sorted by the same fields X[,Y,Z]. Use --sort to sort the input automatically. If --group is not specified, each operation is performed on the entire input file.
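
A small sketch of grouping, assuming a hypothetical two-column TAB-separated input (a key and a number) that is already sorted by the key. The data and the result variable are made up for illustration.

```shell
# Group by field 1, then sum and count the values of field 2 in each group.
# The input is already sorted by field 1, so --sort is not needed here.
result=$(printf 'A\t10\nA\t20\nB\t5\n' | datamash -g 1 sum 2 count 2)
printf '%s\n' "$result"
```

Each output line holds the group key followed by one result per operation.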


--header-in

Indicates that the first input line contains column headers and should not be used in any calculations.


--header-out

Print column headers as the first line. If the column header names are known (i.e. the input file had a header line, and the command was invoked with --header-in, -H or --headers), prints the operation and the name of the field (e.g. ‘mean(X)’). Otherwise, prints the operation and the field number (e.g. ‘mean(field-3)’).


-H, --headers

Same as ‘--header-in --header-out’. A short option indicating that the input file has a header line and that the output should contain a header line as well.
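
The header options can be sketched as follows, assuming a hypothetical input whose first line names the columns name and score. The data and the result variable are illustrative only.

```shell
# The first input line is treated as column headers and excluded from the
# calculation; a header line is also written as the first output line.
result=$(printf 'name\tscore\nA\t10\nA\t20\nB\t5\n' | datamash --headers -g 1 sum 2)
printf '%s\n' "$result"
```

Because the header names are known here, the output header should show the operation together with the field name, such as sum(score).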


-i, --ignore-case

Ignore upper/lower case when comparing text for grouping and sorting, and when comparing unique values in the ‘countunique’ and ‘unique’ operations.
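
A brief sketch of case-insensitive grouping, assuming hypothetical keys that differ only in case. The data and the result variable are made up for illustration.

```shell
# With --ignore-case, the keys "a" and "A" fall into the same group,
# so their values in field 2 are summed together.
result=$(printf 'a\t1\nA\t2\n' | datamash --ignore-case -g 1 sum 2)
printf '%s\n' "$result"
```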


-s, --sort

Sort the input before grouping. Grouping requires sorted input. If the input is not sorted, using --sort will sort it automatically before it is processed further. Sorting is based on the fields given to --group and respects the --ignore-case option (if used). The following commands are equivalent:

$ cat FILE | sort -k1,1 | datamash --group 1 sum 1
$ cat FILE | datamash --sort --group 1 sum 1

File Operation options:


--no-strict

Allow lines with varying numbers of fields. By default, transpose and reverse will fail with an error message unless all input lines have the same number of fields.


--filler=X

When the --no-strict option is used, missing fields are filled with this value.
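
A sketch of these two options together, assuming a hypothetical input whose second line has one field fewer than the first. The filler value XX, the data, and the result variable are illustrative choices.

```shell
# The second input line is one field short. --no-strict lets transpose
# accept it, and --filler supplies the value used for the missing field.
result=$(printf 'a\tb\tc\nd\te\n' | datamash --no-strict --filler XX transpose)
printf '%s\n' "$result"
```

The transposed output has three lines, and the position of the missing field should contain XX.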

General options:

-t X, --field-separator=X

Use character X instead of TAB as the field delimiter.
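
For instance, comma-separated input can be processed as sketched below. The sample data and the result variable are assumptions made for illustration.

```shell
# Use a comma instead of TAB as the field delimiter and sum each column.
result=$(printf '1,2\n3,4\n' | datamash -t , sum 1 sum 2)
printf '%s\n' "$result"
```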


--narm

Skip NA or NaN values.
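
A minimal sketch, assuming a hypothetical single-column input in which one value is the literal string NA. The data and the result variable are illustrative.

```shell
# The NA entry is skipped, so only the numeric values 1 and 3 are summed.
result=$(printf '1\nNA\n3\n' | datamash --narm sum 1)
printf '%s\n' "$result"
```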


-W, --whitespace

Use whitespace (one or more spaces and/or tabs) as field delimiters. The TAB character is used as the output field separator.
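
A sketch with irregular spacing, assuming hypothetical input whose fields are separated by a mix of spaces and tabs. The data and the result variable are made up for illustration.

```shell
# Fields are separated by varying runs of spaces and tabs on input;
# the grouped output is written with TAB as the separator.
result=$(printf 'A  10\nA \t20\nB 5\n' | datamash -W -g 1 sum 2)
printf '%s\n' "$result"
```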


-z, --zero-terminated

End lines with a 0 byte, not a newline.


--help

Print an informative help message on standard output and exit successfully.


--version

Print the version number and licensing information of datamash on standard output and then exit successfully.
