

2 Invoking datamash

The format for running the datamash program is:

datamash [option]… op1 column1  [op2 column2 …]

Where op1 is the operation to perform on the values in column1. datamash reads input from stdin and performs one or more operations on the input data. If --group is used, each operation is performed on every group. If --group is not used, each operation is performed on all the values in the input file.
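As a minimal sketch of the invocation format, the following runs two operations on the same column without --group, so each operation covers the whole input. The input (the numbers 1 to 10 produced by seq) and the variable name result are illustrative choices only.

```shell
# Sum and average a single column of numbers (1..10); no grouping is used,
# so each operation is applied to all input values.
result=$(seq 10 | datamash sum 1 mean 1)
printf '%s\n' "$result"
# prints: 55<TAB>5.5
```

The results of the operations are printed on one line, separated by TABs, in the order the operations were given.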

datamash supports the following operations:

Primary operations:

groupby, crosstab, transpose, reverse, check

Line-Filtering operations:

rmdup

Per-Line operations:

base64, debase64, md5, sha1, sha256, sha512, bin, strbin, round, floor, ceil, trunc, frac

Group-by Numeric operations:

sum, min, max, absmin, absmax, range

Group-by Textual/Numeric operations:

count, first, last, rand, unique, collapse, countunique

Group-by Statistical operations:

mean, mode, median, q1, q3, iqr, perc, antimode, pstdev, sstdev, pvar, svar, mad, madraw, sskew, pskew, skurt, pkurt, jarque, dpo, scov, pcov, spearson, ppearson

Grouping options:

--full
-f

Print entire input line before op results (default: print only the grouped keys).

--group=X[,Y,Z]
-g X[,Y,Z]

Group input via fields X[,Y,Z]. By default, fields are separated by TABs. Use --field-separator to change the delimiter character. The input must already be sorted by the same fields X[,Y,Z]; use --sort to sort it automatically. If --group is not specified, each operation is performed on the entire input file.
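A small sketch of grouping, assuming a made-up two-column, TAB-separated input that is already sorted by the first field:

```shell
# Group by field 1 and sum field 2 within each group.
# The sample data is hypothetical and already sorted by field 1.
result=$(printf 'A\t1\nA\t2\nB\t5\n' | datamash -g 1 sum 2)
printf '%s\n' "$result"
# prints:
# A<TAB>3
# B<TAB>5
```

Each output line holds the grouped key followed by the result of the operation for that group.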

--header-in

Indicates the first input line is column headers, and should not be used for any calculations.

--header-out

Print column headers as the first line. If the column header names are known (i.e. the input file had a header line, and the command was invoked with --header-in, -H or --headers), prints the operation and the name of the field (e.g. ‘mean(X)’). Otherwise, prints the operation and the field number (e.g. ‘mean(field-3)’).

--headers
-H

Same as ‘--header-in --header-out’. A short option indicating the input file has a header line, and the output should contain a header line as well.
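To illustrate header handling, this sketch feeds a hypothetical input whose first line names the columns (name and val are invented for the example) and asks for headers on both input and output:

```shell
# -H treats the first input line as column names and also prints an
# output header built from the operation and the field name.
result=$(printf 'name\tval\nA\t1\nA\t2\nB\t5\n' | datamash -H -g 1 sum 2)
printf '%s\n' "$result"
# prints:
# GroupBy(name)<TAB>sum(val)
# A<TAB>3
# B<TAB>5
```

The header line itself is not included in the sum, which is the effect described for --header-in.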

--ignore-case
-i

Ignore upper/lower case when comparing text for grouping, sorting, and comparing unique values in the ‘countunique’ and ‘unique’ operations.
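A brief sketch of the effect on ‘countunique’, using made-up input in which two values differ only by case. With --ignore-case, ‘a’ and ‘A’ should be counted as the same value:

```shell
# Count distinct values in field 1, treating upper and lower case as equal.
# The three input values are hypothetical.
result=$(printf 'a\nA\nb\n' | datamash -i countunique 1)
printf '%s\n' "$result"
```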

--sort
-s

Sort the input before grouping. datamash requires sorted input. If the input is not sorted, using --sort will automatically sort the input before processing it further. Sorting is based on the fields given in the --group parameter and respects the --ignore-case option (if used). The following commands are equivalent:

$ cat FILE | sort -k1,1 | datamash --group 1 sum 1
$ cat FILE | datamash --sort --group 1 sum 1
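As a concrete sketch with made-up data, the input below is deliberately unsorted: the two ‘B’ lines are separated by an ‘A’ line. With --sort, the lines are sorted by the grouping field first, so both ‘B’ values end up in the same group.

```shell
# Unsorted hypothetical input; -s sorts by the group field before grouping.
result=$(printf 'B\t5\nA\t1\nB\t2\n' | datamash -s -g 1 sum 2)
printf '%s\n' "$result"
# prints:
# A<TAB>1
# B<TAB>7
```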

File Operation options:

--no-strict

Allow lines with a varying number of fields. By default, transpose and reverse will fail with an error message unless all input lines have the same number of fields.

--filler=x

When using the --no-strict option, missing fields will be filled with this value.
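A sketch of the two options together, assuming a hypothetical input whose second line is one field shorter than the first. The filler string XX is an arbitrary choice for the example:

```shell
# Transpose ragged input: line 1 has three fields, line 2 has two.
# --no-strict permits this, and --filler supplies the missing value.
result=$(printf 'a\tb\tc\nd\te\n' | datamash --no-strict --filler XX transpose)
printf '%s\n' "$result"
# prints:
# a<TAB>d
# b<TAB>e
# c<TAB>XX
```

Without --no-strict, the same input would be rejected because the lines do not have the same number of fields.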

General options:

--field-separator=x
-t x

Use character x instead of TAB as the field delimiter.
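For instance, comma-separated input can be processed as in the sketch below. The two-line sample is made up for illustration.

```shell
# Use a comma as the field delimiter and sum each of the two columns.
result=$(printf '1,2\n3,4\n' | datamash -t, sum 1 sum 2)
printf '%s\n' "$result"
# prints: 4,6
```

The chosen delimiter is also used to separate fields in the output.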

--narm

Skip NA or NaN values.
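A minimal sketch, assuming a made-up column in which one value is missing and recorded as NA. With --narm, that line should be skipped and only the numeric values summed:

```shell
# Sum a column that contains an NA entry; --narm skips it.
result=$(printf '1\nNA\n3\n' | datamash --narm sum 1)
printf '%s\n' "$result"
```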

--whitespace
-W

Use whitespace (one or more spaces and/or tabs) as the field delimiter. The TAB character is used as the output field separator.
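The sketch below mixes delimiters in a hypothetical input: the first line separates its fields with two spaces, the second with a TAB. With -W, both are parsed as two fields.

```shell
# Fields separated by runs of spaces or tabs; output is TAB-separated.
result=$(printf 'A  1\nA\t2\n' | datamash -W -g 1 sum 2)
printf '%s\n' "$result"
# prints: A<TAB>3
```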

--zero-terminated
-z

End lines with a 0 byte, not newline.

--help

Print an informative help message on standard output and exit successfully.

--version

Print the version number and licensing information of Datamash on standard output and then exit successfully.

