Overview (GNU Datamash 1.9)


1 Overview

The datamash program (https://www.gnu.org/software/datamash) performs calculations (e.g. sum, count, min, max, skewness, standard deviation) on input files.

Example: sum up the values in the first column of the input:

$ seq 10 | datamash sum 1
55
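To show exactly what sum 1 computes, here is a plain-Python sketch (not datamash code; the function name sum_field is hypothetical). It adds up field 1 of every input line. Fields are numbered from 1, as on the datamash command line, and lines are split on TAB, which is datamash's default field delimiter. The sketch assumes every line contains the requested field and that the field holds a valid number.

```python
def sum_field(lines, field, sep="\t"):
    """Sum the numeric values found in 1-based column `field` of each line.

    Hypothetical sketch of `datamash sum FIELD`; every line is assumed to
    contain that field and a valid number in it.
    """
    total = 0.0
    for line in lines:
        columns = line.rstrip("\n").split(sep)
        total += float(columns[field - 1])  # convert 1-based field to 0-based index
    return total


# Same input as `seq 10`: the numbers 1 to 10, one per line.
seq10 = [f"{i}\n" for i in range(1, 11)]
print(sum_field(seq10, 1))
```

Python prints 55.0 where datamash prints 55. Only the number formatting differs.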

datamash can group input data and perform operations on each group. It can also sort the input and read header lines.

Example: Given a file with three fields (name, subject, score), find the average score in each subject:

$ cat scores.txt
Name        Subject          Score
Bryan       Arts             68
Isaiah      Arts             80
Gabriel     Health-Medicine  100
Tysza       Business         92
Zackery     Engineering      54
...

$ datamash --sort --headers --group 2 mean 3 sstdev 3 < scores.txt
GroupBy(Subject)   mean(Score)   sstdev(Score)
Arts               68.9474       10.4215
Business           87.3636       5.18214
Engineering        66.5385       19.8814
Health-Medicine    90.6154       9.22441
Life-Sciences      55.3333       20.606
Social-Sciences    60.2667       17.2273
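The grouping above can be sketched in plain Python. datamash groups consecutive lines that share the same key, which is why --sort is given for unsorted input. The sketch mimics this by sorting on the key and then using itertools.groupby. The function group_mean_sstdev is hypothetical and assumes TAB-separated input. It also reports NaN for single-value groups, which is this sketch's own choice, not documented datamash behaviour. It uses only the five rows shown above (not the full scores.txt), so its numbers will not match the table.

```python
import math
import statistics
from itertools import groupby


def group_mean_sstdev(lines, group_field, value_field, sep="\t"):
    """Hypothetical sketch of `datamash --sort --headers --group G mean V sstdev V`.

    Fields are 1-based, as on the datamash command line.
    """
    rows = [line.rstrip("\n").split(sep) for line in lines]
    header, body = rows[0], rows[1:]  # --headers: first line holds field names
    g, v = group_field - 1, value_field - 1
    out_header = (f"GroupBy({header[g]})", f"mean({header[v]})", f"sstdev({header[v]})")

    results = []
    # --sort: order rows by the group key so equal keys become adjacent;
    # each run of equal keys then forms one group.
    body.sort(key=lambda row: row[g])
    for key, group in groupby(body, key=lambda row: row[g]):
        values = [float(row[v]) for row in group]
        mean = statistics.fmean(values)
        # Sample standard deviation divides by n - 1, so it needs at least two
        # values; this sketch reports NaN for single-value groups (an assumption).
        sstdev = statistics.stdev(values) if len(values) >= 2 else math.nan
        results.append((key, mean, sstdev))
    return out_header, results


scores = [
    "Name\tSubject\tScore",
    "Bryan\tArts\t68",
    "Isaiah\tArts\t80",
    "Gabriel\tHealth-Medicine\t100",
    "Tysza\tBusiness\t92",
    "Zackery\tEngineering\t54",
]
header, groups = group_mean_sstdev(scores, group_field=2, value_field=3)
print(*header, sep="\t")
for key, mean, sstdev in groups:
    print(key, f"{mean:g}", f"{sstdev:g}", sep="\t")
```

Sorting before grouping matters: without it, two "Arts" rows separated by another subject would be reported as two separate groups.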

datamash is designed for interactive exploration of textual data and for automating tasks in shell scripts.

datamash has a rich set of statistical functions for quickly assessing textual input files. For example, to calculate basic statistics (mean, first quartile, median, third quartile, IQR, sample standard deviation, and the p-value of the Jarque-Bera test for normality):

$ datamash -H mean 1 q1 1 median 1 q3 1 iqr 1 sstdev 1 jarque 1 < FILE
mean(x)   q1(x)  median(x)  q3(x)   iqr(x)  sstdev(x)  jarque(x)
45.32     23     37         61.5    38.5    30.4487    8.0113e-09
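As a rough cross-check of what these operations measure, the sketch below computes the same statistics in plain Python. Several details are assumptions, not documented datamash behaviour. Quartiles use linear interpolation between order statistics (statistics.quantiles with method="inclusive", equivalent to R's default type 7). The Jarque-Bera statistic uses population moment estimates for skewness S and excess kurtosis K, with JB = n/6 (S² + K²/4). Under normality JB is approximately chi-squared with 2 degrees of freedom, whose upper-tail probability is exactly exp(-JB/2). The names describe and jarque_bera_pvalue are hypothetical, and the input 1..10 is toy data, not FILE.

```python
import math
import statistics


def jarque_bera_pvalue(xs):
    """p-value of the Jarque-Bera normality test (sketch).

    Assumes population (divide-by-n) moments for skewness and excess kurtosis.
    """
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    # Upper tail of a chi-squared distribution with 2 degrees of freedom.
    return math.exp(-jb / 2)


def describe(values):
    """Sketch of `datamash mean 1 q1 1 median 1 q3 1 iqr 1 sstdev 1 jarque 1`."""
    xs = sorted(float(v) for v in values)
    # Linear interpolation between order statistics (R type 7); assumed here.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(xs),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "sstdev": statistics.stdev(xs),  # sample standard deviation (divides by n - 1)
        "jarque": jarque_bera_pvalue(xs),
    }


stats = describe(range(1, 11))  # toy data, not FILE
print("\t".join(f"{name}(x)" for name in stats))
print("\t".join(f"{value:g}" for value in stats.values()))
```

A small Jarque-Bera p-value, such as the 8.0113e-09 above, suggests the data are unlikely to come from a normal distribution. The evenly spread toy data gives a large p-value, so the test does not reject normality for that sample.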