GNU datamash

GNU datamash is a command-line program that performs basic numeric, textual, and statistical operations on textual input files.

Examples:

calculate the sum and mean of values 1 to 10:

  $ seq 10 | datamash sum 1 mean 1
  55 5.5

group text file by one column and calculate
mean and sample standard deviation on another,
with automatic sorting and header line processing:

  $ datamash --sort --headers groupby 2 mean 3 sstdev 3 < scores_h.txt
  GroupBy(Major)  mean(Score) sstdev(Score)
  Arts            68.94       10.42
  ...
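
To make the grouping arithmetic concrete, here is a minimal Python sketch of what a "group by one column, then take the mean and sample standard deviation of another" computation looks like. It is an illustration only, not how datamash is implemented. The SAMPLE rows, the "Name" column, and the function name group_mean_sstdev are all hypothetical; they are not the contents of scores_h.txt.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical tab-separated input with a header line.
# These rows are made up for illustration; they are NOT scores_h.txt.
SAMPLE = (
    "Name\tMajor\tScore\n"
    "Ana\tArts\t60\n"
    "Ben\tScience\t90\n"
    "Cy\tArts\t70\n"
    "Di\tScience\t80\n"
    "Eli\tArts\t80\n"
)


def group_mean_sstdev(text, key_col, val_col):
    """Group rows by key_col; return (key, mean, sample stdev) per group.

    Columns are 1-based, matching the numbering used on the datamash
    command line. Groups are returned sorted by key.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the header line
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col - 1]].append(float(row[val_col - 1]))
    # statistics.stdev is the sample (n - 1) standard deviation and
    # needs at least two values per group.
    return [
        (key, statistics.mean(vals), statistics.stdev(vals))
        for key, vals in sorted(groups.items())
    ]


for key, mean, sd in group_mean_sstdev(SAMPLE, 2, 3):
    print(f"{key}\t{mean:.2f}\t{sd:.2f}")
```

In this sketch a group with a single row raises statistics.StatisticsError, because a sample standard deviation is undefined for one value.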

file validation for pipeline automation and troubleshooting:

  $ datamash check < snp147Common.txt && echo ok || echo fail
  15189820 lines, 26 fields
  ok

  $ datamash check < tmp2.txt && echo ok || echo fail
  line 3816 (7 fields):
    chrY  9544432 9552871 NR_001534 0 - 0.5
  line 3817 (6 fields):
    chrY  9544432 9552871 NR_003592 0 -
  datamash: check failed: line 3817 has 6 fields (previous line had 7)
  fail
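
The idea behind this kind of validation can be sketched in a few lines of Python: count the tab-separated fields on each line and stop at the first line whose count differs from the one before it. This is a rough illustration under the assumption of tab-separated input, not datamash's actual implementation. The function name check_fields is hypothetical, and the two sample rows are the ones shown in the failing output above.

```python
def check_fields(lines, sep="\t"):
    """Return (line_count, field_count) if every line has the same
    number of fields; raise ValueError at the first mismatch."""
    expected = None
    count = 0
    for count, line in enumerate(lines, start=1):
        n = len(line.rstrip("\n").split(sep))
        if expected is None:
            expected = n  # the first line sets the expected field count
        elif n != expected:
            raise ValueError(
                f"line {count} has {n} fields "
                f"(previous line had {expected})"
            )
    return count, expected or 0


# The two rows from the failing example above, as tab-separated lines.
rows = [
    "chrY\t9544432\t9552871\tNR_001534\t0\t-\t0.5\n",
    "chrY\t9544432\t9552871\tNR_003592\t0\t-\n",
]

try:
    n_lines, n_fields = check_fields(rows)
    print(f"{n_lines} lines, {n_fields} fields")
except ValueError as err:
    print(f"check failed: {err}")
```

Raising an exception on the first mismatch mirrors the non-zero exit status that makes the `&& echo ok || echo fail` pattern work in a shell pipeline.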

Downloading datamash

Datamash runs on a wide variety of UNIX platforms, Windows, and macOS.
See the download section for more details.

Documentation and Help

Source Code

Development

Development of Datamash, and GNU in general, is a volunteer effort, and you can contribute. For information, please read How to help GNU. If you'd like to get involved, it's a good idea to join the discussion mailing list (see above).

Maintainer

Datamash is currently maintained by Assaf Gordon and Tim Rice.
For any questions, please send email to bug-datamash@gnu.org.

Licensing

GNU Datamash is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.