GNU datamash

GNU datamash is a command-line program that performs basic numeric, textual, and statistical operations on textual input data files.


Calculate the sum and mean of the values 1 to 10:

  $ seq 10 | datamash sum 1 mean 1
  55 5.5
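
As a cross-check of what the sum and mean operations compute, the same figures can be reproduced with awk. This is only a minimal sketch for comparison, not how datamash is implemented; the helper name sum_mean is hypothetical, and awk separates the two values with a space where datamash uses a tab.

```shell
# sum_mean (hypothetical helper): print the sum and the mean of field 1.
sum_mean() {
  awk '{ s += $1 } END { if (NR > 0) print s, s / NR }'
}

# Same input as the datamash example above; prints "55 5.5".
seq 10 | sum_mean
```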

Group a text file by one column and calculate the mean and
sample standard deviation of another column,
with automatic sorting and header-line processing:

  $ datamash --sort --headers groupby 2 mean 3 sstdev 3 < scores_h.txt
  GroupBy(Major)  mean(Score) sstdev(Score)
  Arts            68.94       10.42
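
To make the grouping arithmetic concrete, the sketch below reproduces a group-by with mean and sample standard deviation (n-1 denominator) in awk. It is an illustration under assumptions, not datamash's implementation: the helper name group_mean_sstdev and the five sample rows are hypothetical and are not taken from scores_h.txt. It also uses the simple sum-of-squares formula, which is fine for small inputs but less numerically robust than a dedicated tool.

```shell
# group_mean_sstdev (hypothetical helper): read tab-separated lines with a
# header, group by field 2, and print group, mean(field 3), sstdev(field 3).
group_mean_sstdev() {
  awk -F '\t' '
    NR == 1 { next }                                   # skip the header line
    { n[$2]++; sum[$2] += $3; sumsq[$2] += $3 * $3 }   # per-group accumulators
    END {
      for (g in n) {
        mean = sum[g] / n[g]
        if (n[g] > 1) {
          # sample variance: divide by n-1, not n
          var = (sumsq[g] - n[g] * mean * mean) / (n[g] - 1)
          if (var < 0) var = 0                         # guard against rounding
          printf "%s\t%.2f\t%.2f\n", g, mean, sqrt(var)
        } else {
          printf "%s\t%.2f\tnan\n", g, mean            # undefined for one value
        }
      }
    }' | sort                                          # awk array order is unspecified
}

# Hypothetical sample data: Name, Major, Score.
printf 'Name\tMajor\tScore\nAnn\tArts\t60\nBob\tArts\t70\nCat\tArts\t80\nDan\tScience\t50\nEve\tScience\t70\n' |
  group_mean_sstdev
```

With this sample input, the Arts group (60, 70, 80) has mean 70.00 and sample standard deviation 10.00, and the Science group (50, 70) has mean 60.00 and sample standard deviation 14.14.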

Validate a file for pipeline automation and troubleshooting:

  $ datamash check < snp147Common.txt && echo ok || echo fail
  15189820 lines, 26 fields

  $ datamash check < tmp2.txt && echo ok || echo fail
  line 3816 (7 fields):
    chrY  9544432 9552871 NR_001534 0 - 0.5
  line 3817 (6 fields):
    chrY  9544432 9552871 NR_003592 0 -
  datamash: check failed: line 3817 has 6 fields (previous line had 7)
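
The idea behind this kind of check, that every line must have the same number of fields and the exit status reports the result, can be sketched in awk. This is a simplified illustration assuming tab-separated input (datamash's default delimiter); the helper name check_fields and its messages are hypothetical and do not match datamash's own output format.

```shell
# check_fields (hypothetical helper): succeed only if every input line has
# the same number of tab-separated fields as the first line.
check_fields() {
  awk -F '\t' '
    NR == 1 { want = NF }              # the first line sets the expected count
    NF != want {
      printf "line %d has %d fields (expected %d)\n", NR, NF, want > "/dev/stderr"
      bad = 1
      exit 1                           # non-zero status, usable with && and ||
    }
    END { if (!bad) printf "%d lines, %d fields\n", NR, want }
  '
}

# Consistent input: prints "2 lines, 3 fields", then "ok".
printf 'a\tb\tc\nd\te\tf\n' | check_fields && echo ok || echo fail

# Second line is one field short: reports the line on stderr, then "fail".
printf 'a\tb\tc\nd\te\n' | check_fields && echo ok || echo fail
```

Because the result is carried by the exit status, the helper can gate later pipeline stages in the same way as the `&& echo ok || echo fail` pattern shown above.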

Downloading datamash

Datamash runs on a wide variety of UNIX platforms, Windows, and macOS.
See the download section for more details.

Documentation and Help

Source Code


Development of Datamash, and GNU in general, is a volunteer effort, and you can contribute. For information, please read How to help GNU. If you'd like to get involved, it's a good idea to join the discussion mailing list (see above).


Datamash is currently being maintained by Assaf Gordon and Tim Rice.
For any questions, please send email to


GNU Datamash is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.