

3 Available operations in datamash

Primary operations:
groupby

alternative syntax for --group

crosstab

cross-tabulate two fields (also known as pivot tables)

transpose

transpose the rows and columns of a text file

reverse

reverse fields in each line of a text file

check

verify tabular structure of input (ensure same number of fields in all lines)
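To make the crosstab description concrete, here is a minimal Python sketch of cross-tabulation by counting. It illustrates the idea only and is not datamash's implementation; the function name `crosstab` and the sample data are hypothetical, and empty cells are stored as `None` because this section does not say how datamash prints them.

```python
from collections import Counter

def crosstab(pairs):
    """Count how often each (row_key, column_key) pair occurs.

    Illustrative only: combinations that never occur are stored as None.
    """
    counts = Counter(pairs)
    row_keys = sorted({row for row, _ in pairs})
    col_keys = sorted({col for _, col in pairs})
    return {
        row: {col: counts.get((row, col)) for col in col_keys}
        for row in row_keys
    }

# Hypothetical input: two fields per line, already split.
pairs = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
table = crosstab(pairs)
for row, cells in table.items():
    print(row, cells)
```

Each distinct value of the first field becomes a row and each distinct value of the second field becomes a column, which is the pivot-table layout the description refers to.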

Line-Filtering operation:
rmdup

remove lines with duplicated key value
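The effect of rmdup can be sketched in Python as follows. This is a hedged illustration, assuming that the first line seen for each key is the one kept and that fields are tab-separated; the function name `rmdup` and its parameters are hypothetical.

```python
def rmdup(lines, key_index, sep="\t"):
    """Keep only the first line seen for each key value (assumed behaviour)."""
    seen = set()
    kept = []
    for line in lines:
        key = line.split(sep)[key_index]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

# Hypothetical input: the key is the first field (index 0).
lines = ["a\t1", "b\t2", "a\t3", "c\t4", "b\t5"]
result = rmdup(lines, key_index=0)
print(result)
```

Unlike a sort-then-deduplicate pipeline, this approach leaves the surviving lines in their original order.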

Per-Line operations:
base64

encode the field as base64

debase64

decode the field from base64. Exit with an error if the field is not a valid base64 value and cannot be decoded.

md5

calculates md5 hash of the field

sha1

calculates sha1 hash of the field

sha224

calculates sha224 hash of the field

sha256

calculates sha256 hash of the field

sha384

calculates sha384 hash of the field

sha512

calculates sha512 hash of the field

dirname

extracts the directory name of the field (assuming the field is a file name). Similar to dirname(1).

basename

extracts the base file name of the field (assuming the field is a file name). Similar to basename(1).

extname

extracts the extension of the file name of the field (assuming the field is a file name).

barename

extracts the base file name of the field without the extension (assuming the field is a file name).

getnum

extract a number from the field. getnum accepts an optional single letter option ‘n/i/d/p/h/o’ affecting the detected value.

cut

copy input field to output field (similar to cut(1)). When the cut operation is given a list of fields, the fields are copied in the given order (in contrast to cut(1)).

echo

an alias for cut.
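The per-line operations each transform one field value at a time. The following Python sketch shows roughly equivalent transformations using the standard library; it is an approximation rather than datamash's own code, the sample path is hypothetical, and whether datamash's extname output includes the leading dot is not stated here (Python's `splitext` keeps it).

```python
import base64
import hashlib
import posixpath

field = "/home/user/report.txt"  # hypothetical field value

# base64 / debase64: encode, then decode strictly (invalid input raises).
encoded = base64.b64encode(field.encode()).decode()
decoded = base64.b64decode(encoded, validate=True).decode()

# md5 / sha256: hexadecimal digests of the field's bytes.
md5_hex = hashlib.md5(field.encode()).hexdigest()
sha256_hex = hashlib.sha256(field.encode()).hexdigest()

# dirname / basename, then split the base name into stem and extension.
directory = posixpath.dirname(field)
base = posixpath.basename(field)
stem, ext = posixpath.splitext(base)

print(encoded, decoded, md5_hex, sha256_hex, directory, base, stem, ext, sep="\n")
```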

Group-By Numeric operations:
sum

sum of the values

min

minimum value

max

maximum value

absmin

minimum of the absolute values

absmax

maximum of the absolute values

range

range of values (maximum - minimum)

Group-By Textual/Numeric operations:
count

count number of elements in the group

first

the first value of the group

last

the last value of the group

rand

one random value from the group

unique

comma-separated sorted list of unique values

uniq

an alias for unique.

Use --collapse-delimiter to choose a separator character other than the comma.

collapse

comma-separated list of all input values

Use --collapse-delimiter to choose a separator character other than the comma.

countunique

number of unique/distinct values
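The difference between collapse, unique and countunique is easiest to see side by side. The Python sketch below mirrors the descriptions above; the function names and the `delimiter` parameter are hypothetical stand-ins, and plain string sorting is an assumption about the order of the unique list.

```python
def collapse(values, delimiter=","):
    """Join every value in the group, duplicates included, in input order."""
    return delimiter.join(values)

def unique(values, delimiter=","):
    """Join the distinct values of the group in sorted order."""
    return delimiter.join(sorted(set(values)))

def countunique(values):
    """Number of distinct values in the group."""
    return len(set(values))

group = ["b", "a", "b", "c"]  # hypothetical group of field values
print(collapse(group))
print(unique(group))
print(unique(group, delimiter=";"))
print(countunique(group))
```

Passing a different `delimiter` here plays the role that --collapse-delimiter plays on the command line.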

Group-By Statistical operations:
mean

mean of the values

geomean

geometric mean of the values

harmmean

harmonic mean of the values

trimmean

trimmed mean of the values

ms

mean square of the values

rms

root mean square of the values

median

median value

q1

1st quartile value

q3

3rd quartile value

iqr

inter-quartile range

perc

percentile value

mode

mode value (most common value)

antimode

anti-mode value (least common value)

pstdev

population standard deviation

sstdev

sample standard deviation

pvar

population variance

svar

sample variance

mad

Median Absolute Deviation, scaled by a constant 1.4826 for normal distributions

madraw

Median Absolute Deviation, unscaled

sskew

skewness of the (sample) group

pskew

skewness of the (population) group

skurt

Excess Kurtosis of the (sample) group

pkurt

Excess Kurtosis of the (population) group

jarque

p-value of the Jarque-Bera test for normality

dpo

p-value of the D’Agostino-Pearson Omnibus test for normality.
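Several of the statistical operations come in population/sample pairs or in scaled/unscaled forms. The Python sketch below illustrates those distinctions using the standard `statistics` module. It shows the textbook definitions under the assumption that datamash follows them; the helper names `madraw` and `mad` simply reuse the operation names, and the data is hypothetical.

```python
import statistics

MAD_SCALE = 1.4826  # scaling constant quoted in the mad description

def madraw(values):
    """Unscaled median absolute deviation: median of |x - median(x)|."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def mad(values):
    """Median absolute deviation scaled for normally distributed data."""
    return MAD_SCALE * madraw(values)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical group

print("mean  ", statistics.mean(values))
print("median", statistics.median(values))
print("pvar  ", statistics.pvariance(values))  # divides by n
print("svar  ", statistics.variance(values))   # divides by n - 1
print("pstdev", statistics.pstdev(values))
print("sstdev", statistics.stdev(values))
print("madraw", madraw(values))
print("mad   ", mad(values))
```

The population forms divide by the number of values n, while the sample forms divide by n - 1, so the sample variance and standard deviation are always the larger of each pair for the same data.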

