5.11 Binning strings

The strbin operation hashes any string input value into an integer bin number. A typical use is to split an input file into N chunks while ensuring that all rows sharing a given key are stored in the same chunk:

$ cat input.txt
PatientA   10
PatientB   11
PatientC   12
PatientA   14
PatientC   15

Each patient ID is hashed into a bin between 0 and 9 and printed in the last field:

$ datamash --full strbin 1 < input.txt
PatientA   10    5
PatientB   11    6
PatientC   12    7
PatientA   14    5
PatientC   15    7
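
The property that matters here is that equal keys always receive the same bin. As a rough sanity check, the sketch below scans the sample output shown above with awk. It is illustrative only: the file name binned.txt is an assumption, and the rows are copied from the example rather than produced by a fresh datamash run.

```shell
# Binned rows copied from the sample output above (last field is the bin).
cat > binned.txt <<'EOF'
PatientA   10    5
PatientB   11    6
PatientC   12    7
PatientA   14    5
PatientC   15    7
EOF

# Remember the bin seen for each key and flag any later mismatch.
awk '
  ($1 in bin) && bin[$1] != $NF { bad = 1 }
  { bin[$1] = $NF }
  END { print (bad ? "inconsistent" : "consistent") }
' binned.txt
```

For the sample rows this prints "consistent", since both PatientA rows carry bin 5 and both PatientC rows carry bin 7.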

Splitting the input into chunks can be done with awk, which writes each line to a file named after its bin (the last field):

$ datamash --full strbin 1 < input.txt \
    | awk '{print > ($NF ".txt")}'
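
To see what the splitting step produces without running datamash, here is a minimal sketch that applies the same awk idea to the binned rows from the sample output above. The file name binned.txt and the directory name chunks are assumptions chosen for illustration. The parentheses around the file name expression are a defensive choice, because an unparenthesized concatenation after > is not parsed the same way by every awk implementation.

```shell
# Binned rows copied from the sample output above (last field is the bin).
cat > binned.txt <<'EOF'
PatientA   10    5
PatientB   11    6
PatientC   12    7
PatientA   14    5
PatientC   15    7
EOF

# Collect the chunk files in a separate directory ("chunks" is an arbitrary name).
mkdir -p chunks

# Write each row, unchanged, to a file named after its bin.
awk '{print > ("chunks/" $NF ".txt")}' binned.txt

# Bin 5 now holds both PatientA rows.
cat chunks/5.txt
```

With the sample rows this creates chunks/5.txt, chunks/6.txt and chunks/7.txt, and every row of a given patient ends up in a single file.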