

5.7 Check - checking tabular structure

datamash check validates the tabular structure of a file, ensuring that all lines have the same number of fields. check is meant for scripts and automation pipelines: it terminates with a non-zero exit code if the file is not well structured, and prints detailed context information about the offending lines:

$ cat good.txt
A    1    ww
B    2    xx
C    3    yy
D    4    zz


$ cat bad.txt
A    1    ww
B    2    xx
C    3
D    4    zz


$ datamash check < good.txt && echo ok || echo fail
4 lines, 3 fields
ok


$ datamash check < bad.txt && echo ok || echo fail
line 2 (3 fields):
  B  2 xx
line 3 (2 fields):
  C  3
datamash: check failed: line 3 has 2 fields (previous line had 3)
fail

5.7.1 Expected number of lines/fields

check accepts an optional expected number of lines and an optional expected number of fields, and returns failure if the input does not have the requested number of lines/fields.

The syntax is:

datamash check [N lines] [N fields]

Usage examples:

$ cat file.txt
A    1    ww
B    2    xx
C    3    yy
D    4    zz

$ datamash check 4 lines < file.txt && echo ok
4 lines, 3 fields
ok

$ datamash check 3 fields < file.txt && echo ok
4 lines, 3 fields
ok

$ datamash check 4 lines 3 fields < file.txt && echo ok
4 lines, 3 fields
ok

$ datamash check 7 fields < file.txt && echo ok
line 1 (3 fields):
  A    1    ww
datamash: check failed: line 1 has 3 fields (expecting 7)

$ datamash check 10 lines < file.txt && echo ok
datamash: check failed: input had 4 lines (expecting 10)

For convenience, line, row, and rows can be used instead of lines; field, columns, column, and col can be used instead of fields. The order of the two arguments does not matter. The following are all equivalent:

datamash check 4 lines 10 fields < file.txt
datamash check 4 rows  10 columns < file.txt
datamash check 10 col 4 row < file.txt
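
As a minimal sketch of how the expected line and field counts might be used from a script, the helper below wraps datamash check and relies only on its exit status. The function name require_shape and its argument order are illustrative assumptions, not part of datamash:

```shell
#!/bin/sh
# Sketch only: 'require_shape' is a hypothetical helper, not part of datamash.
# Usage: require_shape FILE LINES FIELDS
require_shape()
{
    # On success, datamash prints a summary ("N lines, M fields") to stdout;
    # it is discarded here. Diagnostics on stderr are left untouched.
    # The function's exit status is the exit status of datamash.
    datamash check "$2" lines "$3" fields < "$1" > /dev/null
}

# Example call (commented out so that the sketch only defines the helper):
#   require_shape file.txt 4 3 || exit 1
```

Because the helper returns the exit status of datamash unchanged, it can be combined with || and && in the same way as the examples above.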

5.7.2 Checks in automation scripts

In a pipeline/automation context, it is often beneficial to validate files as early as possible (immediately after a file is created, as in the fail-fast methodology). A typical usage in a shell script would be:

#!/bin/sh

die()
{
    base=$(basename "$0")
    echo "$base: error: $*" >&2
    exit 1
}

custom pipeline-or-program > output.txt \
    || die "program failed"

datamash check < output.txt \
    || die "'output.txt' has invalid structure (missing fields)"

If the generated output.txt file has invalid structure (i.e. missing fields), datamash will print enough details to stderr to help in troubleshooting (line numbers and the offending line’s content).
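
In unattended runs it may be useful to keep these diagnostics in a file instead of letting them scroll by. The following is a minimal sketch under that assumption; the function name check_with_log and the log file argument are hypothetical and not part of datamash:

```shell
#!/bin/sh
# Sketch only: 'check_with_log' is a hypothetical helper, not part of datamash.
# Usage: check_with_log FILE LOGFILE
check_with_log()
{
    # Discard the success summary on stdout; save stderr diagnostics in LOGFILE.
    if datamash check < "$1" > /dev/null 2> "$2"; then
        rm -f "$2"      # nothing to report, remove the empty log
        return 0
    fi
    echo "'$1' has invalid structure, details in '$2'" >&2
    return 1
}

# Example call (commented out so that the sketch only defines the helper):
#   check_with_log output.txt output.check.log || exit 1
```

The log is removed on success so that a leftover log file always corresponds to a failed check.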

