combine Manual 0.4.0: 1.1.1.2 Data Validation

1.1.1.2 Data Validation

Data validation is another task that often matters when you receive data from the outside world.

Perhaps you receive product sales data from a number of retailers, which you need to combine with product information you have elsewhere. To confirm that the file you received is valid for reporting, you might need to check the product codes, ZIP codes, and retailer codes against your known valid values.

All the comparisons can be done in a single command, whose exit status will flag us down if any record fails to match our lists of expected values.

combine -w -o 1- \
            -r products.txt -k 1-18 -m 11-28 \
            -r zip.txt -k 1-5 -m 19-23 \
            -r customer.txt -k 1-10 -m 1-10 \
            input.txt \
  | cmp input.txt
result=$?
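The status saved in `result` comes from `cmp`, the last command in the pipeline. `cmp` exits 0 when its two inputs are identical, 1 when they differ, and greater than 1 on trouble such as an unreadable file. The sketch below turns that status into a message. It assumes, as described above, that `combine` writes only the records that matched all three reference files. The helper name `validation_status` is hypothetical and not part of combine.

```shell
#!/bin/sh
# Hypothetical helper: interpret the status captured in $result after the
# combine | cmp pipeline. $? after a pipeline is the status of its last
# command (cmp): 0 = inputs identical, 1 = inputs differ, >1 = trouble.
validation_status() {
    case "$1" in
        0) echo "valid: every record matched all three reference files" ;;
        1) echo "invalid: at least one record failed to match" ;;
        *) echo "error: comparison could not be completed (status $1)" ;;
    esac
}

# Usage, right after the pipeline:
#   validation_status "$result"
```

Only `cmp`'s status is captured. If `combine` itself failed and wrote nothing, `cmp` would most likely report a difference, and the result would look like invalid data. Adding `cmp`'s `-s` option suppresses its message about the first difference and leaves only the status.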

That’s probably enough if we are fairly sure the incoming data is usually clean: if something does not match up, we can investigate by hand. If we expect mismatches more often, however, a few small changes let the command tell us which records failed.

The following command makes each match optional, but appends a constant '$' to the end of the record for each of the three keys that matches. When a key does not match, combine puts a space into the record in place of that '$'. We can then search for records that do not end in '$$$' to find the ones that failed to match.

combine -w -o 1- \
            -r products.txt -k 1-18 -m 11-28 -k '$' -p \
            -r zip.txt -k 1-5 -m 19-23 -k '$' -p \
            -r customer.txt -k 1-10 -m 1-10 -k '$' -p \
            input.txt \
  | grep -v '\$\$\$$' > input.nomatch
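The trailing flags can also show which check failed for each record in `input.nomatch`. The sketch below makes an assumption that the text above does not confirm: the three flag characters appear in the same order as the `-r` options (products, ZIP, customer). It reads the last three characters of each record with `awk` and lists every position holding a space instead of '$'. The function name `report_mismatches` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each record in the given file(s), report which
# reference checks failed. ASSUMPTION: the last three characters are the
# flags for products, zip and customer, in that order ('$' = matched,
# space = no match), following the order of the -r options.
report_mismatches() {
    awk '{
        flags = substr($0, length($0) - 2)   # last three characters of the record
        missing = ""
        if (substr(flags, 1, 1) != "$") missing = missing " product"
        if (substr(flags, 2, 1) != "$") missing = missing " zip"
        if (substr(flags, 3, 1) != "$") missing = missing " customer"
        if (missing != "") printf "record %d:%s\n", NR, missing
    }' "$@"
}

# Usage:
#   report_mismatches input.nomatch
```

Record numbers refer to lines of `input.nomatch`, not of the original `input.txt`. `awk` keeps trailing spaces in `$0`, so a miss on the last flag is still detected.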


This document was generated by Daniel P. Valentine on July 28, 2013 using texi2html 1.82.