
3. How combine Processes Files

At its most basic, combine reads records from a data file (or from several data files, one after another) and, if data-record output is requested, writes the requested fields to a file or to stdout. Here is an example of this simplest case.

combine --write-output --output-fields=1-

This is essentially an expensive pipe. It reads from stdin and writes the entire record back to stdout.
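To make the "expensive pipe" concrete, here is a minimal Python model of what the command does to each record. It is a sketch, not combine's implementation. It assumes records are newline-terminated lines and that a field specification such as ‘1-’ or ‘1-10’ names 1-based, inclusive byte positions. The helper names parse_range and expensive_pipe are hypothetical.

```python
def parse_range(spec: str) -> slice:
    """Convert a 1-based, inclusive position range ("1-10", "1-", "5") to a slice.

    Hypothetical helper: models the position ranges used in the examples.
    """
    start, dash, end = spec.partition("-")
    first = int(start) - 1
    if not dash:                 # a single position such as "5"
        return slice(first, first + 1)
    # An empty end ("1-") means "through the end of the record".
    return slice(first, int(end) if end else None)


def expensive_pipe(lines, output_fields="1-"):
    """Yield each input record, cut down to the requested positions.

    With the default "1-", every record passes through unchanged.
    """
    cut = parse_range(output_fields)
    for line in lines:
        record = line.rstrip("\n")   # drop the record terminator
        yield record[cut]
```

Used as `for r in expensive_pipe(sys.stdin): print(r)`, it copies stdin to stdout, which is exactly why the combine command above does no useful work by itself.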

Introducing a reference file gives more options. combine now reads the reference file into memory before reading the data file, and for every data record it checks whether that record has a match among the stored reference records. The following example narrows the simple pipe above: it writes only those records from stdin whose first 10 bytes match the first 10 bytes of some record in the reference file.

combine -w -o 1- -r reference_file.txt --key-fields=1-10 \
            --data-key-fields=1-10 --unique

The option ‘--unique’ keeps combine from storing more than one copy of each key. Without it, a key that appears more than once in the reference file would, when matched, produce more than one copy of the matching data record.
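The effect of ‘--unique’ is easiest to see in a small model. The sketch below loads reference keys into memory first, then scans the data records, emitting a data record once for each stored copy of its key. It is an illustration of the behavior described above, not combine's code; the function names are hypothetical, and the key is assumed to be positions 1-10 in both files, as in the example.

```python
from collections import Counter

KEY = slice(0, 10)  # positions 1-10, as in --key-fields=1-10 / --data-key-fields=1-10


def load_reference(ref_lines, unique=True):
    """Read reference keys into memory, counting duplicate copies of each key."""
    keys = Counter(line.rstrip("\n")[KEY] for line in ref_lines)
    if unique:
        # Models --unique: store just one copy of each key.
        keys = Counter(dict.fromkeys(keys, 1))
    return keys


def matching_data(data_lines, ref_keys):
    """Yield each data record once per stored reference copy of its key."""
    for line in data_lines:
        record = line.rstrip("\n")
        # A Counter returns 0 for unknown keys, so unmatched records yield nothing.
        for _ in range(ref_keys[record[KEY]]):
            yield record
```

With a reference file that lists the same key twice, `unique=False` emits the matching data record twice, which is the duplication the option prevents.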

The other way to use a reference file is to base the output on the reference records themselves, with an indication of whether data records matched them. The next example performs the same match as above, but this time it writes one record for every unique key, with a flag set to ‘1’ if a data record matched that key or ‘0’ otherwise. It still reads the data records from stdin and writes the output records to stdout.

combine -r -f reference_file.txt -k 1-10 -m 1-10 -u -w -o 1-10
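The reference-based report can be modeled the same way: every unique reference key starts out unmatched, and a pass over the data records marks the keys that occur. This is a hedged sketch; flag_report is a hypothetical name, it returns (key, flag) pairs rather than reproducing combine's exact output layout, and listing keys in first-seen order is a choice of the sketch, not something the manual states.

```python
KEY = slice(0, 10)  # positions 1-10 in both files (-k 1-10 / -m 1-10)


def flag_report(ref_lines, data_lines):
    """Return one (key, flag) pair per unique reference key.

    The flag is "1" if some data record carried the key and "0" otherwise.
    """
    # dict keeps insertion order and collapses duplicate keys (as -u does).
    matched = dict.fromkeys((line.rstrip("\n")[KEY] for line in ref_lines), False)
    for line in data_lines:
        key = line.rstrip("\n")[KEY]
        if key in matched:
            matched[key] = True
    return [(key, "1" if hit else "0") for key, hit in matched.items()]
```

Because the report is driven by the reference keys, a key with no matching data record still appears, flagged ‘0’.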

Of course, you might want both sets of output at once: the data records that matched keys in the reference file, and the list of reference keys showing which ones were matched. In the two prior examples, each kind of output went to stdout. You can still send both there and post-process the result to find where the data-based records end and the reference-based records begin, but it is simpler to have combine write each kind of output to its own file.

The following example merges the output specifications of the two prior examples and gives each one a filename. The first uses the long option ‘--output-file’, while the second uses the equivalent one-letter option ‘-t’.

combine -w -o 1- --output-file testdata.txt \
            -r -f reference_file.txt -k 1-10 -m 1-10 \
            -u -w -o 1-10 -t testflag.txt
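A single pass can produce both outputs, which is what lets them go to separate destinations without any post-processing. The sketch below models that idea; combined_run is a hypothetical name, the two returned lists stand in for testdata.txt and testflag.txt, and unique keys are assumed, as in the example's ‘-u’.

```python
KEY = slice(0, 10)  # positions 1-10 in both files


def combined_run(ref_lines, data_lines):
    """Produce both outputs of the example in one pass over the data.

    Returns (data_out, flag_out): the matching data records, and one
    (key, flag) pair per unique reference key.
    """
    matched = dict.fromkeys((line.rstrip("\n")[KEY] for line in ref_lines), False)
    data_out = []
    for line in data_lines:
        record = line.rstrip("\n")
        key = record[KEY]
        if key in matched:
            matched[key] = True
            data_out.append(record)          # destined for testdata.txt
    # Destined for testflag.txt.
    flag_out = [(key, "1" if hit else "0") for key, hit in matched.items()]
    return data_out, flag_out
```

Writing each list to its own file keeps the two kinds of record apart from the start, so there is no need to find the boundary in a mixed stdout stream afterwards.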


This document was generated by Daniel P. Valentine on July 28, 2013 using texi2html 1.82.