Field Splitting Summary (The GNU Awk User’s Guide)


4.5.7 Field-Splitting Summary

It is important to remember that when you assign a string constant as the value of FS, it undergoes normal awk string processing. For example, with Unix awk and gawk, the assignment ‘FS = "\.."’ assigns the character string ".." to FS (the backslash is stripped). This creates a regexp meaning “fields are separated by occurrences of any two characters.” If instead you want fields to be separated by a literal period followed by any single character, use ‘FS = "\\.."’.
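The difference between the two regexps can be checked with a short experiment. This is a minimal sketch: the sample record ‘x.ay.bz’ and the shell variable names are made up for illustration. It writes ".." directly (the string that "\.." becomes after the backslash is stripped), because awk implementations may warn about, or differ in how they handle, the unrecognized escape ‘\.’.

```sh
# Split the same record with two different FS strings, then join the
# resulting fields with "|" so that empty fields stay visible.
record='x.ay.bz'   # hypothetical sample input

# FS is the two-character string "..": any two characters separate fields.
any_two=$(printf '%s\n' "$record" |
    awk 'BEGIN { FS = ".."; OFS = "|" } { $1 = $1; print }')

# FS is the three-character string \.. : a literal period followed by
# any single character separates fields.
dot_any=$(printf '%s\n' "$record" |
    awk 'BEGIN { FS = "\\.."; OFS = "|" } { $1 = $1; print }')

printf '%s\n' "any two characters:       $any_two"
printf '%s\n' "period plus one character: $dot_any"
```

With the first setting every pair of characters is consumed as a separator, leaving three empty fields and ‘z’ (‘|||z’). With the second, only ‘.a’ and ‘.b’ act as separators, giving ‘x|y|z’.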

The following list summarizes how fields are split, based on the value of FS (‘==’ means “is equal to”):

gawk was invoked with --csv: Field splitting follows the rules given in Working With Comma Separated Value Files. The value of FS is ignored.
FS == " ": Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.
FS == any other single character: Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading and trailing occurrences. The character can even be a regexp metacharacter; it does not need to be escaped.
FS == regexp: Fields are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty fields.
FS == "": Each individual character in the record becomes a separate field. (This is a common extension; it is not specified by the POSIX standard.)
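
Several of the cases in this list can be tried from the shell. The following is a rough sketch, not part of the original manual: ‘show_fields’ is a hypothetical helper and the sample records are invented. It passes the separator with ‘-v FS=...’, which is safe here only because none of the values contains a backslash.

```sh
# show_fields is a hypothetical helper: it splits the record in $2 using
# the FS value in $1, then prints the field count followed by each field
# in brackets so that empty fields are visible.
show_fields() {
    printf '%s\n' "$2" | awk -v FS="$1" '{
        out = NF ":"
        for (i = 1; i <= NF; i++)
            out = out "[" $i "]"
        print out
    }'
}

show_fields ' '     '  a  b  '   # default: runs of whitespace, edges ignored
show_fields '|'     '|a||b|'     # single character, even a regexp metacharacter
show_fields '[,;]+' ',a;;b,'     # regexp: leading and trailing matches count
```

The three calls should print ‘2:[a][b]’, ‘5:[][a][][b][]’, and ‘4:[][a][b][]’ respectively. The --csv and FS == "" cases are left out because they depend on the awk version and implementation in use.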

FS and IGNORECASE

The IGNORECASE variable (see Built-in Variables That Control awk) affects field splitting only when the value of FS is a regexp. It has no effect when FS is a single character, even if that character is a letter. Thus, in the following code:

FS = "c"
IGNORECASE = 1
$0 = "aCa"
print $1

The output is ‘aCa’. If you really want to split fields on an alphabetic character while ignoring case, use a regexp that will do it for you (e.g., ‘FS = "[c]"’). In this case, IGNORECASE will take effect.
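
The two settings can be compared side by side. This sketch assumes gawk is installed, because IGNORECASE is a gawk extension; ‘first_field’ is a hypothetical helper name.

```sh
# first_field is a hypothetical helper: it sets FS to the value in $1,
# turns on IGNORECASE, splits the record "aCa", and prints $1.
# Requires gawk, since IGNORECASE is a gawk extension.
first_field() {
    gawk -v fs="$1" 'BEGIN { FS = fs; IGNORECASE = 1; $0 = "aCa"; print $1 }'
}

first_field 'c'     # single character: IGNORECASE has no effect
first_field '[c]'   # regexp: IGNORECASE applies
```

The first call should print ‘aCa’, since the record is not split at all. The second should print ‘a’, since the uppercase ‘C’ now matches the separator.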