Next: , Up: Typing and Comparison


6.3.2.1 String Type Versus Numeric Type

The 1992 POSIX standard introduced the concept of a numeric string, which is simply a string that looks like a number—for example, " +2". This concept is used for determining the type of a variable. The type of the variable is important because the types of two variables determine how they are compared. The various versions of the POSIX standard did not get the rules quite right for several editions. Fortunately, as of at least the 2008 standard (and possibly earlier), the standard has been fixed, and variable typing follows these rules:1

The last rule is particularly important. In the following program, a has numeric type, even though it is later used in a string operation:

     BEGIN {
          a = 12.345
          b = a " is a cute number"
          print b
     }

When two operands are compared, either string comparison or numeric comparison may be used. This depends upon the attributes of the operands, according to the following symmetric matrix:

             +——————————————————————–
             |       STRING          NUMERIC         STRNUM
     ———–+——————————————————————–
             |
     STRING  |       string          string          string
             |
     NUMERIC |       string          numeric         numeric
             |
     STRNUM  |       string          numeric         numeric
     ———–+——————————————————————–

The basic idea is that user input that looks numeric—and only user input—should be treated as numeric, even though it is actually made of characters and is therefore also a string. Thus, for example, the string constant " +3.14", when it appears in program source code, is a string—even though it looks numeric—and is never treated as number for comparison purposes.

In short, when one operand is a “pure” string, such as a string constant, then a string comparison is performed. Otherwise, a numeric comparison is performed.

This point bears additional emphasis: All user input is made of characters, and so is first and foremost of string type; input strings that look numeric are additionally given the strnum attribute. Thus, the six-character input string ‘ +3.14 receives the strnum attribute. In contrast, the eight-character literal " +3.14" appearing in program text is a string constant. The following examples print ‘1’ when the comparison between the two different constants is true, ‘0’ otherwise:

     $ echo ' +3.14' | gawk '{ print $0 == " +3.14" }'    True
     -| 1
     $ echo ' +3.14' | gawk '{ print $0 == "+3.14" }'     False
     -| 0
     $ echo ' +3.14' | gawk '{ print $0 == "3.14" }'      False
     -| 0
     $ echo ' +3.14' | gawk '{ print $0 == 3.14 }'        True
     -| 1
     $ echo ' +3.14' | gawk '{ print $1 == " +3.14" }'    False
     -| 0
     $ echo ' +3.14' | gawk '{ print $1 == "+3.14" }'     True
     -| 1
     $ echo ' +3.14' | gawk '{ print $1 == "3.14" }'      False
     -| 0
     $ echo ' +3.14' | gawk '{ print $1 == 3.14 }'        True
     -| 1

Footnotes

[1] gawk has followed these rules for many years, and it is gratifying that the POSIX standard is also now correct.