D.3.3 Standards Versus Existing Practice

Historically, awk has converted any non-numeric looking string to the numeric value zero, when required. Furthermore, the original definition of the language and the original POSIX standards specified that awk only understands decimal numbers (base 10), and not octal (base 8) or hexadecimal numbers (base 16).

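As of this writing (February, 2007), changes in the language of the current POSIX standard can be interpreted to imply that awk should support additional features. These features are the interpretation of floating point data values written in hexadecimal notation, and support for the special IEEE floating point values NaN and Infinity.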

The first problem is that both of these are clear changes to historical practice, under which a hexadecimal string or a purely alphabetic string such as ‘nan’ converts to zero.

The second problem is that the gawk maintainer feels that this interpretation of the standard, which requires a certain amount of “language lawyering” to arrive at in the first place, was not intended by the standard developers, either. In other words, “we see how you got where you are, but we don't think that that's where you want to be.”

Nevertheless, on systems that support IEEE floating point, it seems reasonable to provide some way to support NaN and Infinity values. The solution implemented in gawk, as of version 3.1.6, is as follows:

  1. With the --posix command-line option, gawk becomes “hands off.” String values are passed directly to the system library's strtod() function, and if it successfully returns a numeric value, that is what's used. By definition, the results are not portable across different systems.[1] They are also a little surprising:
              $ echo nanny | gawk --posix '{ print $1 + 0 }'
              -| nan
              $ echo 0xDeadBeef | gawk --posix '{ print $1 + 0 }'
              -| 3735928559
    
  2. Without --posix, gawk interprets the four strings ‘+inf’, ‘-inf’, ‘+nan’, and ‘-nan’ specially, producing the corresponding special numeric values. The leading sign acts as a signal to gawk (and the user) that the value is really numeric. Hexadecimal floating point is not supported (unless you also use --non-decimal-data, which is not recommended). For example:
              $ echo nanny | gawk '{ print $1 + 0 }'
              -| 0
              $ echo +nan | gawk '{ print $1 + 0 }'
              -| nan
              $ echo 0xDeadBeef | gawk '{ print $1 + 0 }'
              -| 0
    

    gawk ignores case in the four special values. Thus ‘+nan’ and ‘+NaN’ are the same.
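
The rules for the default (non --posix) mode can be checked with a small script. The following is only a sketch: it assumes a gawk of version 3.1.6 or later on the search path, and the helper name to_num is made up for this illustration. By the rules above, ‘nanny’, the unsigned ‘inf’, and ‘0xDeadBeef’ should all convert to zero, while the signed forms should yield special values regardless of case. How a NaN or Infinity is spelled on output may vary between gawk versions and systems, so no output is shown.

```shell
#!/bin/sh
# Sketch: probe how the gawk on PATH converts strings to numbers in
# its default (non --posix) mode. Assumes gawk 3.1.6 or later.

# to_num is a hypothetical helper, not part of gawk. It forces a
# numeric conversion of its argument by adding zero to the first field.
to_num() {
    printf '%s\n' "$1" | gawk '{ print $1 + 0 }'
}

if command -v gawk >/dev/null 2>&1; then
    # Signed special values, mixed case, an unsigned 'inf', and two
    # strings that merely look numeric.
    for s in nanny +nan +NaN -inf +INF inf 0xDeadBeef; do
        printf '%-12s => %s\n' "$s" "$(to_num "$s")"
    done
else
    echo "gawk not found; nothing to test" >&2
fi
```

Adding --posix to the gawk command line inside to_num shows what the system's strtod() does with the same strings; those results depend on the C library.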


Footnotes

[1] You asked for it, you got it.