16.2 Other Stuff to Know

The rest of this chapter uses a number of terms. The following informal definitions should help you work your way through the material:

Accuracy

A floating-point calculation’s accuracy is how close it comes to the real (paper and pencil) value.

Error

The difference between what the result of a computation “should be” and what it actually is. It is best to minimize error as much as possible.

Exponent

The order of magnitude of a value; some number of bits in a floating-point value store the exponent.

Inf

A special value representing infinity. Most arithmetic operations that combine infinity with an ordinary number, such as adding a finite value to it, produce infinity.

NaN

“Not a number.” A special value that results from attempting a calculation that has no answer as a real number. See Floating Point Values They Didn’t Talk About In School for more information about infinity and not-a-number values.

Normalized

How the significand (see later in this list) is usually stored. The value is adjusted so that the first bit is one, and then that leading one is assumed instead of physically stored. This provides one extra bit of precision.

Precision

The number of bits used to represent the significand of a floating-point number. The more bits, the more digits you can represent. Binary and decimal precisions are related approximately, according to the formula:

prec = 3.322 * dps

Here, prec denotes the binary precision (measured in bits) and dps (short for decimal places) is the number of decimal digits. The constant 3.322 approximates the base-2 logarithm of 10.

Rounding mode

How numbers are rounded up or down when necessary. More details are provided later.

Significand

A floating-point value consists of the significand multiplied by a base raised to the power of the exponent. The IEEE 754 binary formats use base 2; decimal notation uses base 10. For example, in 1.2345e67, the significand is 1.2345.

Stability

From the Wikipedia article on numerical stability: “Calculations that can be proven not to magnify approximation errors are called numerically stable.”

See the Wikipedia article on accuracy and precision for more information on some of those terms.
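Several of these terms can be observed directly. The following is a minimal sketch in Python rather than awk, assuming that Python floats are IEEE 754 double-precision values (the same format awk uses). The helper names bits_for_digits and digits_for_bits are hypothetical, introduced only for illustration.

```python
import math

# Error: 0.1, 0.2 and 0.3 have no exact binary representation, so the
# computed sum differs slightly from the "paper and pencil" value.
error = abs((0.1 + 0.2) - 0.3)

# Significand and exponent: math.frexp() splits x into m * 2**e,
# with 0.5 <= |m| < 1 (a C-style normalization of the significand).
significand, exponent = math.frexp(6.0)   # 6.0 == 0.75 * 2**3

# Precision: relate decimal digits to bits with prec = 3.322 * dps.
def bits_for_digits(dps):
    """Hypothetical helper: bits needed to hold dps decimal digits."""
    return math.ceil(3.322 * dps)

def digits_for_bits(prec):
    """Hypothetical helper: decimal digits that fit in prec bits."""
    return math.floor(prec / 3.322)

# Inf and NaN: special values with their own arithmetic rules.
inf = math.inf
nan = inf - inf   # no answer as a real number, so the result is NaN
```

Under these assumptions, digits_for_bits(53) evaluates to 15, which agrees with the common statement that a double holds about 15 decimal digits. A NaN never compares equal to anything, including itself, so math.isnan() is the reliable way to test for it.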

On modern systems, floating-point hardware uses the representation and operations defined by the IEEE 754 standard. Three of the standard IEEE 754 types are 32-bit single precision, 64-bit double precision, and 128-bit quadruple precision. The standard also specifies extended precision formats to allow greater precisions and larger exponent ranges. (awk uses only the 64-bit double-precision format.)

Table 16.3 lists the precision and exponent field values for the basic IEEE 754 binary formats.

Name        Total bits   Precision   Minimum exponent   Maximum exponent
Single      32           24          −126               +127
Double      64           53          −1022              +1023
Quadruple   128          113         −16382             +16383

Table 16.3: Basic IEEE format values

NOTE: The precision numbers include the implied leading one that gives them one extra bit of significand.
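The double-precision row of the table can be checked by taking a value apart bit by bit. This is a rough sketch in Python, assuming the platform stores floats as IEEE 754 64-bit doubles. The function name decode_double is hypothetical, and the sketch does not treat the reserved exponent encodings (zero, subnormals, Inf, and NaN) specially.

```python
import struct
import sys

def decode_double(x):
    """Split a 64-bit double into (sign, unbiased exponent, fraction).

    Layout assumed: 1 sign bit, 11 exponent bits (bias 1023), and
    52 stored fraction bits. The leading one is implied, not stored.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent - 1023, fraction

# 52 stored fraction bits plus the implied leading one give 53 bits.
stored_bits = 52
precision = stored_bits + 1

# Largest and smallest normalized doubles exercise the exponent limits.
max_exponent = decode_double(sys.float_info.max)[1]
min_exponent = decode_double(sys.float_info.min)[1]
```

Here precision matches sys.float_info.mant_dig, and the two exponents match the table's +1023 and −1022. Note that sys.float_info.max_exp and sys.float_info.min_exp report 1024 and −1021 instead. They follow the C convention of a significand in the range [0.5, 1), which shifts each exponent by one.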