
Most computer hardware has support for two different kinds of numbers:
integers (*…-3, -2, -1, 0, 1, 2, 3…*) and
floating-point numbers. Floating-point numbers have three parts: the
*mantissa*, the *exponent*, and the *sign bit*. The real
number represented by a floating-point value is given by
*(s ? -1 : 1) · 2^e · M*
where *s* is the sign bit, *e* the exponent, and *M*
the mantissa. See Floating Point Representation Concepts, for details. (It is
possible to have a different *base* for the exponent, but virtually all
modern hardware uses *2* for its standard floating-point types.)

Floating-point numbers can represent a finite subset of the real
numbers. While this subset is large enough for most purposes, it is
important to remember that the only reals that can be represented
exactly are rational numbers whose binary expansion terminates and fits
within the width of the mantissa, with an exponent inside the format's
range. Even simple fractions such as
*1/5* can only be approximated by floating point.

Mathematical operations and functions frequently need to produce values
that are not representable. Often these values can be approximated
closely enough for practical purposes, but sometimes they can’t.
Historically there was no way to tell when the results of a calculation
were inaccurate. Modern computers implement the IEEE 754 standard
for numerical computations, which defines a framework for indicating to
the program when the results of a calculation are not trustworthy. This
framework consists of a set of *exceptions* that indicate why a
result could not be represented, and the special values *infinity*
and *not a number* (NaN).