
In 1990, when IBM introduced the POWER architecture, the CPU
provided a previously unknown instruction, the *fused
multiply-add* (FMA). It computes the value `x * y + z` with an
**exact** double-length product, followed by an addition with a
*single* rounding. Numerical computation often needs pairs of
multiply and add operations, for which the FMA is well-suited.
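
As an illustration, here is a minimal sketch (not from this manual) of
a dot product whose accumulation uses the `fma` function described
below, declared in `<math.h>` since C99, so that each term incurs only
one rounding:

```
#include <math.h>
#include <stddef.h>

/* Accumulate a dot product with one rounding per term.  */
double
dot (const double *x, const double *y, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum = fma (x[i], y[i], sum);  /* x[i] * y[i] + sum, single rounding */
  return sum;
}
```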

On the POWER architecture, there are two dedicated registers that
hold permanent values of `0.0` and `1.0`, and the normal
*multiply* and *add* instructions are just wrappers around the FMA
that compute `x * y + 0.0` and `x * 1.0 + z`, respectively.

In the early days, it appeared that the main benefit of the FMA was getting two floating-point operations for the price of one, almost doubling the performance of some algorithms. However, numerical analysts have since shown numerous uses of the FMA for significantly enhancing accuracy. We discuss one of the most important ones in the next section.

A few other architectures have since included the FMA, and most
provide variants for the related operations `x * y - z` (FMS),
`-x * y + z` (FNMA), and `-x * y - z` (FNMS).
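
Each of these variants can be expressed through the standard `fma`
function described below, because negating an operand is exact and so
preserves the single rounding. A sketch, with hypothetical helper
names used for illustration only:

```
#include <math.h>

/* Hypothetical helpers, not standard C; negation is exact, so each
   result is still correctly rounded once.  */
double fms  (double x, double y, double z) { return fma (x, y, -z); }   /* x * y - z  */
double fnma (double x, double y, double z) { return fma (-x, y, z); }   /* -x * y + z */
double fnms (double x, double y, double z) { return fma (-x, y, -z); }  /* -x * y - z */
```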

The functions `fmaf`, `fma`, and `fmal` implement fused
multiply-add for the `float`, `double`, and `long double`
data types. Correct implementation of the FMA in software is
difficult, and some systems that appear to provide those functions do
not satisfy the single-rounding requirement. That situation should
change as more programmers use the FMA operation, and more CPUs
provide FMA in hardware.
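
One way to probe an implementation, sketched here assuming
round-to-nearest `double` arithmetic: with `a = 1 + 2^-52` and
`b = 1 + 2^-51`, the exact value of `a * a - b` is `2^-104`. A true
fused operation returns that residual, while a multiply followed by a
separate add rounds the product first and returns `0`:

```
#include <math.h>
#include <stdio.h>

int
main (void)
{
  double a = 1.0 + 0x1p-52;   /* 1 + 2^-52, i.e., 1 + DBL_EPSILON */
  double b = 1.0 + 0x1p-51;
  /* Prints 0x1p-104 for a true FMA; 0 if the product is rounded
     before the addition.  */
  printf ("fma (a, a, -b) = %a\n", fma (a, a, -b));
  return 0;
}
```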

Use the `-ffp-contract=fast` option to allow generation of FMA
instructions, or `-ffp-contract=off` to disallow it.
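
For instance, given a function like the following sketch, GCC may
contract the multiply and add into a single FMA instruction under
`-ffp-contract=fast` on targets that provide one, whereas
`-ffp-contract=off` forces a separate multiply and add, each with its
own rounding:

```
/* Compile with `gcc -O2 -ffp-contract=fast' or
   `gcc -O2 -ffp-contract=off' and compare the generated code.  */
double
muladd (double x, double y, double z)
{
  return x * y + z;
}
```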
