Proposal: Add "fma" to the RealFloat class

Takenobu Tani takenobu.hs at
Sun May 3 07:42:12 UTC 2015


A little additional information.

General-purpose CPUs use the term "FMA" for the fused "Mul + Add" operation
and implement dedicated instructions for it.

x86 (AMD64, Intel 64) has FMA instructions:
  VFMADD132PD, ...

ARM has FMA instructions:
  VFMA, ... (the older VMLA rounds the intermediate product)

In the DSP world, it is called "MAC" (multiply-accumulate).
Traditional DSPs have MAC instructions:

TI's C67 has MAC instructions:
  MAC, ...

If you map the "fma" function to a CPU's raw instruction,
be careful about the rounding and saturation modes.

BTW, the "FMA" operation is defined in the IEEE 754-2008 standard.
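
To make the rounding point concrete, here is a minimal sketch of a
reference "single rounding" multiply-add for Double. The names fmaRef,
mulAdd, x1, y1 and z1 are only for illustration and are not part of the
proposal. It assumes finite inputs and round-to-nearest; NaN, infinities
and signed zeros are not handled, and it is far slower than a hardware
instruction.

```haskell
-- Reference single-rounding multiply-add for finite Doubles (a sketch).
-- The product and the sum are computed exactly as Rationals and then
-- rounded once by fromRational. NaN and infinities are NOT handled.
fmaRef :: Double -> Double -> Double -> Double
fmaRef x y z = fromRational (toRational x * toRational y + toRational z)

-- Plain version: the product is rounded, then the sum is rounded again.
mulAdd :: Double -> Double -> Double -> Double
mulAdd x y z = x * y + z

-- Inputs chosen so that the exact product, 1 - 2^-60, rounds to 1.0.
x1, y1, z1 :: Double
x1 = 1 + encodeFloat 1 (-30)   -- 1 + 2^-30
y1 = 1 - encodeFloat 1 (-30)   -- 1 - 2^-30
z1 = -1
```

With these inputs, mulAdd x1 y1 z1 loses the small term and returns 0,
while fmaRef x1 y1 z1 keeps it and returns -2^-60.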


2015-04-29 18:19 GMT+09:00 Henning Thielemann <lemming at

> On Wed, 29 Apr 2015, Levent Erkok wrote:
>> This proposal is very much in the spirit of the earlier proposal on
>> adding new float/double functions; for
>> instance see here:
> Btw. what was the final decision with respect to log1p and expm1?
> I suggest that the decision for 'fma' will be made consistently with
> 'log1p' and 'expm1'.
>> "fma" (a.k.a. fused-multiply-add) is one of those functions; which is the
>> workhorse in many HPC applications.
>> The idea is to multiply two floats and add a third with just one
>> rounding, and thus preserving more precision.
>> There are a multitude of applications for this operation in engineering
>> data-analysis, and modern processors
>> come with custom implementations and a lot of hardware to support it
>> natively.
> Ok, the proposal is about increasing precision. One could also hope that a
> single fma operation is faster than separate addition and multiplication
> but as far as I know, fma can even be slower since it has more data
> dependencies.
>> I think the proposal is rather straightforward, and should be
>> noncontroversial. To wit, we shall add a new
>> method to the RealFloat class:
>>   class (RealFrac a, Floating a) => RealFloat a where
>>       ...
>>       fma :: a -> a -> a -> a
> RealFloat excludes Complex.
>> There should be no default definitions; as an incorrect (two-rounding
>> version) would essentially beat the purpose of having fma in the first
>> place.
> I just read again the whole expm1 thread and default implementations with
> possible loss of precision seem to be the best option. This way, one can
> mechanically replace all occurrences of (x*y+z) by (fma x y z) and will not
> make anything worse. Types with a guaranteed high precision should be put
> in a Fused class.
>> While the name "fma" is well-established in the arithmetic/hardware
>> community and in the C-library, we can also go with "fusedMultiplyAdd," if
>> that is deemed more clear.
> Although I like descriptive names, the numeric classes already contain
> mostly abbreviations (abs, exp, sin, tanh, ...) Thus I would prefer the
> abbreviation for consistency. Btw. in DSP 56002 the same operation is
> called MAC (multiply-accumulate).
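
The "default implementation" option discussed in the quoted text could
look roughly like the sketch below. The class name FmaFloat and the helper
dot are hypothetical; the actual proposal adds the method to RealFloat
itself, which cannot be done from user code. The default here is the
two-rounding version, so it is no more precise than x*y+z.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for the proposed RealFloat method.
class RealFloat a => FmaFloat a where
  fma :: a -> a -> a -> a
  -- Default: two roundings, so no better (and no worse) than x*y+z.
  fma x y z = x * y + z

-- Both instances just use the default; an instance backed by a hardware
-- instruction would override fma instead.
instance FmaFloat Float
instance FmaFloat Double

-- Mechanical replacement of (x*y+acc) by (fma x y acc) in a dot product.
dot :: FmaFloat a => [a] -> [a] -> a
dot xs ys = foldl' (\acc (x, y) -> fma x y acc) 0 (zip xs ys)
```

With the default in place, code such as dot keeps working for every
instance, and only gains precision where an instance overrides fma.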