Proposal: Add "fma" to the RealFloat class

Levent Erkok erkokl at
Sun May 3 21:11:26 UTC 2015

Thank you for all the feedback on this proposal. Based on the feedback, I
have concluded that the original idea did not capture what I was really
after, and hence I think this proposal needs to be shelved for the time
being.

I want to summarize the points made so far:

    * Almost everyone agrees that we should have this functionality
available. (But see below for the direction I want to take it in.)
    * There's some disagreement on the name chosen, but I think this is
less important for the time being.
    * The biggest gripe is where "fma" really belongs. The original
suggestion was 'RealFloat', but people pointed out that 'Num' is just as
good a place.
    * Most folks want a default definition, and see "fma" as an
optimization.

It is these last two points actually that convinced me this proposal is not
really what I want to have. I do not see "fma" as an optimization. In
particular, I'd be very concerned if the compiler substituted "fma x y z"
for "x*y+z". The entire reason why IEEE754 has an fma operation is because
those two expressions have different values in general. By the same token,
I'm also against providing a default implementation. I see this not as an
increased-precision issue, but rather a semantic one; where "x*y+z" and
"fma x y z" *should* produce two different values, per the IEEE754 spec.
It's not really an optimization, but how floating-point values work. In
that sense "fma" is a separate operation that's related to multiplication
and addition, but is not definable in those terms alone.
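
To make the difference concrete, here is a minimal sketch. The name fmaRef
and the demo values are made up for illustration. fmaRef is only a reference
model: it computes x*y+z exactly over Rational and rounds once, assuming
finite inputs and that GHC's fromRational rounds to the nearest Double. It
is not the hardware instruction, and it does not handle NaN or infinities.

```haskell
-- Reference model of a fused multiply-add (hypothetical helper, finite
-- inputs only): compute x*y+z exactly as a Rational, then round once.
fmaRef :: Double -> Double -> Double -> Double
fmaRef x y z = fromRational (toRational x * toRational y + toRational z)

-- Inputs chosen so that the exact product, 1 - 2^-60, is not a Double.
demoX, demoY, demoZ :: Double
demoX = 1 + 2 ^^ (-30 :: Int)
demoY = 1 - 2 ^^ (-30 :: Int)
demoZ = -1

main :: IO ()
main = do
  -- Two roundings: the product rounds to 1.0 first, so the sum is 0.0.
  print (demoX * demoY + demoZ)
  -- One rounding: the exact result, -2^-60, is representable and survives.
  print (fmaRef demoX demoY demoZ == negate (2 ^^ (-60 :: Int)))
```

The two expressions disagree on these inputs, which is why substituting one
for the other changes the meaning of a program rather than just its speed.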

Having said that, it was also pointed out that for non-float values this
can act as an optimization. (Modular arithmetic was given as an example.)
I'd think that functionality is quite different than the original proposal,
and perhaps should be tackled separately. My original proposal was not
aiming for that particular use case.

My original motivation was to give Haskell access to the floating-point
circuitry that hardware-manufacturers are putting a lot of effort and
energy into. It's a shame that modern processors provide a ton of
instructions around floating-point operations, but such operations are
simply very hard to use from many high-level languages, including Haskell.

Two other points were raised that also convinced me to seek an alternative:

   * Tikhon Jelvis suggested these functions should be put in a different
class, which suggests that we're following IEEE754, and not some idealized
model of numbers. I think this suggestion is spot on, and is very much in
line with what I wanted to have.
   * Takenobu Tani kindly pointed out that a discussion of floats in the
absence of rounding modes is a moot one, as the entire semantics is based
on rounding. Haskell simply picks "RoundNearestTiesToEven," but there are 4
other rounding modes defined by IEEE754, and I think we need a way to
access those from Haskell in a convenient way.
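
The default rounding can be observed directly. A small sketch, assuming GHC
on hardware left in its default round-to-nearest-even mode (the name big is
just for illustration): Doubles near 2^53 are 2 apart, so adding an odd
integer there produces an exact tie between two neighbours.

```haskell
-- 2^53: from here up to 2^54, consecutive Doubles differ by 2.
big :: Double
big = 2 ^ (53 :: Int)

main :: IO ()
main = do
  -- big + 1 is halfway between big and big + 2; the tie goes to the
  -- neighbour with the even mantissa, which is big itself.
  print (big + 1 == big)
  -- big + 3 is halfway between big + 2 and big + 4; the even one is big + 4.
  print (big + 3 == big + 4)
```

Under round-toward-zero the second comparison would fail, and under
ties-away the first would, so even plain addition depends on the mode.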

Based on this analysis, I'm withdrawing the original proposal. I think fma
and other floating-point arithmetic operations are very important to
support properly, but this should not be done by tacking them on to Num or
RealFloat; rather, they belong in a new class that also considers rounding
modes.
The advantage of the "separate" class approach is, of course, that I (or
someone else) can create such a class and push it to Hackage, using the FFI
to delegate the implementation to the land-of-C, supporting rounding modes
and other floating-point weirdness appropriately. Once that
class stabilizes and its details are ironed out, then we can imagine
cooperating with GHC folks to actually bypass the FFI and directly generate
native code whenever possible.
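
As a rough sketch of what such a package could start from: the class name
IEEEFused and the method name fusedMulAdd below are placeholders, not an
agreed design. The only real API used is C99's fma and fmaf from math.h,
imported through the FFI. Rounding modes are not handled here; the calls run
under whatever mode the floating-point environment is in (normally
round-to-nearest-even).

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- C99 fused multiply-add for double and float, from <math.h>.
foreign import ccall unsafe "math.h fma"
  c_fma :: Double -> Double -> Double -> Double

foreign import ccall unsafe "math.h fmaf"
  c_fmaf :: Float -> Float -> Float -> Float

-- Hypothetical class: deliberately no default method, so that a
-- two-rounding x*y+z can never stand in for the fused operation.
class IEEEFused a where
  fusedMulAdd :: a -> a -> a -> a

instance IEEEFused Double where
  fusedMulAdd = c_fma

instance IEEEFused Float where
  fusedMulAdd = c_fmaf

main :: IO ()
main = do
  let x = 1 + 2 ^^ (-30 :: Int) :: Double
      y = 1 - 2 ^^ (-30 :: Int) :: Double
  -- Separate multiply and subtract: the product rounds to 1.0, giving 0.0.
  print (x * y - 1)
  -- Fused: with a conforming C library this is exactly -2^-60.
  print (fusedMulAdd x y (-1) == negate (2 ^^ (-60 :: Int)))
```

Going through C costs a foreign call per operation, which is the overhead
that direct native code generation would later remove.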

This is the direction I intend to move in. Please drop me a line if you'd
like to help out and/or have any feedback.



On Sun, May 3, 2015 at 7:27 AM, David Feuer <david.feuer at> wrote:

> We have (almost) no tradition of using CPU instruction names for our own
> function, and I don't see why now is the time to start. To take a recent
> example, we have countLeadingZeros and countTrailingZeros rather than clz,
> ctz, ctlz, cttz, bsf, bsr, etc. We also have popCount instead of popcnt,
> and use shiftR and shiftL instead of things like shl, shr, sla, sal, sra,
> sar, etc. Thus I am -1 on calling this thing fma. multiplyAdd seems more
> reasonable to me.
> On Sun, May 3, 2015 at 3:42 AM, Takenobu Tani <takenobu.hs at>
> wrote:
>> Hi,
>> little information.
>> General CPUs use term of "FMA" for "Mul + Add" operation
>> and implement special instructions.
>> x86(AMD64, Intel64) has FMA instructions:
>>   FMADD132PD, ...
>> ARM has FMA instructions:
>>   VMLA, ...
>> In DSP culture, it's called "MAC(Multiply and Accumulator)".
>> Traditional DSPs have MAC(Multiply and Accumulator) instructions:
>> TI's C67 has MAC instructions:
>>   MAC, ...
>> If you map "fma" function to cpu's raw instruction,
>> be careful for rounding and saturation mode.
>> BTW, "FMA" operation is defined in IEEE754-2008 standard.
>> Regards,
>> Takenobu
>> 2015-04-29 18:19 GMT+09:00 Henning Thielemann <
>> lemming at>:
>>> On Wed, 29 Apr 2015, Levent Erkok wrote:
>>>  This proposal is very much in the spirit of the earlier proposal on
>>>> adding new float/double functions; for
>>>> instance see here:
>>> Btw. what was the final decision with respect to log1p and expm1?
>>> I suggest that the decision for 'fma' will be made consistently with
>>> 'log1p' and 'expm1'.
>>>  "fma" (a.k.a. fused-multiply-add) is one of those functions; which is
>>>> the workhorse in many HPC applications.
>>>> The idea is to multiply two floats and add a third with just one
>>>> rounding, and thus preserving more precision.
>>>> There are a multitude of applications for this operation in engineering
>>>> data-analysis, and modern processors
>>>> come with custom implementations and a lot of hardware to support it
>>>> natively.
>>> Ok, the proposal is about increasing precision. One could also hope that
>>> a single fma operation is faster than separate addition and multiplication
>>> but as far as I know, fma can even be slower since it has more data
>>> dependencies.
>>>  I think the proposal is rather straightforward, and should be
>>>> noncontroversial. To wit, we shall add a new
>>>> method to the RealFloat class:
>>>>   class (RealFrac a, Floating a) => RealFloat a where
>>>>       ...
>>>>       fma :: a -> a -> a -> a
>>> RealFloat excludes Complex.
>>>  There should be no default definitions; as an incorrect (two-rounding
>>>> version) would essentially beat the purpose of having fma in the first
>>>> place.
>>> I just read again the whole expm1 thread and default implementations
>>> with possible loss of precision seem to be the best option. This way, one
>>> can mechanically replace all occurrences of (x*y+z) by (fma x y z) and will
>>> not make anything worse. Types with a guaranteed high precision should be
>>> put in a Fused class.
>>>  While the name "fma" is well-established in the arithmetic/hardware
>>>> community and in the C-library, we can also go with "fusedMultiplyAdd," if
>>>> that is deemed more clear.
>>> Although I like descriptive names, the numeric classes already contain
>>> mostly abbreviations (abs, exp, sin, tanh, ...) Thus I would prefer the
>>> abbreviation for consistency. Btw. in DSP 56002 the same operation is
>>> called MAC (multiply-accumulate).
>>> _______________________________________________
>>> Libraries mailing list
>>> Libraries at