Proposal: Add "fma" to the RealFloat class

Sun May 3 22:03:26 UTC 2015

Thanks for taking the time to write this, Levent.

Now that you explain this in such detail, it's clear why implementing
fma in terms of add and multiply is wrong.
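
To make the difference concrete, here is a minimal sketch (mine, not part of
the proposal). It assumes Double is IEEE754 binary64 with round-to-nearest-even
and no extended-precision intermediates. fmaViaRational is a hypothetical
helper, not a library function: it emulates the single rounding by computing
exactly over Rational and rounding once at the end, and it ignores NaN and
infinities.

```haskell
module Main where

-- Two roundings: the product is rounded to Double, then the sum is rounded.
twoRoundings :: Double -> Double -> Double -> Double
twoRoundings x y z = x * y + z

-- Hypothetical reference helper (an assumption for illustration only):
-- compute x*y + z exactly over Rational, then round once via fromRational.
-- Only meaningful for finite inputs.
fmaViaRational :: Double -> Double -> Double -> Double
fmaViaRational x y z =
  fromRational (toRational x * toRational y + toRational z)

-- x = 1 + 2^-30, so x*x = 1 + 2^-29 + 2^-60 exactly. The 2^-60 term does not
-- fit in a Double next to 1 and is lost when the product is rounded.
exampleX :: Double
exampleX = 1 + 2 ^^ (-30)

-- z = -(1 + 2^-29) cancels everything except that lost 2^-60 term.
exampleZ :: Double
exampleZ = negate (1 + 2 ^^ (-29))

main :: IO ()
main = do
  print (twoRoundings exampleX exampleX exampleZ)    -- product rounded first
  print (fmaViaRational exampleX exampleX exampleZ)  -- rounded once at the end
```

With these inputs the two-rounding version loses the 2^-60 term entirely,
while the single-rounding version keeps it, so no definition in terms of (*)
and (+) on Double can reproduce the fused result.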

I also have to admit that upon the first reading of your proposal, I
confused RealFloat with RealFrac. Since RealFloat should only be
implemented by actual floating-point types, I retract my earlier objection.

And the idea of putting the IEEE754-specific functions in a separate
class (or even module) sounds reasonable, too.
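
For illustration, such a separate class might look roughly like the sketch
below. Everything here is an assumption on my part: the class name IEEEFloat
and the method name fusedMultiplyAdd are hypothetical, and the instances
simply delegate to C99's fma and fmaf through the FFI. The sketch does not
model rounding modes; it uses whatever mode is current in the C environment.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

-- Hypothetical class for IEEE754-specific operations. The names are
-- assumptions for illustration, not an existing library API.
class RealFloat a => IEEEFloat a where
  -- x*y + z with a single rounding. Deliberately no default definition,
  -- since a two-rounding fallback would have different semantics.
  fusedMultiplyAdd :: a -> a -> a -> a

-- C99 fma and fmaf from <math.h>; both are pure, so no IO in the types.
foreign import ccall unsafe "math.h fma"
  c_fma :: Double -> Double -> Double -> Double

foreign import ccall unsafe "math.h fmaf"
  c_fmaf :: Float -> Float -> Float -> Float

instance IEEEFloat Double where
  fusedMultiplyAdd = c_fma

instance IEEEFloat Float where
  fusedMultiplyAdd = c_fmaf

main :: IO ()
main = do
  let x = 1 + 2 ^^ (-30) :: Double
      z = negate (1 + 2 ^^ (-29)) :: Double
  print (x * x + z)                -- two roundings
  print (fusedMultiplyAdd x x z)   -- one rounding, via C's fma
```

Keeping the class separate from Num and RealFloat means that types without a
correctly rounded fma simply get no instance, rather than a silently
different answer.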

On 04/05/15 00:11, Levent Erkok wrote:
> Thank you for all the feedback on this proposal. Based on the feedback,
> I have concluded that the original idea did not capture what I was
> really after, and hence I think this proposal needs to be shelved
> for the time being.
> 
> I want to summarize the points made so far:
> 
>     * Almost everyone agrees that we should have this functionality
> available. (But see below for the direction I want to take it in.)
>     * There's some disagreement on the name chosen, but I think this is
> less important for the time being.
>     * The biggest gripe is where "fma" really belongs. The original
> suggestion was 'RealFloat', but people pointed out that 'Num' is just
> as good a place.
>     * Most folks want a default definition, and see "fma" as an
> optimization.
> 
> It is these last two points actually that convinced me this proposal is
> not really what I want to have. I do not see "fma" as an optimization.
> In particular, I'd be very concerned if the compiler substituted "fma x
> y z" for "x*y+z". The entire reason why IEEE754 has an fma operation is
> because those two expressions have different values in general. By the
> same token, I'm also against providing a default implementation. I see
> this not as an increased-precision issue, but rather a semantic one;
> where "x*y+z" and "fma x y z" *should* produce two different values, per
> the IEEE754 spec. It's not really an optimization, but how
> floating-point values work. In that sense "fma" is a separate operation
> that's related to multiplication and addition, but is not definable in
> those terms alone.
> 
> Having said that, it was also pointed out that for non-float values this
> can act as an optimization. (Modular arithmetic was given as an
> example.) I'd think that functionality is quite different than the
> original proposal, and perhaps should be tackled separately. My original
> proposal was not aiming for that particular use case.
> 
> My original motivation was to give Haskell access to the floating-point
> circuitry that hardware-manufacturers are putting a lot of effort and
> energy into. It's a shame that modern processors provide a ton of
> instructions around floating-point operations, but such operations are
> simply very hard to use from many high-level languages, including Haskell.
> 
> Two other points were raised, that also convinced me to seek an
> alternative solution:
> 
>    * Tikhon Jelvis suggested these functions should be put in a
> different class, which suggests that we're following IEEE754, and not
> some idealized model of numbers. I think this suggestion is spot on, and
> is very much in line with what I wanted to have.
>    * Takenobu Tani kindly pointed out that a discussion of floats in the
> absence of rounding modes is a moot one, as the entire semantics is
> based on rounding. Haskell simply picks "RoundNearestTiesToEven," but
> there are 4 other rounding modes defined by IEEE754, and I think we need
> a way to access those from Haskell in a convenient way.
> 
> Based on this analysis, I'm withdrawing the original proposal. I think
> fma and other floating-point arithmetic operations are very important to
> support properly, but it should not be done by tacking them on to Num or
> RealFloat; but rather in a new class that also considers rounding-mode
> properly.
> 
> The advantage of the "separate" class approach is, of course, I (or
> someone else) can create such a class and push it on to hackage, using
> FFI to delegate the task of implementation to the land-of-C, by
> supporting rounding modes and other floating-point weirdness
> appropriately. Once that class stabilizes and its details are ironed
> out, then we can imagine cooperating with GHC folks to actually bypass
> the FFI and directly generate native code whenever possible.
> 
> This is the direction I intend to move on. Please drop me a line if
> you'd like to help out and/or have any feedback.
> 
> Thanks!
> 
> -Levent.
> 
> 
> On Sun, May 3, 2015 at 7:27 AM, David Feuer <david.feuer at gmail.com> wrote:
> 
>     We have (almost) no tradition of using CPU instruction names for our
>     own functions, and I don't see why now is the time to start. To take
>     a recent example, we have countLeadingZeros and countTrailingZeros
>     rather than clz, ctz, ctlz, cttz, bsf, bsr, etc. We also have
>     popCount instead of popcnt, and use shiftR and shiftL instead of
>     things like shl, shr, sla, sal, sra, sar, etc. Thus I am -1 on
>     calling this thing fma. multiplyAdd seems more reasonable to me.
> 
>     On Sun, May 3, 2015 at 3:42 AM, Takenobu Tani <takenobu.hs at gmail.com> wrote:
> 
>         Hi,
> 
>         A little information.
> 
>         General-purpose CPUs use the term "FMA" for the "Mul + Add"
>         operation and implement special instructions for it.
> 
>         x86 (AMD64, Intel64) has FMA instructions:
>           VFMADD132PD, ...
> 
>         ARM has FMA instructions:
>           VFMA, ...
> 
> 
>         In DSP culture, it's called "MAC" (multiply-accumulate).
>         Traditional DSPs have MAC instructions:
> 
>         TI's C67 has MAC instructions:
>           MAC, ...
> 
> 
>         If you map the "fma" function to a CPU's raw instruction,
>         be careful about rounding and saturation modes.
> 
> 
>         BTW, "FMA" operation is defined in IEEE754-2008 standard.
> 
> 
>         Regards,
>         Takenobu
> 
>         2015-04-29 18:19 GMT+09:00 Henning Thielemann
>         <lemming at henning-thielemann.de>:
> 
> 
>             On Wed, 29 Apr 2015, Levent Erkok wrote:
> 
>                 This proposal is very much in the spirit of the earlier
>                 proposal on adding new float/double functions; for
>                 instance see
>                 here: https://mail.haskell.org/pipermail/libraries/2014-April/022667.html
> 
> 
>             Btw. what was the final decision with respect to log1p and
>             expm1?
> 
>             I suggest that the decision for 'fma' will be made
>             consistently with 'log1p' and 'expm1'.
> 
>                 "fma" (a.k.a. fused multiply-add) is one of those
>                 functions, and it is the workhorse of many HPC applications.
>                 The idea is to multiply two floats and add a third with
>                 just one rounding, thus preserving more precision.
>                 There are a multitude of applications for this operation
>                 in engineering data-analysis, and modern processors
>                 come with custom implementations and a lot of hardware
>                 to support it natively.
> 
> 
>             Ok, the proposal is about increasing precision. One could
>             also hope that a single fma operation is faster than
>             separate addition and multiplication but as far as I know,
>             fma can even be slower since it has more data dependencies.
> 
>                 I think the proposal is rather straightforward, and
>                 should be noncontroversial. To wit, we shall add a new
>                 method to the RealFloat class:
> 
>                   class (RealFrac a, Floating a) => RealFloat a where
>                       ...
>                       fma :: a -> a -> a -> a
> 
> 
> 
>             RealFloat excludes Complex.
> 
> 
>                 There should be no default definitions, as an incorrect
>                 (two-rounding) version would essentially defeat the
>                 purpose of having fma in the first place.
> 
> 
>             I just read again the whole expm1 thread and default
>             implementations with possible loss of precision seem to be
>             the best option. This way, one can mechanically replace all
>             occurrences of (x*y+z) by (fma x y z) and will not make
>             anything worse. Types with a guaranteed high precision
>             should be put in a Fused class.
> 
> 
>                 While the name "fma" is well-established in the
>                 arithmetic/hardware community and in the C-library, we
>                 can also go with "fusedMultiplyAdd," if that is deemed
>                 more clear.
> 
> 
>             Although I like descriptive names, the numeric classes
>             already contain mostly abbreviations (abs, exp, sin, tanh,
>             ...) Thus I would prefer the abbreviation for consistency.
>             Btw. in DSP 56002 the same operation is called MAC
>             (multiply-accumulate).
