FFI-free NaN checks? (isDoubleNan and friends)
Mateusz Kowalczyk
fuuzetsu at fuuzetsu.co.uk
Tue Mar 6 11:30:28 UTC 2018
On 03/06/2018 10:43 AM, Brandon Allbery wrote:
> I'd in general expect good C code to optimize a little better; and in
> particular, decomposing an IEEE float is almost certainly more expensive in
> Haskell than in C, because unions let you cheat. (And I recall looking at
> the implementation of decodeFloat once; it's significantly longer than that
> C.) But I have to wonder if that code would be better done with something
> more native; the implementation may be a portable default, and you might be
> able to find something x86-specific that is faster.
There's a https://c9x.me/x86/html/file_module_x86_id_316.html that the
‘d /= d’ way compiles to. I suppose maybe I could just keep using that
and fall back onto isDoubleNaN if __FAST_MATH__ is set…
> On Tue, Mar 6, 2018 at 5:35 AM, Mateusz Kowalczyk <fuuzetsu at fuuzetsu.co.uk>
> wrote:
>
>> On 03/05/2018 10:23 PM, Brandon Allbery wrote:
>>> If the FFI version is done with "safe", consider using "unsafe" instead.
>>> There are technical reasons why this is slightly incorrect, but unless
>>> you're fiddling with the CPU's FP control flags they're mostly irrelevant
>>> and you can treat isNaN as pure and non-side-effectful, significantly
>>> reducing the overhead. You may also be able to use "ccall" to take
>>> advantage of C compiler level optimizations, or simply to directly
>> invoke a
>>> CPU-based test with asm(); but you'll need to hide that in a C
>> preprocessor
>>> #define, so that it looks syntactically like a function call to the FFI.
>>>
>>> (One of the technical reasons is that various OSes have been known to
>>> introduce bugs in their FP register and state handling across system
>> calls,
>>> in which case the "safe" version may turn "complete FP chaos" into merely
>>> "wrong answer". It's your call whether, or which side, of this bothers
>> you.)
>>
>> Perhaps I was a little unclear. The FFI-using isDoubleNaN is something
>> GHC does!
>>
>> ```
>> libraries/base/GHC/Float.hs:foreign import ccall unsafe "isDoubleNaN"
>> isDoubleNaN :: Double -> Int
>> ```
>> ```
>> HsInt
>> isDoubleNaN(HsDouble d)
>> {
>> union stg_ieee754_dbl u;
>>
>> u.d = d;
>>
>> return (
>> u.ieee.exponent == 2047 /* 2^11 - 1 */ && /* Is the exponent all
>> ones? */
>> (u.ieee.mantissa0 != 0 || u.ieee.mantissa1 != 0)
>> /* and the mantissa non-zero? */
>> );
>> }
>> ```
>>
>> My question is whether it could do better by not doing FFI and instead
>> computing natively and if not, why not?
>>
>>> On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk <
>> fuuzetsu at fuuzetsu.co.uk>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently at a client I was profiling some code and isDoubleNaN lit up.
>>>> We were checking a lot of doubles for NaN as that's what customer would
>>>> send in.
>>>>
>>>> I went to investigate and I found that FFI is used to achieve this. I
>>>> was always under the impression that FFI costs a little. I had at the
>>>> time replaced the code with a hack with great results:
>>>>
>>>> ```
>>>> isNaN' :: Double -> Bool
>>>> isNaN' d = d /= d
>>>> ```
>>>>
>>>> While this worked and provided good speedup in my case, this fails
>>>> catastrophically if the program is compiled with -ffast-math. This is
>>>> expected. I have since reverted it. Seeking an alternative solution I
>>>> have thought about re-implementing the C code with a native Haskell
>>>> version: after all it just checks a few bits. Apparently unsafeCoerce#
>>>> and friends were a big no-no but I found
>>>> https://phabricator.haskell.org/D3358 . I have implemented the code at
>>>> the bottom of this post. Obviously it's missing endianness (compile-time
>>>> switch).
>>>>
>>>> This seems to be faster for smaller `mkInput` list than Prelude.isNaN
>>>> but slower slightly on the one below. The `/=` version is the fastest
>>>> but very fragile.
>>>>
>>>> My question to you all is whether implementing a version of this
>>>> function in Haskell makes sense and if not, why not? The
>>>> stgDoubleToWord64 is implemented in CMM and I don't know anything about
>>>> the costs of that.
>>>>
>>>> * Is there a cheaper alternative to FFI way?
>>>> * If yes, does anyone know how to write it such that it compiles to same
>>>> code but without the call overhead? I must have failed below as it's
>>>> slower on some inputs.
>>>>
>>>> Basically if a faster way exists for isNaN, something I have to do a
>>>> lot, I'd love to hear about it.
>>>>
>>>> I leave you with basic code I managed to come up with. 8.4.x only.
>>>>
>>>>
>>>> ```
>>>> {-# LANGUAGE MagicHash #-}
>>>> {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm
>> #-}
>>>> module Main (main) where
>>>>
>>>> import GHC.Float
>>>> import GHC.Prim
>>>>
>>>> isNaN' :: Double -> Bool
>>>> isNaN' d = d /= d
>>>>
>>>> isNaNBits :: Double -> Bool
>>>> isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of
>>>> 1# -> case bits `and#` mantissaMask of
>>>> 0## -> False
>>>> _ -> True
>>>> _ -> False
>>>> where
>>>> bits :: Word#
>>>> bits = stgDoubleToWord64 d
>>>>
>>>> expMask, mantissaMask :: Word#
>>>> expMask = 0x7FF0000000000000##
>>>> mantissaMask = 0x000FFFFFFFFFFFFF##
>>>>
>>>> main :: IO ()
>>>> main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 100000000)
>>>> `seq` pure ()
>>>> where
>>>> nan :: Double
>>>> nan = log (-1)
>>>>
>>>> mkInput :: Int -> [Double]
>>>> mkInput n = take n $ cycle [1, nan]
>>>>
>>>> sumFilter :: (Double -> Bool) -> [Double] -> Double
>>>> sumFilter p = Prelude.sum . Prelude.filter (not . p)
>>>> ```
>>>>
>>>> --
>>>> Mateusz K.
>>>> _______________________________________________
>>>> ghc-devs mailing list
>>>> ghc-devs at haskell.org
>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Mateusz K.
>>
>
>
>
--
Mateusz K.
More information about the ghc-devs
mailing list