[Haskell-cafe] implementing RealFloat decodeFloat
Claude Heiland-Allen
claude at goto10.org
Wed Dec 22 15:09:29 CET 2010
Hi everyone,
I've been working on [0] Haskell bindings for [1] libqd for [2]
double-double and quad-double arithmetic, and have been struggling to
implement [3] RealFloat, in particular [4] decodeFloat, mostly because
of its postcondition but also some issues relating to variable precision:
---8<---
If decodeFloat x yields (m,n), then x is equal in value to m*b^^n, where
b is the floating-point radix, and furthermore, either m and n are both
zero or else b^(d-1) <= m < b^d, where d is the value of floatDigits x.
In particular, decodeFloat 0 = (0,0).
---8<---
(BTW: that should probably really be "... <= abs m < ..."; perhaps this
code could be added to the report and/or documentation, if it is correct:
validDecode :: RealFloat f => f -> Bool
validDecode f = case decodeFloat f of
(0,0) -> True
(m, e) -> b^(d-1) <= abs m && abs m < b^d
where
b = floatRadix f
d = floatDigits f
)
The double-double and quad-double types do not have a fixed precision,
though they do have a fixed minimum precision: this means that
decodeFloat will (in general) lose some precision. This is because for
example in:
data DoubleDouble = DoubleDouble !CDouble !CDouble
dd = DoubleDouble a b
the semantics are that "dd = a + b", with |a|>>|b|; coupled with the
IEEE implicit leading 1 bit, this means that there may be large gaps
between exponents: for example: "1 + 0.5**100 :: DoubleDouble".
So far I've got a "mostly working" implementation thus:
decodeFloat !(DoubleDouble a b) =
case (decodeFloat a, decodeFloat b) of
((0, 0), (0, 0)) -> (0, 0)
((0, 0), (m, e)) -> (m `shiftL` f, e - f)
((m, e), (0, 0)) -> (m `shiftL` f, e - f)
((m1, e1), (m2, e2)) ->
let fixup m e =
if m < mMin
then fixup (m `shiftL` 1) (e - 1)
else if m >= mMax
then fixup (m `shiftR` 1) (e + 1)
else (m, e)
mMin = 1 `shiftL` (ff - 1)
mMax = 1 `shiftL` ff
ff = floatDigits (0 :: DoubleDouble)
g = e1 - e2 - f
in fixup ((m1 `shiftL` f) + (m2 `shiftR` g)) (e1 - f)
where
f = floatDigits (0 :: CDouble)
This does meet the postcondition as specified (which leads to breakage
in other RealFloat methods), but has a recursion with no termination
proof (so far), and is lossy in general:
> let check f = uncurry encodeFloat (decodeFloat f) == f
> check (1 + 0.5 ** 120 :: DoubleDouble)
False
It does however seem to meet a weaker condition:
> let check2 f = (decodeFloat . (`asTypeOf` f) . uncurry encodeFloat
. decodeFloat $ f) == decodeFloat f
> check2 (1 + 0.5 ** 120 :: DoubleDouble)
True
Questions:
1. Is this weaker condition likely to be good enough in practice?
2. Can my implementation of decodeFloat be simplified?
Thanks for any insights,
Claude
[0] http://hackage.haskell.org/package/qd
[1] http://crd.lbl.gov/~dhbailey/mpdist/
[2] http://crd.lbl.gov/~dhbailey/dhbpapers/arith15.pdf
[3]
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#t:RealFloat
[4]
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#v:decodeFloat
--
http://claudiusmaximus.goto10.org
More information about the Haskell-Cafe
mailing list