Proposal: fix Enum Double instance

Fri Jul 5 15:50:31 CEST 2013

Hello All,

I would like to make a counterproposal in the Enum Double debate: Instead of 
deprecating or removing the instances, how about just fixing them?

A perfect instance for Enum Double is not possible, because arithmetic is 
inexact. But you can actually get awfully close. I.e. instead of allowing the 
final value to be at most step/2 past the end, we can allow it to be at most 
about 2e-16*step past the end. In many practical applications this is close 
enough to not be a problem.

Now for some more detail on the design:

* First of all, Haskell's enumFromThenTo is stupid, because calculating 
step=then-from destroys numerical accuracy. So I will focus on implementing a 
function enumFromStepTo. You can still implement enumFromThenTo on top of it. 
But it might also make sense to expose enumFromStepTo.

* It is possible to write an exact instance for Rational. It is IMO unacceptable 
that Haskell currently does not use this correct instance. I will use this 
Rational instance as a baseline for comparison.

     instance EnumFromStepTo Rational where
       enumFromStepTo f s t
         | s >= 0 && f <= t = f : enumFromStepTo (f + s) s t
         | s <  0 && f >= t = f : enumFromStepTo (f + s) s t
         | otherwise = []

* Writing a loop naively, by recursing with from' = from+step will result in 
accumulating the error of the addition. On the other hand, Doubles can represent 
integer numbers exactly up to 2^53, so it is better to use values from+i*step.

* Assume for simplicity that step>0. The stopping condition will then be of the 
form  from+i*step-to > 0. When this quantity is 0 then the final value should be 
included, otherwise it shouldn't be.

* Now for an error analysis. Assume that we start with
    f,s,t :: Rational
   and call
    let f' = fromRational f
    let s' = fromRational s
    let t' = fromRational t
    enumFromThenTo f' s' t'

  The fromRational function rounds a rational number to the nearest 
representable Double. The relative error in f' is therefore bounded by abs 
(f-f') ≤ ε * abs f, where ε is the half the maximum distance between two 
adjacent doubles, which is about 1.11e-16. similarly for s' and t'. the result 
of an addition or multiplication operation will also be rounded to the nearest 
representable Double.

  So, the total error in our stopping condition is bounded by
  err(f+i*s-t)
    ≤ ε * abs f             -- representing f
    + ε * i * abs s         -- representing s, amplified by i
    + ε * i * abs s         -- error in calculating (*), i * s
    + ε * abs (f + i*s)     -- error in calculating (+), f + (i*s)
    + ε * abs t             -- representing t
    + ε * abs (f + i*s - t) -- error in calculating (-)

  Note that the last term of the bound is the thing we are bounding. So we can 
handle that by picking a slightly larger epsilon, eps = ε/(1-ε).

  This leads to the following implementation of enumFromStepTo:

     enumFromStepToEps eps !f !s !t = go 0
       where
       go i
         | s >= 0 && x <= t = x : go (i+1)
         | s <  0 && x >= t = x : go (i+1)
         | abs (x - t) < eps * bound = [x]
         | otherwise = []
         where
         x = f + i * s
         bound = abs f + 2 * abs (i * s) + abs t + abs x

   with eps = 1.12e-16 :: Double
   or   eps = 5.97e-8 :: Float

I have done extensive experiments with this implementation and many variants, 
and I believe it will work in all cases.

Now for enumFromTo, i.e. step=1 we could make a special case, since we know that 
step is exactly equal to 1, so there is no representation error. We could get 
rid of the multiplication, and replace i*s by 0 in the error calculation.

I think it would be best if we use the more general

     enumFromStepToEps' !eps !f !s !sErr !t = go 0
       where
       go i
         | s >= 0 && x <= t = x : go (i+1)
         | s <  0 && x >= t = x : go (i+1)
         | abs (x - t) < eps * bound = [x]
         | otherwise = []
         where
         x = f + i * s
         bound = abs f + abs (i * s) + abs (i * sErr) + abs t + abs x

    enumFromStepTo f s t = enumFromStepToEps' eps f s s t
    enumFromTo     f   t = enumFromStepToEps' eps f 1 0 t
    enumFromThenTo f h t = enumFromStepToEps' eps f (h - f)
                               (abs h + abs f + abs (h - f)) t

I have also looked at what other language implementations do. So far I have 
found that:
  * Octave uses a similar method, with the bound
      3 * double_epsilon * max (abs x) (abs t),
    which is less tight than my bound.
    see 
http://hg.savannah.gnu.org/hgweb/octave/file/787de2f144d9/liboctave/array/Range.cc#l525 

  * numpy just uses an array with length ceil((to-from)/step)
    see 
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L2742
    It therefore suffers from numerical inaccuracies:
    $ python
    >>> from numpy import arange
    >>> notone = 9/7. - 1/7. - 1/7.
    >>> notone
    1.0000000000000002
    >>> len(arange(1,1))
    0
    >>> len(arange(1,notone))
    1

In summary:
  * change Enum Rational to the always correct implementation
  * change Enum Double and Enum Float to the almost-always correct 
implementation propose above.
  * (optional) expose the enumFromStepTo function.

Twan