ghc more aggressive with Ints?

Tue, 7 Jan 2003 09:57:29 -0800 (PST)

Hi,

This is somewhat related to an earlier post about streaming by Jorge, and
to a similar post by me a bunch of months back regarding multiple ways of
writing a similar function.

Many times when processing data which you need to process in more than
once way, you have two options: (1) process it in one go and store
intermediate results as, eg, tuples; (2) process it in more than one
go.

I've been told that the latter is the more "haskelly" way to do it, but I
don't completely buy that.  Especially in cases like Jorge's, since the
data will not be able to be garbage collected between runs and thus the
whole program will require a much larger stack/heap/whatever.

In either case, consider the following (semantically
equivalent) functions:

> sumMinusOnce l = foldr sumMinus (0,0) l
>     where sumMinus c (a,b) = (a+c, c-b)
>
> sumMinusTwice l = (foldr (+) 0 l, foldr (-) 0 l)

The first takes one pass over the data, while the second takes two.  Now,
we can give these functions various type signatures, Int, Integer, Float
or Double (more, of course, but these are the basics).  When the list is
[1..250000], the timing results look like:

        | ONCE                   | TWICE
--------+------------------------+------------------------
INT     | 1.81u 0.18s 0:02.07    | 1.84u 0.15s 0:01.99
INTEGER | 9.77u 0.28s 0:10.29    | 5.07u 0.07s 0:05.23
FLOAT   | 8.60u 0.30s 0:09.08    | 3.79u 0.16s 0:04.07
DOUBLE  | 9.50u 0.25s 0:09.82    | 4.25u 0.16s 0:04.44

Here, we see that for everything but INT, the ONCE version is about two
times slower.  If we pump the input size up to [1..1000000], then the
times for INT become

INT     | 16.38u 0.66s 0:17.22   | 20.75u 0.34s 0:21.33

Here, there's a much smaller difference in speed.  For the other versions
(say DOUBLE), the ONCE version is consistently two times slower.  This
raises the question as to why.

Looking at the -ddump-simpl output, we can see why.  In the sumMinusOnce
version, when given an Int type signature, the tuple gets
unboxed and the primitive GHC.Prim.+# and GHC.Prim.-# functions are
used.  In the one where we've given a Double type signature, we don't get
this.  The tuple still gets unboxed, but the non-unboxed functions
GHC.Float.plusDouble and GHC.Float.minusDouble functions are used instead.

Perhaps more importantly, the Double version is left as a recursive call,
looking something like:

> Main.$wgo = \w ->
>   case w of
>     (y:ys) -> case Main.$wgo ys of
>                 (# ww1, ww2 #) ->
>                    (# plusDouble ww1 y, minusDouble y ww2) #)
>    []     -> (# 0, 0 #)

(converted by hand from simpl-core to a more Haskelly syntax)

However, the Int version becomes a very tight loop in which the list is
deforested.

My understanding was that using list comprehensions like [1..100000]
together with foldr you would always get build/fold deforestation, but
this only seems to happen when the list is a list of Ints, not a list of
Doubles.

If we change the Once function to:

> i2d :: Int -> Double
> i2d = fromInteger.toInteger
>
> sumMinusOnce :: [Int] -> (Double, Double)
> sumMinusOnce l = foldr sumMinus (0,0) l
>     where sumMinus c (a,b) = (a+i2d c, (i2d c)-b)

then we do get full deforestation as we want.  This new version, when run
with [1..250000], yields a time of

4.12u 0.23s 0:04.44

which is actually faster than the TWICE/DOUBLE version.

So...why is GHC more aggressive with Ints than with Doubles and what are
the additional conditions under which fold/build deforestation occurs?

 - Hal

--
Hal Daume III

 "Computer science is no more about computers    | hdaume@isi.edu
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume