possible tuple elimination optimization?

Sat Mar 26 20:11:20 UTC 2016

The extra allocation/time isn't caused by allocating pairs (neither
version does this). It is caused by allocating Doubles, because GHC
doesn't figure out that the second part of your pairs is always
demanded (I think), so the corresponding argument is left boxed. sumh
essentially turns into:

    sumh (D# d#) = sumh1 d#
    sumh1 d# = case loop1 1# 0# of
      (# _, ans #) -> ans
     where
     loop1 :: Double# -> Double# -> (# Double, Double #)
     loop1 i# s#
       | i# >= d# = (# D# i#, D# s# #)
       | otherwise = loop2 (i# + 2#) (D# ...)
     loop2 :: Double# -> Double -> (# Double, Double #)
     loop2 i# s
       | i# >= d# = (# D# i#, s #)
       | otherwise = loop2 (i# + 2#) (...)

So, it starts as a loop on unboxed arguments for one step, but then
switches over to boxing the second argument after the first iteration.
If you consider loop2 in isolation (and the hypothetical loop that
both of these came from), it is not strict in s. So there is no reason
to keep s evaluated except by considering the calling context, which I
guess GHC is not doing.

One way to fix this is by modifying the predicate to make is stricter.
Instead of `((>=n) . fst)` you can use:

    p (i, s) = s `seq` (i >= n)

Then it will be obvious to GHC that it shouldn't delay computing `s`
at each step.

This may be a similar failure to the old 'constructed product result
inside unboxed tuples' bug, where a function with an `ST s Int` result
could internally be optimized to return an unboxed integer, but it
isn't. Your example isn't exactly the same, but it has a similar
flavor. A call site is strict in part of the result of a function, and
generating code for that strict case would enable better optimization
of the callee, but the optimization would have to act on individual
parts of an unboxed tuple (which in your case comes from optimizing a
tuple), and that isn't done.

-- Dan

On Sat, Mar 26, 2016 at 1:50 PM, George Colpitts
<george.colpitts at gmail.com> wrote:
> The following higher order function, sumh, seems to be 3 to 14 times slower
> than the equivalent recursive function, sumr:
>
> sumh :: Double -> Double
> sumh n =
>     snd $ until ((>= n) . fst) as' (1, 0)
>     where
>       as' (i,s) =
>           (i + 2, s + (-1) / i + 1 / (i + 1))
>
> sumr :: Double -> Double
> sumr n =
>     as' 1 0
>     where
>       as' i  s
>           | i >= n    = s
>           | otherwise = as' (i + 2) (s + (-1) / i + 1 / (i + 1))
>
> This is true in 7.10.3 as well as 8.0.1 so this is not a regression. From
> the size usage my guess is that this is due to the allocation of tuples in
> sumh. Maybe there is a straightforward way to optimize sumh but I couldn't
> find it. Adding a Strict pragma didn't help nor did use of
> -funbox-strict-fields -flate-dmd-anal. Have I missed something or should I
> file a bug?
>
> Timings from 8.0.1 rc2:
>
> ghc --version
> The Glorious Glasgow Haskell Compilation System, version 8.0.0.20160204
> bash-3.2$ ghc -O2 -dynamic sum.hs
> ghc -O2 -dynamic sum.hs
> [1 of 1] Compiling Main             ( sum.hs, sum.o )
> Linking sum ...
> bash-3.2$ ghci
> Prelude> :load sum
> Ok, modules loaded: Main.
> (0.05 secs,)
> Prelude Main> sumh (10^6)
> -0.6931466805602525
> it :: Double
> (0.14 secs, 40,708,016 bytes)
> Prelude Main> sumr (10^6)
> -0.6931466805602525
> it :: Double
> (0.01 secs, 92,000 bytes)
>
> Thanks
> George
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>