[Haskell-cafe] Optimizations for list comprehension
Claude Heiland-Allen
claude at mathr.co.uk
Wed Aug 21 13:45:49 UTC 2019
Hi,
On 20/08/2019 20:12, Emilio Francesquini wrote:
> Thanks William!
>
> For me it's quite unexpected to see a unit argument making that kind
> of difference for the optimizer...
>
> Anyway, one new trick added to the bag...
>
> Tks.
>
>
>
> On Mon, Aug 19, 2019 at 11:34 AM William Yager <will.yager at gmail.com
> <mailto:will.yager at gmail.com>> wrote:
>
> I'm not sure which exact optimizations are responsible, but based
> on --ddump-simple,
>
> * "inside" is not allocating any lists at all. It's just a couple
> loops over unboxed ints
> * "outside" is actually allocating a (single) list data structure
> and has an inner loop and an outer loop, both of which traverse
> the list
>
> GHC seems to be too aggressive about sharing "range" in "outside".
> Adding a unit argument to "range" makes both functions go fast.
>
In "outside", "range" is a single (monomorphic) CAF binding, so I would
expect it to be shared aggressively.
Even if you do something like "inside" with explicit naming:
inside2 :: Int -> Int
inside2 n =
sum [i + j | i <- range1, j <- range2]
where
range1 = [0..n-1]
range2 = [0..n-1]
then "range2" is shared between all iterations over "range1", which I
expect to be allocated and traversed. I haven't tested to be sure.
Historically I have used crass hacks to prevent sharing by introducing
data dependencies:
inside3 :: Int -> Int
inside3 n =
sum [i + j | i <- [0..n-1], j <- [i+0-i..n-1]]
That "inside" can be optimized to avoid allocations of the inner loop is
a pleasant surprise, I hope it doesn't lead to unpleasant surprises in
other circumstances.
>
> On Mon, Aug 19, 2019 at 8:14 PM Emilio Francesquini
> <francesquini at gmail.com <mailto:francesquini at gmail.com>> wrote:
>
> Hello Cafe,
>
> While investigating a performance problem I stumbled upon what
> I eventually reduced to the example below:
>
> module Main where
>
> import Data.Time.Clock
>
> outside :: Int -> Int
> outside n =
> sum [i + j | i <- range, j <- range]
> where
> range = [0..n-1]
>
> inside :: Int -> Int
> inside n =
> sum [i + j | i <- [0..n-1], j <- [0..n-1]]
>
> main :: IO ()
> main = do
> t0 <- getCurrentTime
> print $ inside 10000
> t1 <- getCurrentTime
> print $ outside 10000
> t2 <-getCurrentTime
>
> print (diffUTCTime t1 t0)
> print (diffUTCTime t2 t1)
>
> Compiling with -O2, up to GHC 8.2.2, both `inside` and
> `outside` functions would take the same amount of time to
> execute. Somewhere between GHC 8.2.2 and 8.6.4 something
> changed (possibly some new optimization) making `inside` run
> ~4x faster on my machine. With LLVM the difference is even bigger.
>
> It is not that `outside` got slower, but that `inside` got
> much faster. I'm curious to what optimizations might be
> happening to the `inside` function that would not fire on the
> outside function.
>
> Any hints?
>
Claude
--
https://mathr.co.uk
More information about the Haskell-Cafe
mailing list