[Haskell-cafe] Optimizations for list comprehension

Simon Peyton Jones simonpj at microsoft.com
Wed Aug 21 15:47:58 UTC 2019

I think what is happening is that

  *   In “inside” there are two distinct lists [0..n-1], each used once; they are fused with their consumers.
  *   In “outside” there is one list [0..n-1], which is used twice.  GHC is paranoid about duplicating work (and would be right to do so if producing the list was expensive), so it does not fuse the list with its two consumers.

In this case GHC’s paranoia is not justified.  It’d be better to duplicate the production of [0..n-1] so that it can fuse with its consumers.

One way to address this problem might be “cheapBuild”.   There’s a ticket about this: https://gitlab.haskell.org/ghc/ghc/issues/7206.   I see that a year ago the TL;DR was “Conclusion: let's do it. Ensuring that a Note gives a clear explanation, and points to this ticket.”   But no one yet has.

Maybe someone might see if the cheapBuild idea really does solve this particular case.


From: Haskell-Cafe <haskell-cafe-bounces at haskell.org> On Behalf Of William Yager
Sent: 19 August 2019 15:35
To: Emilio Francesquini <francesquini at gmail.com>
Cc: Haskell Cafe <haskell-cafe at haskell.org>
Subject: Re: [Haskell-cafe] Optimizations for list comprehension

I'm not sure which exact optimizations are responsible, but based on --ddump-simple,

* "inside" is not allocating any lists at all. It's just a couple loops over unboxed ints
* "outside" is actually allocating a (single) list data structure and has an inner loop and an outer loop, both of which traverse the list

GHC seems to be too aggressive about sharing "range" in "outside". Adding a unit argument to "range" makes both functions go fast.

On Mon, Aug 19, 2019 at 8:14 PM Emilio Francesquini <francesquini at gmail.com<mailto:francesquini at gmail.com>> wrote:
Hello Cafe,

While investigating a performance problem I stumbled upon what I eventually reduced to the example below:

module Main where

import Data.Time.Clock

outside :: Int -> Int
outside n =
  sum [i + j | i <- range, j <- range]
    range = [0..n-1]

inside :: Int -> Int
inside n =
  sum [i + j | i <- [0..n-1], j <- [0..n-1]]
main :: IO ()
main = do
  t0 <- getCurrentTime
  print $ inside 10000
  t1 <- getCurrentTime
  print $ outside 10000
  t2 <-getCurrentTime

  print (diffUTCTime t1 t0)
  print (diffUTCTime t2 t1)

Compiling with -O2, up to GHC 8.2.2, both `inside` and `outside` functions would take the same amount of time to execute. Somewhere between GHC 8.2.2 and 8.6.4 something changed (possibly some new optimization) making `inside` run ~4x faster on my machine. With LLVM the difference is even bigger.

It is not that `outside` got slower, but that `inside` got much faster. I'm curious to what optimizations might be happening to the `inside` function that would not fire on the outside function.

Any hints?

Best regards,


Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20190821/62a8420f/attachment.html>

More information about the Haskell-Cafe mailing list