equivalent of EXPLAIN PLAN with GHC?

Sun Aug 2 03:50:13 EDT 2009

joshua:
> Hello, I'm quite new to Haskell, but experienced in other languages (C,
> Python, Ruby, SQL, etc).  I am interested in Haskell because I've heard
> that the language is capable of lots of optimizations based on laziness,
> and I want to learn more about that.
> 
> I dug in with Project Euler problem #1, and wrote:
> 
> main = print (show (sum [x | x <- [3..999], x `mod` 3 == 0 || x `mod` 5 == 0]))
> 
> So far so good, but I want to have some way of observing what
> optimizations GHC has performed.  Most notably, in this example I want
> to know if the list was ever actually constructed in memory.  The "sum"
> function only needs the elements one at a time, in order, so if Haskell
> and GHC are everything I've heard about them, I would fully expect the
> list construction to be optimized out.  :)
> 
> Unfortunately I was not able to see any way of examining ghc's output to
> determine whether it had performed this optimization.  The C it produced
> with '-fvia-C -C' was totally unreadable -- it looked like something
> from the IOCCC. :(  I couldn't find any way to match up any of the code
> it had generated with code I had written.  My attempts to objdump the
> binaries was similarly unproductive.
> 
> Is there any kind of intermediate form that a person can examine to see
> how their code is being optimized?  Anything like EXPLAIN PLAN in SQL?
> It would make it much easier to understand the kinds of optimizations
> Haskell can perform.
> 
> I'm not looking so much for profiling -- obvious this program is trivial
> and takes no time.  I just want to better understand what kind of
> optimizations are possible given Haskell's language model.
> 

So the optimization you're looking for here is fusion of some kind.
GHC ships with build/foldr fusion, and there are libraries for an
alternative system, stream fusion. 

GHC uses an intermediate representation called 'Core', which is a
mini-Haskell, essentially, that is optimized repeatedly via
type-preserving transformations. You can inspect this with a number of
tools, including "ghc-core" (available on Hackage).

Now, you're example uses a list comprehension (which is translated into
an enumFromTo call, and a call to filter. It also uses a call to sum,
which is a non-fusing left-fold under build/foldr fusion, but fuses
under stream fusion.

I'll desugar your code explicitly, and translate the calls from
build/foldr to stream-fusible functions:

    $ cabal install uvector

    import Data.Array.Vector

    main = print
         . sumU 
         . filterU (\x -> x `mod` 3 == 0 || x `mod` 5 == 0)
         $ enumFromToU 3 (999 :: Int)

Running through ghc-core we see:

    146 PreInlineUnconditionally
    320 PostInlineUnconditionally
    84 UnfoldingDone
    18 RuleFired
        7 +#
        1 -#
        1 <=#
        2 ==#->case
        1 >#
        3 SC:$wfold0
        1 fromIntegral/Int->Int
        2 streamU/unstreamU
    12 LetFloatFromLet
    1 EtaReduction
    210 BetaReduction
    8 CaseOfCase
    94 KnownBranch
    4 CaseMerge
    5 CaseElim
    6 CaseIdentity
    1 FillInCaseDefault
    18 SimplifierDone

Showing what transformations happened. Notably, 2 occurences of the "streamU/unstreamU"
transformation, to remove intermediate structures.

The final code looks like:

    $s$wfold :: Int# -> Int#
   $s$wfold =
  \ (sc_s19l :: Int#) ->
    case modInt# (-9223372036854775807) 3 of wild21_a14L {
      __DEFAULT ->
        case modInt# (-9223372036854775807) 5 of wild211_X159 {
          __DEFAULT -> $wfold sc_s19l (-9223372036854775806);
          0 ->
            $wfold
              (+# sc_s19l (-9223372036854775807)) (-9223372036854775806)
        };
      0 ->
        $wfold
          (+# sc_s19l (-9223372036854775807)) (-9223372036854775806)
    }
    $wfold :: Int# -> Int# -> Int#

    $wfold =
      \ (ww_s189 :: Int#) (ww1_s18d :: Int#) ->
        case ># ww1_s18d 999 of wild_a15N {
          False ->
            case ww1_s18d of wild2_a14K {
              __DEFAULT ->
                case modInt# wild2_a14K 3 of wild21_a14L {
                  __DEFAULT ->
                    case modInt# wild2_a14K 5 of wild211_X159 {
                      __DEFAULT -> $wfold ww_s189 (+# wild2_a14K 1);
                      0 ->
                        $wfold
                          (+# ww_s189 wild2_a14K) (+# wild2_a14K 1)
                    };
                  0 ->
                    $wfold
                      (+# ww_s189 wild2_a14K) (+# wild2_a14K 1)
                };
              (-9223372036854775808) ->
                case modInt# (-9223372036854775808) 3 of wild21_a14N {
                  __DEFAULT ->
                    case lvl_r19G of wild211_a14x {
                      False -> $s$wfold ww_s189;
                      True ->
                        $s$wfold (+# ww_s189 (-9223372036854775808))
                    };
                  0 -> $s$wfold (+# ww_s189 (-9223372036854775808))
                }
            };
          True -> ww_s189
        } 

Which might take a while to understand, but the key thing is the types 
at the top level are Int# -- they don't allocate [Int], but instead use
unboxed, machine-level int values.

Optimization succesful.

There are papers and manuals describing "GHC Core" - the intermediate
form - on haskell.org. For fusion, google for "deforestation" or "stream
fusion".

-- Don