equivalent of EXPLAIN PLAN with GHC?
Don Stewart
dons at galois.com
Sun Aug 2 03:50:13 EDT 2009
joshua:
> Hello, I'm quite new to Haskell, but experienced in other languages (C,
> Python, Ruby, SQL, etc). I am interested in Haskell because I've heard
> that the language is capable of lots of optimizations based on laziness,
> and I want to learn more about that.
>
> I dug in with Project Euler problem #1, and wrote:
>
> main = print (show (sum [x | x <- [3..999], x `mod` 3 == 0 || x `mod` 5 == 0]))
>
> So far so good, but I want to have some way of observing what
> optimizations GHC has performed. Most notably, in this example I want
> to know if the list was ever actually constructed in memory. The "sum"
> function only needs the elements one at a time, in order, so if Haskell
> and GHC are everything I've heard about them, I would fully expect the
> list construction to be optimized out. :)
>
> Unfortunately I was not able to see any way of examining ghc's output to
> determine whether it had performed this optimization. The C it produced
> with '-fvia-C -C' was totally unreadable -- it looked like something
> from the IOCCC. :( I couldn't find any way to match up any of the code
> it had generated with code I had written. My attempts to objdump the
> binaries was similarly unproductive.
>
> Is there any kind of intermediate form that a person can examine to see
> how their code is being optimized? Anything like EXPLAIN PLAN in SQL?
> It would make it much easier to understand the kinds of optimizations
> Haskell can perform.
>
> I'm not looking so much for profiling -- obvious this program is trivial
> and takes no time. I just want to better understand what kind of
> optimizations are possible given Haskell's language model.
>
So the optimization you're looking for here is fusion of some kind.
GHC ships with build/foldr fusion, and there are libraries for an
alternative system, stream fusion.
GHC uses an intermediate representation called 'Core', which is a
mini-Haskell, essentially, that is optimized repeatedly via
type-preserving transformations. You can inspect this with a number of
tools, including "ghc-core" (available on Hackage).
Now, you're example uses a list comprehension (which is translated into
an enumFromTo call, and a call to filter. It also uses a call to sum,
which is a non-fusing left-fold under build/foldr fusion, but fuses
under stream fusion.
I'll desugar your code explicitly, and translate the calls from
build/foldr to stream-fusible functions:
$ cabal install uvector
import Data.Array.Vector
main = print
. sumU
. filterU (\x -> x `mod` 3 == 0 || x `mod` 5 == 0)
$ enumFromToU 3 (999 :: Int)
Running through ghc-core we see:
146 PreInlineUnconditionally
320 PostInlineUnconditionally
84 UnfoldingDone
18 RuleFired
7 +#
1 -#
1 <=#
2 ==#->case
1 >#
3 SC:$wfold0
1 fromIntegral/Int->Int
2 streamU/unstreamU
12 LetFloatFromLet
1 EtaReduction
210 BetaReduction
8 CaseOfCase
94 KnownBranch
4 CaseMerge
5 CaseElim
6 CaseIdentity
1 FillInCaseDefault
18 SimplifierDone
Showing what transformations happened. Notably, 2 occurences of the "streamU/unstreamU"
transformation, to remove intermediate structures.
The final code looks like:
$s$wfold :: Int# -> Int#
$s$wfold =
\ (sc_s19l :: Int#) ->
case modInt# (-9223372036854775807) 3 of wild21_a14L {
__DEFAULT ->
case modInt# (-9223372036854775807) 5 of wild211_X159 {
__DEFAULT -> $wfold sc_s19l (-9223372036854775806);
0 ->
$wfold
(+# sc_s19l (-9223372036854775807)) (-9223372036854775806)
};
0 ->
$wfold
(+# sc_s19l (-9223372036854775807)) (-9223372036854775806)
}
$wfold :: Int# -> Int# -> Int#
$wfold =
\ (ww_s189 :: Int#) (ww1_s18d :: Int#) ->
case ># ww1_s18d 999 of wild_a15N {
False ->
case ww1_s18d of wild2_a14K {
__DEFAULT ->
case modInt# wild2_a14K 3 of wild21_a14L {
__DEFAULT ->
case modInt# wild2_a14K 5 of wild211_X159 {
__DEFAULT -> $wfold ww_s189 (+# wild2_a14K 1);
0 ->
$wfold
(+# ww_s189 wild2_a14K) (+# wild2_a14K 1)
};
0 ->
$wfold
(+# ww_s189 wild2_a14K) (+# wild2_a14K 1)
};
(-9223372036854775808) ->
case modInt# (-9223372036854775808) 3 of wild21_a14N {
__DEFAULT ->
case lvl_r19G of wild211_a14x {
False -> $s$wfold ww_s189;
True ->
$s$wfold (+# ww_s189 (-9223372036854775808))
};
0 -> $s$wfold (+# ww_s189 (-9223372036854775808))
}
};
True -> ww_s189
}
Which might take a while to understand, but the key thing is the types
at the top level are Int# -- they don't allocate [Int], but instead use
unboxed, machine-level int values.
Optimization succesful.
There are papers and manuals describing "GHC Core" - the intermediate
form - on haskell.org. For fusion, google for "deforestation" or "stream
fusion".
-- Don
More information about the Glasgow-haskell-users
mailing list