[Haskell-cafe] Analyzing slow performance of a Haskell program
Johan Tibell
johan.tibell at gmail.com
Tue Aug 9 19:44:23 CEST 2011
Hi Chris,
On Tue, Aug 9, 2011 at 12:47 PM, Chris Yuen <kizzx2+haskell at gmail.com> wrote:
> 1. Why are bangs needed on the length arrays?
>
> If I remove them from below, performance drops 10%. I thought `unsafeIndex`
> is straight in both arguments, no?
>
> wordLength i = go i
> where
> go n
> | n < 10 = lengthOnes !! n
> | n < 20 = lengthTeens !! (n-10)
> | n < 100 = (lengthTens !! (n // 10)) + (lengthOnes !! (n % 10))
> | n < 1000 = (lengthOnes !! (n // 100)) + 7 + go (n % 100)
> | n < 1000000 = go (n // 1000) + 8 + go (n % 1000)
> | otherwise = go (n // 1000000) + 7 + go (n % 1000000)
> !lengthOnes = lengthVec ones
> !lengthTens = lengthVec tens
> !lengthTeens = lengthVec teens
(It's "strict", not "straight".)
The different lengths are not used in all branches and since Haskell
is a lazy (or to be pendantic: non-strict) language we cannot compute
them before knowing which branch will be evaluated. For example, given
that we have
ones = ...
tens = error "Boom!"
test = wordLength 0
evaluating 'test' should not cause an exception to be raised as the
first (n < 10) branch is taken, but it would if lengthOnes was strict.
Delaying the evaluation has some costs, namely allocating a thunk for
e.g. `lengthVec ones` and later evaluate that thunk. By making the
lengths strict we can evaluate them earlier and avoid some allocation
and forcing of thunks.
> 2. Why the single element worker wrapper pattern (`go` functions) increases
> performance?
>
> If we change wordLength to
>
> wordLength n
> | n < 10 = lengthOnes !! n
> | n < 20 = lengthTeens !! (n-10)
> | n < 100 = (lengthTens !! (n // 10)) + (lengthOnes !! (n % 10))
> | n < 1000 = (lengthOnes !! (n // 100)) + 7 + wordLength (n % 100)
> | n < 1000000 = wordLength (n // 1000) + 8 + wordLength (n % 1000)
> | otherwise = wordLength (n // 1000000) + 7 + wordLength (n % 1000000)
> where
> !lengthOnes = lengthVec ones
> !lengthTens = lengthVec tens
> !lengthTeens = lengthVec teens
>
> The performance drops by another 10%. This really surprised me. `go i`
> seemed obvious to me and I don't understand how it could make any
> difference. The full source code is available to GHC so it shouldn't be
> related to call-by-pointer problem? If this is the case, shouldn't we always
> wrap a "go" function for **any** recursive functions?
Making wordLength non-recursive lets GHC inline it, which can
sometimes help performance (e.g. if the inlining enables more
optimizations). Inlining does increase code size (and sometimes
allocation if a closure has to be allocated to capture free
variables), so it's not always a good idea.
Cheers,
Johan
More information about the Haskell-Cafe
mailing list