[Haskell-cafe] Analyzing slow performance of a Haskell program

Tue Aug 9 19:44:23 CEST 2011

Hi Chris,

On Tue, Aug 9, 2011 at 12:47 PM, Chris Yuen <kizzx2+haskell at gmail.com> wrote:
> 1. Why are bangs needed on the length arrays?
>
> If I remove them from below, performance drops 10%. I thought `unsafeIndex`
> is straight in both arguments, no?
>
> wordLength i = go i
>   where
>     go n
>       | n < 10 = lengthOnes !! n
>       | n < 20 = lengthTeens !! (n-10)
>       | n < 100 = (lengthTens !! (n // 10)) + (lengthOnes !! (n % 10))
>       | n < 1000 = (lengthOnes !! (n // 100)) + 7 + go (n % 100)
>       | n < 1000000 = go (n // 1000) + 8 + go (n % 1000)
>       | otherwise = go (n // 1000000) + 7 + go (n % 1000000)
>     !lengthOnes = lengthVec ones
>     !lengthTens = lengthVec tens
>     !lengthTeens = lengthVec teens

(It's "strict", not "straight".)

The different lengths are not used in all branches and since Haskell
is a lazy (or to be pendantic: non-strict) language we cannot compute
them before knowing which branch will be evaluated. For example, given
that we have

    ones = ...
    tens = error "Boom!"

    test = wordLength 0

evaluating 'test' should not cause an exception to be raised as the
first (n < 10) branch is taken, but it would if lengthOnes was strict.

Delaying the evaluation has some costs, namely allocating a thunk for
e.g. `lengthVec ones` and later evaluate that thunk. By making the
lengths strict we can evaluate them earlier and avoid some allocation
and forcing of thunks.

> 2. Why the single element worker wrapper pattern (`go` functions) increases
> performance?
>
> If we change wordLength to
>
> wordLength n
>   | n < 10 = lengthOnes !! n
>   | n < 20 = lengthTeens !! (n-10)
>   | n < 100 = (lengthTens !! (n // 10)) + (lengthOnes !! (n % 10))
>   | n < 1000 = (lengthOnes !! (n // 100)) + 7 + wordLength (n % 100)
>   | n < 1000000 = wordLength (n // 1000) + 8 + wordLength (n % 1000)
>   | otherwise = wordLength (n // 1000000) + 7 + wordLength (n % 1000000)
>   where
>     !lengthOnes = lengthVec ones
>     !lengthTens = lengthVec tens
>     !lengthTeens = lengthVec teens
>
> The performance drops by another 10%. This really surprised me. `go i`
> seemed obvious to me and I don't understand how it could make any
> difference. The full source code is available to GHC so it shouldn't be
> related to call-by-pointer problem? If this is the case, shouldn't we always
> wrap a "go" function for **any** recursive functions?

Making wordLength non-recursive lets GHC inline it, which can
sometimes help performance (e.g. if the inlining enables more
optimizations). Inlining does increase code size (and sometimes
allocation if a closure has to be allocated to capture free
variables), so it's not always a good idea.

Cheers,
Johan