[Haskell-cafe] Streaming bytes and performance

Tue Mar 19 21:49:50 CET 2013

On 03/19/2013 10:32 PM, Don Stewart wrote:
> Oh, I forgot the technique of inlining the lazy bytestring chunks, and
> processing each chunk seperately.
>
> $ time ./fast
> 4166680
> ./fast  1.25s user 0.07s system 99% cpu 1.325 total
>
> Essentially inline Lazy.foldlChunks and specializes is (the inliner
> should really get that).
> And now we have a nice unboxed inner loop, which llvm might spot:
>
> $ ghc -O2 -funbox-strict-fields fast.hs  --make -fllvm
> $ time ./fast
> 4166680
> ./fast  1.07s user 0.06s system 98% cpu *1.146 total*
>
> So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)

Thanks Don, but after some investigation I came to conclusion that 
problem is in State monad

{-# LANGUAGE BangPatterns #-}

import Control.Monad.State.Strict

data S6 = S6 !Int !Int

main_6 = do
     let r = evalState go (S6 10000 0)
     print r
   where
     go = do
         (S6 i a) <- get
         if (i == 0) then return a else (put (S6 (i - 1) (a + i))) >> go

main_7 = do
     let r = go (S6 10000 0)
     print r
   where
     go (S6 i a)
         | i == 0 = a
         | otherwise = go $ S6 (i - 1) (a + i)

main = main_6

main_6 doing constant allocations while main_7 run in constant space. 
Can you suggest something that improve situation? I don't want to 
manually unfold all my code that I want to be fast :(.