[Haskell-cafe] Streaming bytes and performance
Konstantin Litvinenko
to.darkangel at gmail.com
Tue Mar 19 21:49:50 CET 2013
On 03/19/2013 10:32 PM, Don Stewart wrote:
> Oh, I forgot the technique of inlining the lazy bytestring chunks, and
> processing each chunk seperately.
>
> $ time ./fast
> 4166680
> ./fast 1.25s user 0.07s system 99% cpu 1.325 total
>
> Essentially inline Lazy.foldlChunks and specializes is (the inliner
> should really get that).
> And now we have a nice unboxed inner loop, which llvm might spot:
>
> $ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm
> $ time ./fast
> 4166680
> ./fast 1.07s user 0.06s system 98% cpu *1.146 total*
>
> So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
Thanks Don, but after some investigation I came to conclusion that
problem is in State monad
{-# LANGUAGE BangPatterns #-}
import Control.Monad.State.Strict
data S6 = S6 !Int !Int
main_6 = do
let r = evalState go (S6 10000 0)
print r
where
go = do
(S6 i a) <- get
if (i == 0) then return a else (put (S6 (i - 1) (a + i))) >> go
main_7 = do
let r = go (S6 10000 0)
print r
where
go (S6 i a)
| i == 0 = a
| otherwise = go $ S6 (i - 1) (a + i)
main = main_6
main_6 doing constant allocations while main_7 run in constant space.
Can you suggest something that improve situation? I don't want to
manually unfold all my code that I want to be fast :(.
More information about the Haskell-Cafe
mailing list