[Haskell-cafe] Streaming bytes and performance

Gregory Collins greg at gregorycollins.net
Mon Mar 18 13:14:36 CET 2013


Put a bang pattern on your accumulator in "go". Since the value is not
demanded until the end of the program, you're actually just building up a
huge space leak there.

Secondly, unconsing from the lazy bytestring will cause a lot of allocation
churn in the garbage collector -- each byte read in the input forces the
creation of a new "L.ByteString", which is many times larger.

Also please consider trying the "io-streams" library that I wrote (
http://hackage.haskell.org/package/io-streams). It provides primitives for
streaming IO in "basic Haskell" style. To provide a Word8 stream (which is
probably a bad idea performance-wise) it would be most efficient
allocation-wise to implement a mutable index cursor (i.e. IORef Int) that
pointed to your current position within the ByteString chunk, other
strategies will probably allocate too much.

G



On Mon, Mar 18, 2013 at 9:53 AM, Konstantin Litvinenko <
to.darkangel at gmail.com> wrote:

> Hi All!
>
> I tune my toy project for performance and hit the wall on simple, in
> imperative world, task. Here is the code that model what I'm trying to
> achieve
>
> import qualified Data.ByteString.Lazy as L
> import Data.Word8(isSpace)
> import Data.Word
> import Control.Monad.State
>
> type Stream = State L.ByteString
>
> get_byte :: Stream (Maybe Word8)
> get_byte = do
>     s <- get
>     case L.uncons s of
>         Nothing -> return Nothing
>         Just (x, xs) -> put xs >> return (Just x)
>
> main = do
>     f <- L.readFile "test.txt"
>     let r = evalState count_spaces f
>     print r
>   where
>     count_spaces = go 0
>       where
>         go a = do
>             x <- get_byte
>             case x of
>                 Just x' ->  if isSpace x' then go (a + 1) else go a
>                 Nothing -> return a
>
> It takes the file and count spaces, in imperative way, consuming bytes one
> by one. The problem is: How to rewrite this to get rid of constant
> allocation of state but still working with stream of bytes? I can rewrite
> this as one-liner L.foldl, but that doesn't help me in any way to optimize
> my toy project where all algorithms build upon consuming stream of bytes.
>
> PS. My main lang is C++ over 10 years and I only learn Haskell :)
>
>
> ______________________________**_________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe>
>



-- 
Gregory Collins <greg at gregorycollins.net>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130318/34703eea/attachment.htm>


More information about the Haskell-Cafe mailing list