[Haskell-cafe] Streaming bytes and performance
Don Stewart
dons00 at gmail.com
Tue Mar 19 20:36:03 CET 2013
Just for fun. Here's some improvements. about 6x faster.
I'd be interested to see what io-streams could do on this.
Using a 250M test file.
-- strict state monad and bang patterns on the uncons and accumulator
argument:
$ time ./A
4166680
./A 8.42s user 0.57s system 99% cpu 9.037 total
-- just write a loop
$ time ./A
4166680
./A 3.84s user 0.26s system 99% cpu 4.121 total
-- switch to Int
$ time ./A
4166680
./A 1.89s user 0.23s system 99% cpu 2.134 total
-- custom isSpace function
$ time ./A
4166680
./A 1.56s user 0.24s system 99% cpu 1.808 total
-- mmap IO
$ time ./A
4166680
./A 1.54s user 0.09s system 99% cpu 1.636 total
Here's the final program:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy.Char8 as L
import System.IO.Posix.MMap.Lazy
main = do
f <- unsafeMMapFile "test.txt"
print $ go 0 f
where
go :: Int -> L.ByteString -> Int
go !a !s = case L.uncons s of
Nothing -> a
Just (x,xs) | isSpaceChar8 x -> go (a+1) xs
| otherwise -> go a xs
isSpaceChar8 c = c == '\n' || c == ' '
{-# INLINE isSpaceChar8 #-}
On Mon, Mar 18, 2013 at 8:53 AM, Konstantin Litvinenko <
to.darkangel at gmail.com> wrote:
> Hi All!
>
> I tune my toy project for performance and hit the wall on simple, in
> imperative world, task. Here is the code that model what I'm trying to
> achieve
>
> import qualified Data.ByteString.Lazy as L
> import Data.Word8(isSpace)
> import Data.Word
> import Control.Monad.State
>
> type Stream = State L.ByteString
>
> get_byte :: Stream (Maybe Word8)
> get_byte = do
> s <- get
> case L.uncons s of
> Nothing -> return Nothing
> Just (x, xs) -> put xs >> return (Just x)
>
> main = do
> f <- L.readFile "test.txt"
> let r = evalState count_spaces f
> print r
> where
> count_spaces = go 0
> where
> go a = do
> x <- get_byte
> case x of
> Just x' -> if isSpace x' then go (a + 1) else go a
> Nothing -> return a
>
> It takes the file and count spaces, in imperative way, consuming bytes one
> by one. The problem is: How to rewrite this to get rid of constant
> allocation of state but still working with stream of bytes? I can rewrite
> this as one-liner L.foldl, but that doesn't help me in any way to optimize
> my toy project where all algorithms build upon consuming stream of bytes.
>
> PS. My main lang is C++ over 10 years and I only learn Haskell :)
>
>
> ______________________________**_________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130319/d7edeede/attachment.htm>
More information about the Haskell-Cafe
mailing list