[Haskell-cafe] Streaming bytes and performance
Nicolas Trangez
nicolas at incubaid.com
Tue Mar 19 21:53:23 CET 2013
On Tue, 2013-03-19 at 20:32 +0000, Don Stewart wrote:
> Oh, I forgot the technique of inlining the lazy bytestring chunks, and
> processing each chunk seperately.
>
> $ time ./fast
> 4166680
> ./fast 1.25s user 0.07s system 99% cpu 1.325 total
>
> Essentially inline Lazy.foldlChunks and specializes is (the inliner should
> really get that).
> And now we have a nice unboxed inner loop, which llvm might spot:
>
> $ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm
> $ time ./fast
> 4166680
> ./fast 1.07s user 0.06s system 98% cpu *1.146 total*
>
> So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
You could try something like this using Conduit:
{-# LANGUAGE BangPatterns #-}
module Main (main) where
import Data.Conduit
import qualified Data.Conduit.List as L
import qualified Data.Conduit.Binary as B
import qualified Data.ByteString.Char8 as BS8
main :: IO ()
main = print =<< runResourceT (
B.sourceFile filename $$ L.fold (\(!a) (!b) -> a + BS8.count ' ' b)
(0 :: Int))
where
filename = ...
Nicolas
>
> {-# LANGUAGE BangPatterns #-}
>
> import Data.ByteString.Internal
> import Data.ByteString.Unsafe
> import qualified Data.ByteString.Char8 as S
> import qualified Data.ByteString.Lazy.Char8 as L
> import qualified Data.ByteString.Lazy.Internal as L
> import System.IO.Posix.MMap.Lazy
>
> main = do
> f <- unsafeMMapFile "test.txt"
> print . new 0 $ L.toChunks f
>
> new :: Int -> [ByteString] -> Int
> new i [] = i
> new i (x:xs) = new (add i x) xs
>
> -- jump into the fast path
> {-# INLINE add #-}
> add :: Int -> ByteString -> Int
> add !i !s | S.null s = i
> | isSpace' x = add (i+1) xs
> | otherwise = add i xs
> where T x xs = uncons s
>
> data T = T !Char ByteString
>
> uncons s = T (w2c (unsafeHead s)) (unsafeTail s)
>
> isSpace' c = c == '\n' || c == ' '
> {-# INLINE isSpace' #-}
>
>
>
>
> On Tue, Mar 19, 2013 at 7:36 PM, Don Stewart <dons00 at gmail.com> wrote:
>
> > Just for fun. Here's some improvements. about 6x faster.
> > I'd be interested to see what io-streams could do on this.
> >
> > Using a 250M test file.
> >
> > -- strict state monad and bang patterns on the uncons and accumulator
> > argument:
> >
> > $ time ./A
> > 4166680
> > ./A 8.42s user 0.57s system 99% cpu 9.037 total
> >
> > -- just write a loop
> >
> > $ time ./A
> > 4166680
> > ./A 3.84s user 0.26s system 99% cpu 4.121 total
> >
> > -- switch to Int
> >
> > $ time ./A
> > 4166680
> > ./A 1.89s user 0.23s system 99% cpu 2.134 total
> >
> > -- custom isSpace function
> >
> > $ time ./A
> > 4166680
> > ./A 1.56s user 0.24s system 99% cpu 1.808 total
> >
> > -- mmap IO
> >
> > $ time ./A
> > 4166680
> > ./A 1.54s user 0.09s system 99% cpu 1.636 total
> >
> > Here's the final program:
> >
> >
> > {-# LANGUAGE BangPatterns #-}
> >
> > import qualified Data.ByteString as S
> > import qualified Data.ByteString.Lazy.Char8 as L
> > import System.IO.Posix.MMap.Lazy
> >
> > main = do
> > f <- unsafeMMapFile "test.txt"
> > print $ go 0 f
> > where
> > go :: Int -> L.ByteString -> Int
> > go !a !s = case L.uncons s of
> > Nothing -> a
> > Just (x,xs) | isSpaceChar8 x -> go (a+1) xs
> > | otherwise -> go a xs
> >
> > isSpaceChar8 c = c == '\n' || c == ' '
> > {-# INLINE isSpaceChar8 #-}
> >
> >
> > On Mon, Mar 18, 2013 at 8:53 AM, Konstantin Litvinenko <
> > to.darkangel at gmail.com> wrote:
> >
> >> Hi All!
> >>
> >> I tune my toy project for performance and hit the wall on simple, in
> >> imperative world, task. Here is the code that model what I'm trying to
> >> achieve
> >>
> >> import qualified Data.ByteString.Lazy as L
> >> import Data.Word8(isSpace)
> >> import Data.Word
> >> import Control.Monad.State
> >>
> >> type Stream = State L.ByteString
> >>
> >> get_byte :: Stream (Maybe Word8)
> >> get_byte = do
> >> s <- get
> >> case L.uncons s of
> >> Nothing -> return Nothing
> >> Just (x, xs) -> put xs >> return (Just x)
> >>
> >> main = do
> >> f <- L.readFile "test.txt"
> >> let r = evalState count_spaces f
> >> print r
> >> where
> >> count_spaces = go 0
> >> where
> >> go a = do
> >> x <- get_byte
> >> case x of
> >> Just x' -> if isSpace x' then go (a + 1) else go a
> >> Nothing -> return a
> >>
> >> It takes the file and count spaces, in imperative way, consuming bytes
> >> one by one. The problem is: How to rewrite this to get rid of constant
> >> allocation of state but still working with stream of bytes? I can rewrite
> >> this as one-liner L.foldl, but that doesn't help me in any way to optimize
> >> my toy project where all algorithms build upon consuming stream of bytes.
> >>
> >> PS. My main lang is C++ over 10 years and I only learn Haskell :)
> >>
> >>
> >> ______________________________**_________________
> >> Haskell-Cafe mailing list
> >> Haskell-Cafe at haskell.org
> >> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe>
> >>
> >
> >
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list