[Haskell-cafe] Data.ByteStream.Char8.words performance
Dino Morelli
dino at ui3.info
Fri Mar 30 15:38:27 EDT 2007
I noticed something about ByteStream performance that I don't
understand.
I have a test text document:
$ ls -sh test-text-file
956K test-text-file
Running this program, using the Prelude's IO functions:
> module Main where
>
> main = do
> content <- readFile "test-text-file"
> let l = length . words $ content
> print l
I get:
$ time ./a.out
174372
real 0m0.805s
user 0m0.720s
sys 0m0.008s
Running a version of the same thing using Data.ByteStream.Char8:
> module Main where
>
> import qualified Data.ByteString.Char8 as B
>
> main = do
> content <- B.readFile "test-text-file"
> let l = length . B.words $ content
> print l
I see a time that is quite a bit slower:
$ time ./a.out
174372
real 0m1.864s
user 0m1.596s
sys 0m0.012s
Changing it to incorporate similar code to the implementation of
B.words:
> module Main where
>
> import qualified Data.ByteString.Char8 as B
> import Data.Char (isSpace)
>
> main = do
> content <- B.readFile "test-text-file"
> let l = length $ filter (not . B.null) $ B.splitWith isSpace
> content
> print l
I see a similar time as with B.words:
$ time ./a.out
174372
real 0m1.835s
user 0m1.628s
sys 0m0.012s
And then if I change this to use B.split ' ' instead of isSpace:
> module Main where
>
> import qualified Data.ByteString.Char8 as B
>
> main = do
> content <- B.readFile "test-text-file"
> let l = length $ filter (not . B.null) $ B.split ' ' content
> print l
I get a time that's much more reasonable-looking, compared to the
original Prelude.words version:
$ time ./a.out
174313
real 0m0.389s
user 0m0.312s
sys 0m0.004s
It seems like the B.splitWith isSpace code is really slow for some
reason. Anybody have any idea what's going on? The actual implementation
is using isSpaceWord8 which is a case statement looking for a pile of
different whitespace characters.
--
.~. Dino Morelli
/V\ email: dino at ui3.info
/( )\ irc: dino-
^^-^^ preferred distro: Debian GNU/Linux http://www.debian.org
More information about the Haskell-Cafe
mailing list