[Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
Peter Lund
firefly at vax64.dk
Sat Dec 15 10:19:22 EST 2007
On Sat, 2007-12-15 at 14:34 +0000, Duncan Coutts wrote:
> Ok, I presume this is a guessing game and we're supposed to just look at
> the code without running and timing them.
Precisely :)
> > All they do is read from stdin and count the number of spaces they see.
> > There are two that use strict bytestrings, two that use lazy
> > bytestrings, and two that use the standard Haskell strings. Three use a
> > recursive function with an accumulator parameter and three use a foldl
> > with a lambda function.
> >
> > Say the fastest one takes the time 1. How much time will the others
> > take?
> >
> > And how about memory? How much memory do you think they require? Let's
> > say we feed a 150MB(*) file into each of them, how many megabytes do you
> > think they end up using (as seen from the OS, not in terms of how big
> > the live heap is)?
> >
> > I'm going to post full benchmarks + analysis on Wednesday.
>
> Right'o. I'll have a go. Lets see if I can't embarrass myself with being
> completely inaccurate.
Thanks for biting!
You were, thankfully, only almost completely inaccurate ;)
> > PS: For extra credit, what do you think is the peak memory use for this
> > program when given an input file of 150MB?
> Hmm. So that should work in constant memory, a few 64 chunks at once.
> I'd expect this to be pretty fast.
You are right about the speed.
Can you guess a number in kilobytes?
> >
> >
> >
> > ==============================
> > hs/space-bs-c8-acc-1.hs:
> > {-# LANGUAGE BangPatterns #-}
> >
> > import qualified Data.ByteString.Char8 as B
> >
> > cnt :: Int -> B.ByteString -> Int
> > cnt !acc bs = if B.null bs
> > then acc
> > else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
> >
> > main = do s <- B.getContents
> > print (cnt 0 s)
>
> This uses strict bytestrings so will use at least 150Mb and that'll make
> it a good deal slower. In fact it'll be worse than that since
> getContents does not know in advance how big the input will be so it has
> to play the doubling and copying game. So it'll end up copying all the
> data roughly twice. cnt is strict and tail recursive so that shouldn't
> be any problem, though it's probably not as fast as the first length .
> filter since head, tail, null all have to do bounds checks.
You are right about the memory. It is actually slightly faster than the
"extra credit" (length/filter combination) above.
> > ==============================
> > hs/space-bslc8-acc-1.hs:
> > {-# LANGUAGE BangPatterns #-}
> >
> > import qualified Data.ByteString.Lazy.Char8 as B
> >
> > cnt :: Int -> B.ByteString -> Int
> > cnt !acc bs = if B.null bs
> > then acc
> > else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
> >
> > main = do s <- B.getContents
> > print (cnt 0 s)
>
> For the same reason as above, I'd expect this cnt to be slower than
> B.length . B.filter (== ' ')
It is slower but not for the same reason as above.
> > ==============================
> > hs/space-xxxxx-acc-1.hs:
> > {-# LANGUAGE BangPatterns #-}
> >
> > cnt :: Int -> String -> Int
> > cnt !acc bs = if null bs
> > then acc
> > else cnt (if head bs == ' ' then acc+1 else acc) (tail bs)
> >
> > main = do s <- getContents
> > print (cnt 0 s)
>
> Lazy, so constant memory use, but much higher constant factors due to
> using String.
Spot on.
> > ==============================
> > hs/space-bs-c8-foldlx-1.hs:
> > {-# LANGUAGE BangPatterns #-}
> >
> > import qualified Data.ByteString.Char8 as B
> >
> > cnt :: B.ByteString -> Int
> > cnt bs = B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
> >
> > main = do s <- B.getContents
> > print (cnt s)
>
> This is of course still strict so that's going to make the reading slow.
Nope.
> This is a manually fused B.length . B.filter (== ' ') which hopefully is
> the same speed as the automatically fused one if the fusion is working
> ok. If not, then the B.length . B.filter (== ' ') will be doing a extra
> copy, and memory writes are expensive.
>
> > ==============================
> > hs/space-bslc8-foldlx-1.hs:
> > {-# LANGUAGE BangPatterns #-}
> >
> > import qualified Data.ByteString.Lazy.Char8 as B
> >
> > cnt :: B.ByteString -> Int
> > cnt bs = B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
> >
> > main = do s <- B.getContents
> > print (cnt s)
>
> As above but now in constant memory space.
Nope.
> > ==============================
> > hs/space-xxxxx-foldl.hs:
> > {-# LANGUAGE BangPatterns #-}
> >
> > cnt :: String -> Int
> > cnt bs = foldl (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
> >
> > main = do s <- getContents
> > print (cnt s)
>
> Oh, no! not foldl that's a killer.
You think it's worse than the program just above?
>
> Ok, so best way to summarise I think is to organise by data type since I
> think that'll dominate.
>
> So I think the lazy bytestring versions will be fastest due to having
> the best memory access patterns and doing the least copying. I think the
> foldl's will be faster than the explicit accumulators due to having
> fewer bounds checks.
>
> space-bslc8-foldlx-1
> space-bslc8-acc-1
>
> space-bs-c8-foldlx-1
> space-bs-c8-acc-1
>
> space-xxxxx-acc-1
> space-xxxxx-foldl
>
> I'll try guessing at some ratios:
>
> 1.0 space-bslc8-foldlx-1
> 1.1 space-bslc8-acc-1
>
> 2.0 space-bs-c8-foldlx-1
> 2.1 space-bs-c8-acc-1
>
> 4.0 space-xxxxx-acc-1
> 15 space-xxxxx-foldl
I've done the measurements on a 2GHz Athlon64 3000+, a 1667 MHz Core
Duo, and a 600MHz Pentium III. They all show the same pattern (with a
few minor aberrations for the PIII).
You did got one of the relative speeds right ;)
(and some of the memory usages)
I've tested with ghc 6.8.1 and 6.9.20071119 and 6.9.20071208 (or
thereabouts). 6.6.1 won't run my benchmarks and it also won't let me
install bytestring-0.9.0.1 to replace its built-in version.
-Peter
More information about the Haskell-Cafe
mailing list