[Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
Don Stewart
dons at galois.com
Sun Dec 16 18:21:57 EST 2007
I've had a look at how some of the code was being compiled for
strict and lazy bytestrings, and also which rules weren't firing.
With some small tweaks the code seems back in good shape.
An updated bytestring library is at :
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-0.9.0.2
Enjoy! :)
------------------------------------------------------------------------
Summary: the suspicious lazy bytestring program works now. (constant
space, and fastest overall, as expected originally)
Program 1, lazy bytestring length . filter
Yesterday:
./A +RTS -sstderr < 150M 1.01s user 0.10s system 98% cpu 1.123 total
40M allocated
* Today (fixed!):
./A +RTS -sstderr < 150M 0.26s user 0.06s system 96% cpu 0.332 total
2M allocated
Reason, deprecated array fusion mucking up the optimiser.
I think we can close this regression.
------------------------------------------------------------------------
Also, I had a look at Program 3: lazy bytestring, custom loop
Unchanged. 2.4s, constant space. This was a bit slow.
Further investigation shows lots of unnecessary bounds checks, as we
take apart the Chunk lazy bytestring type, then test and continue.
This representation was chosen to make it possible to process chunks
efficiently, so that we can avoid these bounds check.
Something like this instead:
cnt :: Int -> B.ByteString -> Int
cnt n B.Empty = n
cnt n (B.Chunk x xs) = cnt (n + cnt_strict 0 x) xs -- process lazy spine
-- now we can process a chunk without checking for Empty
where
cnt_string !i !s -- then strict chunk
| S.null s = i
| c == ' ' = cnt_strict (i+1) t
| otherwise = cnt_strict i t
where
(c,t) = (S.w2c (S.unsafeHead s), S.unsafeTail s) -- no bounds check
main = do s <- B.getContents; print (cnt 0 s)
Let's us avoid redundant checks for Empty, while allowing 'go' to
avoid unnecessary checks for the empty strict bytestring. This is
some 4x faster.
This alternating between lazy spines and strict chunk processing is
the best way to get reliable performance from lazy bytestring custom loops.
-- Don
More information about the Haskell-Cafe
mailing list