Proposal: Add {to,from}Strict conversion functions between strict and lazy ByteStrings
Herbert Valerio Riedel
hvr at gnu.org
Sat Oct 29 18:10:58 CEST 2011
On Sat, 2011-10-29 at 14:18 +0200, Bas van Dijk wrote:
> Any idea why "toStrict/simple #1" is faster than "toStrict/optimized
> #1"?
tbh, I missed #1 being actually slower... Thx for pointing that out!
The reason is simply, that B.concat performs zero-copying for 0 and 1
chunks:
-- | /O(n)/ Concatenate a list of ByteStrings.
concat :: [ByteString] -> ByteString
concat [] = empty
concat [ps] = ps
concat xs = ...
I've added those zero-copy optimizations to the optimized toStrict2, and
now all timings are better than for the naive implementation toStrict1:
--------------------------------------------------------------------------
{-# LANGUAGE OverloadedStrings #-}
import Criterion
import Criterion.Main
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Internal as BLI
import Foreign.ForeignPtr
import Foreign.Ptr
toStrict1 :: BL.ByteString -> B.ByteString
toStrict1 = B.concat . BL.toChunks
toStrict2 :: BL.ByteString -> B.ByteString
toStrict2 BLI.Empty = B.empty
toStrict2 (BLI.Chunk c BLI.Empty) = c
toStrict2 lb = BI.unsafeCreate len $ go lb
where
len = BLI.foldlChunks (\l sb -> l + B.length sb) 0 lb
go BLI.Empty _ = return ()
go (BLI.Chunk (BI.PS fp s l) r) ptr =
withForeignPtr fp $ \p -> do
BI.memcpy ptr (p `plusPtr` s) (fromIntegral l)
go r (ptr `plusPtr` l)
main :: IO ()
main = do
let lbs0 = ""
lbs1 = "abcdefghij"
lbs2 = BL.fromChunks (replicate 2 "abcdefghij")
lbs3 = BL.fromChunks (replicate 10 "abcdefghij")
lbs4 = BL.fromChunks (replicate 1000 "abcdefghij")
print $ toStrict1 lbs0 == toStrict2 lbs0
print $ toStrict1 lbs1 == toStrict2 lbs1
print $ toStrict1 lbs2 == toStrict2 lbs2
print $ toStrict1 lbs3 == toStrict2 lbs3
print $ toStrict1 lbs4 == toStrict2 lbs4
defaultMain
[ bgroup "toStrict"
[ bench "simple #0" $ whnf toStrict1 lbs0
, bench "simple #1" $ whnf toStrict1 lbs1
, bench "simple #2" $ whnf toStrict1 lbs2
, bench "simple #3" $ whnf toStrict1 lbs3
, bench "simple #4" $ whnf toStrict1 lbs4
, bench "optimized #0" $ whnf toStrict2 lbs0
, bench "optimized #1" $ whnf toStrict2 lbs1
, bench "optimized #2" $ whnf toStrict2 lbs2
, bench "optimized #3" $ whnf toStrict2 lbs3
, bench "optimized #4" $ whnf toStrict2 lbs4
]
]
{-
True
True
True
True
True
warming up
estimating clock resolution...
mean is 2.537877 us (320001 iterations)
found 2658 outliers among 319999 samples (0.8%)
2292 (0.7%) high severe
estimating cost of a clock call...
mean is 55.38578 ns (15 iterations)
benchmarking toStrict/simple #0
mean: 17.66316 ns, lb 17.62767 ns, ub 17.74077 ns, ci 0.950
std dev: 258.0479 ps, lb 146.2057 ps, ub 417.6999 ps, ci 0.950
benchmarking toStrict/simple #1
mean: 28.59188 ns, lb 28.49765 ns, ub 28.70322 ns, ci 0.950
std dev: 523.0005 ps, lb 415.2360 ps, ub 688.4154 ps, ci 0.950
benchmarking toStrict/simple #2
mean: 144.5600 ns, lb 144.2192 ns, ub 145.2525 ns, ci 0.950
std dev: 2.397734 ns, lb 1.352302 ns, ub 3.905039 ns, ci 0.950
benchmarking toStrict/simple #3
mean: 488.0094 ns, lb 486.6532 ns, ub 490.3121 ns, ci 0.950
std dev: 8.903734 ns, lb 6.046690 ns, ub 13.68234 ns, ci 0.950
benchmarking toStrict/simple #4
mean: 55.65404 us, lb 55.43386 us, ub 55.97341 us, ci 0.950
std dev: 1.342695 us, lb 989.9903 ns, ub 1.836054 us, ci 0.950
benchmarking toStrict/optimized #0
mean: 14.14306 ns, lb 14.10655 ns, ub 14.20415 ns, ci 0.950
std dev: 237.0362 ps, lb 159.7752 ps, ub 347.7329 ps, ci 0.950
benchmarking toStrict/optimized #1
mean: 19.10087 ns, lb 19.05273 ns, ub 19.18831 ns, ci 0.950
std dev: 322.9545 ps, lb 201.9111 ps, ub 503.2965 ps, ci 0.950
benchmarking toStrict/optimized #2
mean: 63.34285 ns, lb 63.21118 ns, ub 63.59386 ns, ci 0.950
std dev: 903.1160 ps, lb 543.0056 ps, ub 1.377267 ns, ci 0.950
benchmarking toStrict/optimized #3
mean: 166.5292 ns, lb 166.2405 ns, ub 167.1715 ns, ci 0.950
std dev: 2.144647 ns, lb 1.259903 ns, ub 3.408946 ns, ci 0.950
benchmarking toStrict/optimized #4
mean: 13.05338 us, lb 13.02102 us, ub 13.11728 us, ci 0.950
std dev: 222.8512 ns, lb 116.9145 ns, ub 343.6604 ns, ci 0.950
-}
More information about the Libraries
mailing list