[Haskell-cafe] Bytestrings and [Char]

Thomas DuBuisson thomas.dubuisson at gmail.com
Tue Mar 23 15:19:59 EDT 2010


BOS:
> Well, your benchmarks are highly suspect.

Attached is another benchmark with similar results.  This is no
criterion benchmark but I did try to adjust a wee bit for cache
issues.  Suffice to say I am not yet impressed with Data.Text
performance wise.

In the broader scope I feel there is a deeper point here:
Data.ByteString is how most data is available in a program (be it from
a file, network, or other library/device).  With all this data we
really should have some sort of zero-copy "safeCoerce" that can expose
the bytestring as an O(1) unboxed array of some safe length.  I know
this would have been nice for pureMD5, which did use an ugly
unsafePerformIO hack just to get Word16s but even those got boxed up
at horrible cost (it now uses 'cereal' to get the Word16 - even worse
performance but it lets me ignore endianess of the architecture).


---- OT ----
Beating the dead horse: I once wrote and deleted a blog post ranting
about how to get Word16 in C vs in Haskell:
   word = ((uint16_t *)p)[index];
or
  word = unsafePerformIO $ withForeignPtr ptr $ \ptr' -> let p =
castPtr (plusPtr ptr' off) in peekElemOff p index

That Haskell snippet takes significantly longer to both write and run.
---- END OT ----


Code and benchmark numbers included below, all complaints about the
accuracy of the benchmark are welcome.

NOTE: tLog is a daily #haskell log file repeated many times ~ 59MB.
-------
[tommd at Mavlo Test]$ ./read tLog
Normal String + System.IO "61443120": 1.467924s
Data.ByteString.Lazy "61450365": 0.023285s
Data.ByteString.Lazy.UTF8 "61443120": 3.305154s
Data.Text.Lazy "61443120": 3.99178s

----- CODE -----
import Data.Time
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.UTF8 as U
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TO
import System.IO
import System.Environment (getArgs)
import Control.Monad (sequence_, when)

main = do
        [file] <- getArgs
        let time (f, desc)  = do
                s <- getCurrentTime
                r <- f
                let !r' = r
                t <- getCurrentTime
                let d = diffUTCTime t s
                when (length desc > 0)
                    (putStrLn $ desc ++ " " ++ show r' ++ ": " ++ show d)
            ops = [
                    (readFile file >>= return . show . length, "")
                  , (readFile file >>= return . show . length, "Normal
String + System.IO")
                  , (L.readFile file >>= return . show . L.length, "")
                  , (L.readFile file >>= return . show . L.length,
"Data.ByteString.Lazy")
                  , (L.readFile file >>= return . show . U.length, "")
                  , (L.readFile file >>= return . show . U.length,
"Data.ByteString.Lazy.UTF8")
                  , (TO.readFile file >>= return . show . T.length, "")
                  , (TO.readFile file >>= return . show . T.length,
"Data.Text.Lazy")
                  ]
        mapM_ time ops


More information about the Haskell-Cafe mailing list