[Haskell-cafe] Bytestrings and [Char]

Iustin Pop iusty at k1024.org
Tue Mar 23 13:25:19 EDT 2010


On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote:
> On 18:11 Tue 23 Mar     , Iustin Pop wrote:
> > I agree with the principle of correctness, but let's be honest - it's
> > (many) orders of magnitude between ByteString and String and Text, not
> > just a few percentage points…
> > 
> > I've been struggling with this problem too and it's not nice. Every time
> > one uses the system readFile & friends (anything that doesn't read via
> > ByteStrings), it's hell slow.
> > 
> > Test: read a file and compute its size in chars. Input text file is
> > ~40MB in size, has one non-ASCII char. The test might seem stupid but it
> > is a simple one. ghc 6.12.1.
> > 
> > Data.ByteString.Lazy (bytestring readFile + length) - < 10 milliseconds,
> > incorrect length (as expected).
> > 
> > Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
> > seconds, correct length.
> > 
> > Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.
> > 
> > String (system readFile + length) - ~1 second, correct length.
> 
> Is this a mistake?  Your own report shows String & readFile being an
> order of magnitude faster than everything else, contrary to your earlier
> claim.

No, it's not a mistake. String is faster than packing to Text and taking the
length, but it's 100 times slower than ByteString.

My whole point is that the difference between byte processing and char
processing in Haskell is not a few percentage points, but orders of magnitude.
I would really like to have only the 6x penalty that Python shows, for example.
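
A minimal sketch of the kind of test described above, for anyone who wants to
reproduce the comparison. It is not the exact code that was timed: the function
names (byteLength, readUtf8, stringLength, textLength) are made up for
illustration, the String read forces UTF-8 on the handle instead of relying on
the locale as the plain system readFile does, and the
Data.ByteString.Lazy.UTF8 variant is left out so that only base, bytestring
and text are needed.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import Data.Int (Int64)
import System.IO

-- Byte count via lazy ByteString: fast, but every multi-byte UTF-8
-- character is counted once per byte, so this is not a character count.
byteLength :: FilePath -> IO Int64
byteLength path = BL.length <$> BL.readFile path

-- Hypothetical helper: read a file as a String, decoding it as UTF-8
-- regardless of the current locale (an assumption made for this sketch).
readUtf8 :: FilePath -> IO String
readUtf8 path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  hGetContents h  -- lazy; the handle is closed once the contents are consumed

-- Character count via String.
stringLength :: FilePath -> IO Int
stringLength path = length <$> readUtf8 path

-- Character count via lazy Text, built with pack from the decoded String.
textLength :: FilePath -> IO Int64
textLength path = TL.length . TL.pack <$> readUtf8 path
```

Calling each function from main and printing the result is enough to force the
whole file to be read, so the variants can be timed one at a time. On a file
with a single two-byte UTF-8 character, byteLength should come out one higher
than the other two.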

regards,
iustin

