[Haskell-cafe] Bytestrings vs String?
wren ng thornton
wren at freegeek.org
Mon Feb 2 22:41:57 EST 2009
Marc Weber wrote:
> A lot of people are suggesting using Bytestrings for performance,
> strictness, or whatever reasons.
>
> However how well do they talk to other libraries?
I'm not sure what you mean.
For passing them around: If someone's trying to combine your library
(version using ByteStrings) and another Haskell library that uses
ByteStrings, then everything works fine, assuming both libraries are
compiled against the same version of the bytestring library. As I
recall, ByteStrings are designed to ease passing to C code across the
FFI too, in case someone wants to use your library with some FFI C code.
If someone's trying to combine your library with another library that
uses String, they'll need to add conversions. (All of this is symmetric
for a version of your library using String with another library using
ByteStrings.)
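
To make the conversion point concrete, here is a minimal sketch of the glue needed when a String-based function meets a ByteString-based one. Both `shout` and `byteLength` are hypothetical stand-ins rather than real library APIs, and the conversions go through Data.ByteString.Char8, so this is only safe for 8-bit characters.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.Char (toUpper)

-- Hypothetical String-based library function.
shout :: String -> String
shout = map toUpper

-- Hypothetical ByteString-based library function.
byteLength :: BC.ByteString -> Int
byteLength = BC.length

-- Glue: convert at the boundary between the two libraries.
shoutedLength :: String -> Int
shoutedLength = byteLength . BC.pack . shout

-- The symmetric direction: call the String function from ByteString code.
shoutBS :: BC.ByteString -> BC.ByteString
shoutBS = BC.pack . shout . BC.unpack

main :: IO ()
main = do
  print (shoutedLength "hello")
  BC.putStrLn (shoutBS (BC.pack "hello"))
```

Each pack/unpack copies the whole string, which is why repeated conversions at library boundaries are worth avoiding.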
The big compatibility issue I can see is the question of what a given
ByteString *means*. In particular, via the Data.ByteString.Char8 module
it encodes only 8-bit characters (code points 0 to 255, with anything
larger truncated), not all of Unicode like [Char] does.
There are libraries for lossless encoding of [Char] into ByteStrings,
but in general there can be encoding mismatch problems if, say, your
library uses UTF8-encoded ByteStrings but the other library treats them
like Char8-encoded (or UTF16BE, UTF16LE, FooBar,...), potentially
mangling or hallucinating multi-byte characters.
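
A small sketch of the mismatch, using only the bytestring package. The names `lambdaUtf8`, `misread`, and `truncated` are made up for illustration. The first value shows UTF-8 bytes being read as if they were Char8; the second shows Char8 dropping the high bits of a character above 255.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- The two bytes that UTF-8 uses for U+03BB (Greek small lambda).
lambdaUtf8 :: B.ByteString
lambdaUtf8 = B.pack [0xCE, 0xBB]

-- A consumer that assumes Char8 sees two unrelated 8-bit characters
-- (U+00CE and U+00BB) instead of one lambda.
misread :: String
misread = BC.unpack lambdaUtf8

-- Packing through Char8 keeps only the low 8 bits of each Char,
-- so U+03BB becomes the single byte 0xBB.
truncated :: B.ByteString
truncated = BC.pack "\955"

main :: IO ()
main = do
  print misread
  print (B.unpack truncated)
```

Neither case raises an error, which is what makes this kind of mismatch easy to miss.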
In general, if you're concerned about performance (or believe your users
will be) then ByteStrings are a good bet. Just make it clear in the
documentation what sort of encoding you use (or whether your library is
encoding agnostic).
For hslogger specifically, it looks like most of the Strings are
arguments which will typically be written as literals. Thus, to minimize
boilerplate, if you do switch to ByteStrings then you may want to
provide a module that does all the String->ByteString conversions for
the user. If you have a good program for testing real world use of
hslogger, before committing to the change I'd suggest benchmarking (in
time and in space) the differences between the current String
implementation and a proposed ByteString implementation.
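
A rough sketch of what such a convenience layer could look like. None of these names come from hslogger: `formatBS` and `logBS` stand in for a hypothetical ByteString-based core, and `logStr` is the hypothetical wrapper that lets callers keep writing String literals. In a real package the wrapper would live in its own module; everything is in one file here so it runs as-is.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical ByteString-based core (not hslogger's actual API):
-- joins a logger name and a message into one line.
formatBS :: BC.ByteString -> BC.ByteString -> BC.ByteString
formatBS name msg = BC.concat [name, BC.pack ": ", msg]

-- Hypothetical core logging action working purely on ByteStrings.
logBS :: BC.ByteString -> BC.ByteString -> IO ()
logBS name msg = BC.putStrLn (formatBS name msg)

-- Convenience wrapper: does the String->ByteString conversion once,
-- at the edge, so user code needs no explicit packing.
-- Note: BC.pack truncates characters above code point 255.
logStr :: String -> String -> IO ()
logStr name msg = logBS (BC.pack name) (BC.pack msg)

main :: IO ()
main = logStr "MyApp.Component" "starting up"
```

Callers who already hold ByteStrings would use `logBS` directly and skip the conversion.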
> Should there be two versions?
>
> hslogger-bytestring and hslogger-string?
I'd just stick with one (with a module for hiding the conversions, as
desired). Duplicating the code introduces too much room for maintenance
and compatibility issues.
> Or would it be better to implement one String class which can cope
> with everything (performance will drop, won't it?)
It'd be a very large class if you do it generally[1], and large classes
like that are generally frowned on (for good or ill). If you only need a
small subset of string operations then it may be more feasible to have a
smaller class with only those operations.
[1] See everything hidden from the Prelude in
http://hackage.haskell.org/packages/archive/list-extras/0.2.2.1/doc/html/src/Prelude-Listless.html
or see what all is offered by Data.ByteString vs the Prelude.
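
For the small-subset option, a minimal sketch might look like the following. The class name `StringLike` and its three methods are assumptions chosen for illustration; a real library would pick whichever operations it actually needs. The String instance requires the FlexibleInstances extension because String is a synonym for [Char].

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.ByteString.Char8 as BC

-- Hypothetical minimal class: only the operations the library needs.
class StringLike s where
  fromStr :: String -> s
  toStr   :: s -> String
  append  :: s -> s -> s

instance StringLike String where
  fromStr = id
  toStr   = id
  append  = (++)

-- Goes through Char8, so only 8-bit characters survive fromStr.
instance StringLike BC.ByteString where
  fromStr = BC.pack
  toStr   = BC.unpack
  append  = BC.append

-- Library code written once against the class.
tag :: StringLike s => s -> s -> s
tag name msg = name `append` fromStr ": " `append` msg

main :: IO ()
main = do
  putStrLn (tag "app" "hello")
  BC.putStrLn (tag (BC.pack "app") (BC.pack "hello"))
```

Whether the dictionary passing costs anything measurable depends on how well GHC specialises the call sites, so this is another thing to benchmark rather than assume.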
> In the future I'd like to explore using Haskell for web development.
> So speed does matter. And I don't want my server to convert from
> Bytestrings to Strings and back multiple times..
That's the big thing. The more people that use ByteStrings the less need
there is to convert when combining libraries. That said, ByteStrings
aren't a panacea; lists and laziness are very useful.
--
Live well,
~wren