[Haskell-cafe] Re: String vs ByteString

Wed Aug 18 10:20:00 EDT 2010

On 18 August 2010 15:04, Michael Snoyman <michael at snoyman.com> wrote:

> For me, the whole point of this discussion was to
> determine whether we should attempt porting to UTF-8, which as I understand
> it would be a rather large undertaking.

And the answer to that is yes, but only if we have good reason to
believe it will actually be faster, and that's where we're most
interested in benchmarks rather than hand-waving.

As Johan and others have said, the original choice to use UTF-16 was
based on benchmarks showing it was faster (than UTF-8 or UTF-32). So if
we want to counter that then we need to argue either that those were
the wrong benchmarks and do not reflect real usage, or that with
better implementations the balance would shift.

Now there is an interesting argument that we spend more time
shovelling strings about than we do actually processing them in any
interesting way, and therefore that we should pick benchmarks that
reflect that. This would then shift the balance to favour the internal
representation being identical to some particular popular external
representation, even if that internal representation is slower for
many processing tasks.
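
To make the two kinds of workload concrete, here is a minimal sketch,
assuming the `text` and `bytestring` packages. The names `shovel`,
`process` and `sample` are hypothetical and do not come from any
existing benchmark suite. With a UTF-16 internal representation, the
decode and encode steps in `shovel` are real transcoding passes; with
an internal representation matching the external one, they could in
principle reduce to validation and a copy.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical "shovelling" workload: UTF-8 bytes in, Text in the
-- middle, UTF-8 bytes out, with no real processing in between.
-- Note: decodeUtf8 throws an exception on invalid UTF-8 input.
shovel :: B.ByteString -> B.ByteString
shovel = TE.encodeUtf8 . TE.decodeUtf8

-- Hypothetical "processing" workload for comparison: decode once,
-- then do some actual work on the Text (here, counting words).
process :: B.ByteString -> Int
process = length . T.words . TE.decodeUtf8

-- Assumed sample input: mostly-ASCII text, encoded as UTF-8.
sample :: B.ByteString
sample = TE.encodeUtf8 (T.replicate 100000 (T.pack "some mostly ASCII text\n"))

main :: IO ()
main = do
  print (B.length (shovel sample))  -- size of the round-tripped bytes
  print (process sample)            -- number of words found
```

Timing `shovel` against `process` on realistic inputs, under each
candidate representation, would be one way to test the argument above
rather than assert it.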

Duncan