[Haskell-cafe] Re: String vs ByteString

Tue Aug 17 16:00:11 EDT 2010

On Tue, Aug 17, 2010 at 9:30 PM, Donn Cave <donn at avvanta.com> wrote:

> Quoth John Millikin <jmillikin at gmail.com>,
>
> > Ruby, which has an enormous Japanese userbase, solved the problem by
> > essentially defining Text = (Encoding, ByteString), and then
> > re-implementing text logic for each encoding. This allows very
> > efficient operation with every possible encoding, at the cost of
> > increased complexity (caching decoded characters, multi-byte handling,
> > etc.).
>
> Ruby actually comes from the CJK world in a way, doesn't it?
>
> Even if efficient per-encoding manipulation is a tough nut to crack,
> it at least avoids the fixed cost of bulk decoding, so an application
> designer doesn't need to think about the pay-off for a correct text
> approach vs. `binary'/ASCII, and the language/library designer doesn't
> need to think about whether genome data is a representative case etc.
>
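The Text = (Encoding, ByteString) model described in the quoted text can be sketched in Haskell roughly as follows. This is a hypothetical illustration of the idea, not how Ruby actually implements it. `Encoding`, `EncodedText` and `charLength` are made-up names, only two encodings are covered, and the UTF-8 case assumes its bytes are already valid. Nothing is bulk-decoded up front. Instead, each text operation has to be written once per encoding.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))

-- Hypothetical encoding tag; a real system would support many more.
data Encoding = Latin1 | Utf8
  deriving (Eq, Show)

-- Text kept as the original bytes plus the encoding they are in.
data EncodedText = EncodedText Encoding B.ByteString

-- Character count, implemented separately for each encoding.
charLength :: EncodedText -> Int
-- Latin-1: one byte per character.
charLength (EncodedText Latin1 bs) = B.length bs
-- UTF-8: count every byte that is not a continuation byte (10xxxxxx).
-- Assumes the bytes are already valid UTF-8.
charLength (EncodedText Utf8 bs) =
  B.foldl' (\acc w -> if w .&. 0xC0 /= 0x80 then acc + 1 else acc) 0 bs
```

For instance, the character é is one byte in Latin-1 (0xE9) and two in UTF-8 (0xC3 0xA9). `charLength` reports 1 in both cases, but through different code paths.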

Remember that the cost of decoding is O(n) no matter what encoding is used
internally, as you always have to validate when going from ByteString to
Text. If the external and internal encodings don't match, then you also have
to copy the bytes into a new buffer, but that is only one allocation (a
pointer increment with a semi-space collector) and the copy is cheap since
the data is already in cache.
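To make the validation cost concrete, here is a minimal sketch that assumes the `bytestring` and `text` packages. `validUtf8` is a hypothetical hand-written checker, not a library function. It follows the well-formed UTF-8 byte ranges from the Unicode standard and has to inspect every byte exactly once, which is where the O(n) comes from. `decodeUtf8'` from `Data.Text.Encoding` is the library's validating decoder. It likewise walks the whole input and returns `Left` on malformed bytes.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical validator: one left-to-right pass over the input, looking at
-- each byte once, so the cost is O(n) in the length of the ByteString.
validUtf8 :: B.ByteString -> Bool
validUtf8 bs = go 0
  where
    n = B.length bs
    at = B.index bs
    -- A continuation byte has the bit pattern 10xxxxxx.
    cont i = i < n && at i .&. 0xC0 == 0x80
    -- Some lead bytes restrict their first continuation byte further
    -- (rejects overlong forms, surrogates, and code points above U+10FFFF).
    between lo hi i = i < n && at i >= lo && at i <= hi
    go i
      | i >= n = True
      | b < 0x80 = go (i + 1)                                   -- ASCII
      | b >= 0xC2 && b <= 0xDF = cont (i + 1) && go (i + 2)     -- 2-byte
      | b == 0xE0 = between 0xA0 0xBF (i + 1) && cont (i + 2) && go (i + 3)
      | b == 0xED = between 0x80 0x9F (i + 1) && cont (i + 2) && go (i + 3)
      | b >= 0xE1 && b <= 0xEF = cont (i + 1) && cont (i + 2) && go (i + 3)
      | b == 0xF0 = between 0x90 0xBF (i + 1) && cont (i + 2) && cont (i + 3) && go (i + 4)
      | b == 0xF4 = between 0x80 0x8F (i + 1) && cont (i + 2) && cont (i + 3) && go (i + 4)
      | b >= 0xF1 && b <= 0xF3 = cont (i + 1) && cont (i + 2) && cont (i + 3) && go (i + 4)
      | otherwise = False                                       -- invalid lead byte
      where
        b = at i

-- Library route: decodeUtf8' also walks the whole input to validate it and
-- returns Left on malformed bytes instead of throwing an exception.
decodeChecked :: B.ByteString -> Maybe T.Text
decodeChecked = either (const Nothing) Just . TE.decodeUtf8'
```

For example, `validUtf8 (B.pack [0xC3, 0xA9])` accepts the two-byte encoding of U+00E9. The overlong sequence `B.pack [0xC0, 0xAF]` and the encoded surrogate `B.pack [0xED, 0xA0, 0x80]` are rejected. Either way, every input byte is visited before the result is known.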

-- Johan