[Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

shelarcy shelarcy at gmail.com
Mon Feb 5 09:38:50 EST 2007


Hello Twan,

On Mon, 05 Feb 2007 08:46:35 +0900, Twan van Laarhoven <twanvl at gmail.com> wrote:
> I would like to announce my attempt at making a Unicode version of
> Data.ByteString. The library is named Data.CompactString to avoid
> conflict with other (Fast)PackedString libraries.

How about adding an abstraction layer?

Spencer Janssen tried to provide an abstraction layer for a Unicode ByteString
in last year's Summer of Code project.
It has no Unicode support, but it supplied a good abstraction: the Stringable class.
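To make the idea of an abstraction layer concrete, here is a minimal sketch of what a Stringable-style class could look like. The method names (sEmpty, sCons, sUncons) and the helpers sPack, sUnpack and sLength are hypothetical and only illustrative; they are not the API in the fps-soc repositories. Generic code is written once against the class, and a Unicode-aware type like CompactString could simply be one more instance.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.ByteString.Char8 as B

-- Hypothetical class: the name and methods are illustrative assumptions,
-- not the actual fps-soc Stringable API.
class Stringable s where
  sEmpty  :: s
  sCons   :: Char -> s -> s
  sUncons :: s -> Maybe (Char, s)

-- Generic helpers, written once for every instance.
sPack :: Stringable s => String -> s
sPack = foldr sCons sEmpty

sUnpack :: Stringable s => s -> String
sUnpack s = case sUncons s of
  Nothing      -> []
  Just (c, s') -> c : sUnpack s'

-- Count characters with a strict accumulator.
sLength :: Stringable s => s -> Int
sLength = go 0
  where
    go n s = n `seq` case sUncons s of
      Nothing      -> n
      Just (_, s') -> go (n + 1) s'

-- Plain String instance.
instance Stringable [Char] where
  sEmpty           = []
  sCons            = (:)
  sUncons []       = Nothing
  sUncons (c : cs) = Just (c, cs)

-- Data.ByteString.Char8 stores one byte per Char, so characters above
-- U+00FF are truncated to their low 8 bits.
instance Stringable B.ByteString where
  sEmpty  = B.empty
  sCons   = B.cons
  sUncons = B.uncons
```

The Char8 instance round-trips only Latin-1 text; that limitation is exactly the gap a Unicode ByteString instance would fill without changing any code written against the class.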

http://code.google.com/soc/haskell/appinfo.html?csaid=B934AEBE95120AB2
http://darcs.haskell.org/SoC/fps-soc/
http://darcs.haskell.org/SoC/fps-soc-aug21/


> The library uses a variable length encoding (1 to 3 bytes) of Chars into
> Word8s, which are then stored in a ByteString. The structure is very
> much based on Data.ByteString, most of the implementation is copied from
> there. Hopefully this means that fusion rules could be copied as well.

UTF-8 also uses 4-byte encodings (and the original UTF-8 definition, RFC 2279,
allowed 5- and 6-byte sequences as well). Characters in CJK Unified Ideographs
Extension B, Tai Xuan Jing Symbols, Musical Symbols, etc. use 4-byte encodings.
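As an illustration of why these characters need 4 bytes, here is a minimal UTF-8 encoder sketch following RFC 3629, the current definition, which caps sequences at 4 bytes. Haskell's Char is bounded at U+10FFFF, so 4 bytes are enough for any Char; 5- and 6-byte forms only arise under the older RFC 2279 definition. The function names are hypothetical, and this sketch does not reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one Char as UTF-8 (RFC 3629, 1 to 4 bytes).
-- Sketch only: surrogates (U+D800..U+DFFF) are encoded, not rejected.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [ 0xC0 .|. byte (n `shiftR` 6)
                  , cont n ]
  | n < 0x10000 = [ 0xE0 .|. byte (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  | otherwise   = [ 0xF0 .|. byte (n `shiftR` 18)   -- U+10000..U+10FFFF
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n    = ord c
    byte = fromIntegral :: Int -> Word8
    -- Continuation byte 10xxxxxx carrying the low 6 bits of x.
    cont x = 0x80 .|. byte (x .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8
```

For example, encodeCharUtf8 '\x20000' (the first character of CJK Unified Ideographs Extension B) yields the four bytes 0xF0 0xA0 0x80 0x80, whereas a library limited to 3-byte sequences has no way to represent it.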

Many Haskell UTF-8 libraries don't support encodings longer than 3 bytes,
but Takusen's implementation supports them correctly.

http://darcs.haskell.org/takusen/Foreign/C/UTF8.hs
http://www.haskell.org/pipermail/libraries/2007-February/006841.html
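For the decoding side, here is a minimal sketch (not Takusen's code) of a UTF-8 decoder over a list of Word8 that accepts 4-byte sequences. It returns Nothing for truncated sequences, stray continuation bytes, overlong forms, surrogates and values above U+10FFFF. It does not accept the 5- and 6-byte RFC 2279 forms; supporting those would need extra lead-byte cases. The name decodeUtf8 is hypothetical.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Decode UTF-8 bytes (1 to 4 byte sequences); Nothing on malformed input.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80  = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b < 0xC0  = Nothing                  -- stray continuation byte
  | b < 0xE0  = multi 1 0x1F 0x80        -- 2-byte sequence
  | b < 0xF0  = multi 2 0x0F 0x800       -- 3-byte sequence
  | b < 0xF8  = multi 3 0x07 0x10000     -- 4-byte sequence
  | otherwise = Nothing                  -- 5/6-byte RFC 2279 forms rejected
  where
    -- k continuation bytes follow; mask selects payload bits of the lead
    -- byte; lo is the smallest code point that needs this many bytes.
    multi :: Int -> Word8 -> Int -> Maybe String
    multi k mask lo
      | length conts /= k          = Nothing  -- truncated sequence
      | any (not . isCont) conts   = Nothing  -- bad continuation byte
      | n < lo || n > 0x10FFFF     = Nothing  -- overlong or out of range
      | n >= 0xD800 && n <= 0xDFFF = Nothing  -- UTF-16 surrogate
      | otherwise                  = (chr n :) <$> decodeUtf8 rest
      where
        (conts, rest) = splitAt k bs
        n = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                  (fromIntegral (b .&. mask)) conts
    isCont c = c .&. 0xC0 == 0x80
```

The range check on n is what makes the 4-byte case safe: without it, a lead byte such as 0xF4 followed by 0x90 would decode to a value beyond U+10FFFF, and chr would fail.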

How about supporting 4- to 6-byte encodings?


Best Regards,

-- 
shelarcy <shelarcy at capella.freemail.ne.jp>
http://page.freett.com/shelarcy/

