UTF8 libraries

shelarcy shelarcy at gmail.com
Fri Feb 2 10:40:14 EST 2007


Hi Alistair,

On Fri, 02 Feb 2007 21:01:04 +0900, Alistair Bayley <alistair at abayley.org> wrote:
> What is the state of UTF8 support in Haskell libraries (base or
> user-contributed)? I had a need for a UTF8 en & de-coder for Takusen,
> and after looking around couldn't find anything particularly
> satisfactory, so ended up writing (yet another) one.

regex-posix doesn't support UTF8, because it uses the POSIX regex
library. So this problem can't be fixed by a correct UTF8 en & de-coder
alone.
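
To illustrate, here is a minimal sketch, assuming the regex backend
sees the string as a sequence of bytes rather than characters. The
helper encodeChar is hypothetical (written just for this mail, not
taken from any library). It shows that one non-ASCII character becomes
several bytes, so a byte-oriented matcher would count and match it
differently from what the Haskell String suggests.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: UTF-8 encode a single Char into its bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]                        -- 1 byte (ASCII)
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                 -- 2 bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]        -- 3 bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0] -- 4 bytes
  where
    n = ord c
    -- leading bits of the code point, placed in the first byte
    hi s = fromIntegral (n `shiftR` s)
    -- continuation byte: 10xxxxxx carrying six bits of the code point
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

main :: IO ()
main = do
  let s = "n\233"                          -- 'n' followed by U+00E9
  print (length s)                         -- number of characters: 2
  print (length (concatMap encodeChar s))  -- number of UTF-8 bytes: 3
```

So a pattern like "." applied to the encoded bytes would, under that
assumption, match half of the second character, not the whole of it.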


If someone is interested in supporting UTF8, I recommend using
Oniguruma.

http://www.geocities.jp/kosako3/oniguruma/

Oniguruma also supports UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
etc. It is also portable: it's available on both Unix and
Windows.

So I think it is the best C regex library to choose as a backend.

-- 
shelarcy <shelarcy    capella.freemail.ne.jp>
http://page.freett.com/shelarcy/

