[Haskell] Bug in the HXT: Data.Char.UTF8.decodeOne
Uwe Schmidt
uwe at fh-wedel.de
Mon Jun 11 05:49:49 EDT 2007
I've got a bug report concerning the UTF-8 decoding
in HXT. I copied the source containing the bug from the
Haskell Internationalisation Working Group.
I guess this source is also used in other
projects, e.g. darcs.
My question: is this really a bug, or is it a feature?
My knowledge so far was that the interval from
E000 to FFFF is not legal in Unicode.
---------- Forwarded Message ----------
Subject: Bug in the HXT: Data.Char.UTF8.decodeOne
Date: Sunday 10 June 2007 07:40
From: PHO <phonohawk at ps.sakura.ne.jp>
To: hxmltoolbox at fh-wedel.de
Hello,
I've found a bug in Data.Char.UTF8.decodeOne: it fails to decode
UTF-8 characters from U+E000 to U+FFFF. Here is the patch:
{
hunk ./src/Data/Char/UTF8.hs 248
- | b1 < 0xEE = decodeOne_threebyte bs
+ | b1 < 0xF0 = decodeOne_threebyte bs
}
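
To see why the upper bound in that guard matters, here is a minimal sketch that computes the lead byte of a three-byte UTF-8 sequence and checks it against the guard before and after the patch. The names leadByte3, oldGuard and newGuard are hypothetical helpers for illustration only; they are not part of Data.Char.UTF8, and the sketch isolates just the one comparison from the hunk, assuming the earlier guards in decodeOne have already handled smaller lead bytes.

```haskell
import Data.Bits (shiftR, (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Lead byte of the three-byte UTF-8 encoding of a character.
-- Only meaningful for characters in U+0800..U+FFFF.
-- Hypothetical helper, not taken from Data.Char.UTF8.
leadByte3 :: Char -> Word8
leadByte3 c = fromIntegral (0xE0 .|. (ord c `shiftR` 12))

-- The comparison from the patch hunk, isolated as two predicates:
-- the original bound and the patched bound.
oldGuard, newGuard :: Word8 -> Bool
oldGuard b1 = b1 < 0xEE
newGuard b1 = b1 < 0xF0

-- Print code point, lead byte, and the result of both guards
-- for a few sample characters around the boundary.
main :: IO ()
main = mapM_ report ['\xD7FF', '\xE000', '\xEFFF', '\xF000', '\xFFFD']
  where
    report c =
      let b = leadByte3 c
      in print (ord c, b, oldGuard b, newGuard b)
```

Characters in U+E000..U+EFFF have the lead byte 0xEE and those in U+F000..U+FFFF have 0xEF, so the original bound sends neither to the three-byte branch, while the patched bound accepts both.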
--------------------------------------------------------------
Here is the source:
http://darcs.fh-wedel.de/hxt/src/Data/Char/UTF8.hs
Any suggestions?
Uwe Schmidt