[Haskell] Bug in the HXT: Data.Char.UTF8.decodeOne

Uwe Schmidt uwe at fh-wedel.de
Mon Jun 11 05:49:49 EDT 2007


I've received a bug report concerning the UTF-8 decoding
in HXT. I copied the source containing the bug from the
Haskell Internationalisation Working Group.
I guess this source is also used in other
projects, e.g. darcs.

My question: Is this really a bug, or is it a feature?
My understanding so far was that the interval from
E000 to FFFF is not legal in Unicode.

----------  Forwarded Message  ----------

Subject: Bug in the HXT: Data.Char.UTF8.decodeOne
Date: Sunday 10 June 2007 07:40
From: PHO <phonohawk at ps.sakura.ne.jp>
To: hxmltoolbox at fh-wedel.de

Hello,

I've found a bug in Data.Char.UTF8.decodeOne: it fails to decode
UTF-8 characters from U+E000 to U+FFFF. Here is the patch:

{
hunk ./src/Data/Char/UTF8.hs 248
-    | b1 < 0xEE   = decodeOne_threebyte bs
+    | b1 < 0xF0   = decodeOne_threebyte bs
}
--------------------------------------------------------------
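
For reference, a minimal sketch of the arithmetic behind the patch. It is
not the HXT code: the names leadRange and decodeThree are hypothetical
helpers written only for illustration. In UTF-8, three-byte sequences use
lead bytes 0xE0 to 0xEF, and the low four bits of the lead byte become the
top four bits of the code point. A guard of b1 < 0xEE therefore excludes
the lead bytes 0xEE and 0xEF, which are exactly the ones covering U+E000
to U+FFFF. As far as I can tell, the only range in the three-byte area
that must not be decoded is the surrogate block U+D800 to U+DFFF; U+E000
to U+F8FF is the Private Use Area, and U+FEFF (BOM) and U+FFFD
(replacement character) also lie above U+E000.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Code point range selected by a three-byte lead byte (0xE0..0xEF).
-- Hypothetical helper; ignores the overlong restriction that applies to 0xE0.
leadRange :: Word8 -> (Int, Int)
leadRange b1 = (base, base + 0xFFF)
  where
    base = (fromIntegral b1 .&. 0x0F) `shiftL` 12

-- | Decode one three-byte UTF-8 sequence.
-- Hypothetical sketch, not the HXT implementation of decodeOne_threebyte.
decodeThree :: Word8 -> Word8 -> Word8 -> Maybe Char
decodeThree b1 b2 b3
  | b1 < 0xE0 || b1 > 0xEF        = Nothing  -- not a three-byte lead byte
  | not (isCont b2 && isCont b3)  = Nothing  -- continuation bytes must be 10xxxxxx
  | cp < 0x800                    = Nothing  -- overlong encoding
  | cp >= 0xD800 && cp <= 0xDFFF  = Nothing  -- surrogates are not encodable
  | otherwise                     = Just (chr cp)
  where
    isCont b = b .&. 0xC0 == 0x80

    cp :: Int
    cp = ((fromIntegral b1 .&. 0x0F) `shiftL` 12)
         .|. ((fromIntegral b2 .&. 0x3F) `shiftL` 6)
         .|. (fromIntegral b3 .&. 0x3F)
```

With these definitions, leadRange 0xEE gives (0xE000, 0xEFFF) and
leadRange 0xEF gives (0xF000, 0xFFFF), so a dispatch on b1 < 0xF0 reaches
both, while b1 < 0xEE reaches neither. The bytes EE 80 80 decode to
U+E000 and EF BF BD decode to U+FFFD.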

Here is the source:
http://darcs.fh-wedel.de/hxt/src/Data/Char/UTF8.hs

Any suggestions?

Uwe Schmidt

