[Haskell] Bug in the HXT: Data.Char.UTF8.decodeOne
Vidar Larsen
vi_larsen at yahoo.no
Mon Jun 11 07:18:12 EDT 2007
The range from U+E000 to U+F8FF is the Private Use Area and is therefore in use.
There are also usable ranges from U+F900 up to U+FFFF, and beyond.
The only large invalid range in UTF-8 is the surrogate area,
U+D800 to U+DFFF. UTF-16 uses these code points to encode
code points outside the Basic Multilingual Plane.
See also http://www.ietf.org/rfc/rfc3629.txt
/vidar
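The ranges described in the reply above can be written down as a small Haskell sketch. The names isSurrogate, isScalarValue, isBmpPrivateUse and toSurrogatePair are hypothetical illustrations, not functions from HXT or Data.Char.UTF8. The bounds follow RFC 3629, which permits UTF-8 to encode any code point up to U+10FFFF except the surrogates. The definitions can be loaded into GHCi as they are.

```haskell
-- Hypothetical helpers illustrating the ranges discussed (not HXT code).

-- UTF-16 surrogate code points: U+D800..U+DFFF.
isSurrogate :: Int -> Bool
isSurrogate cp = cp >= 0xD800 && cp <= 0xDFFF

-- A Unicode scalar value is any code point in U+0000..U+10FFFF except a
-- surrogate; these are exactly the values RFC 3629 allows in UTF-8.
isScalarValue :: Int -> Bool
isScalarValue cp = cp >= 0 && cp <= 0x10FFFF && not (isSurrogate cp)

-- Private Use Area inside the Basic Multilingual Plane: U+E000..U+F8FF.
isBmpPrivateUse :: Int -> Bool
isBmpPrivateUse cp = cp >= 0xE000 && cp <= 0xF8FF

-- UTF-16 encodes a code point above U+FFFF as a high surrogate
-- (U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF),
-- each carrying 10 bits of (cp - 0x10000).
toSurrogatePair :: Int -> Maybe (Int, Int)
toSurrogatePair cp
  | cp < 0x10000 || cp > 0x10FFFF = Nothing
  | otherwise = Just (0xD800 + (v `div` 0x400), 0xDC00 + (v `mod` 0x400))
  where
    v = cp - 0x10000
```

For example, isScalarValue 0xE000 and isScalarValue 0xFFFD are True, while isScalarValue 0xD800 is False; toSurrogatePair 0x1F600 yields the pair 0xD83D, 0xDE00.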
On 11 Jun 2007, at 11:49, Uwe Schmidt wrote:
>
> I've got a bug report concerning the UTF decoding
> in HXT. I've copied the source containing the bug from the
> Haskell Internationalisation Working Group.
> I guess this source is also used in other
> projects, e.g. darcs.
>
> My question: is this really a bug, or is it a feature?
> My understanding so far was that the interval from
> E000 to FFFF is not legal in Unicode.
>
> ---------- Forwarded Message ----------
>
> Subject: Bug in the HXT: Data.Char.UTF8.decodeOne
> Date: Sunday 10 June 2007 07:40
> From: PHO <phonohawk at ps.sakura.ne.jp>
> To: hxmltoolbox at fh-wedel.de
>
> Hello,
>
> I've found a bug in Data.Char.UTF8.decodeOne: it fails to decode
> UTF-8 characters from U+E000 to U+FFFF. Here is the patch:
>
> {
> hunk ./src/Data/Char/UTF8.hs 248
> - | b1 < 0xEE = decodeOne_threebyte bs
> + | b1 < 0xF0 = decodeOne_threebyte bs
> }
> --------------------------------------------------------------
>
> Here is the source:
> http://darcs.fh-wedel.de/hxt/src/Data/Char/UTF8.hs
>
> Any suggestions?
>
> Uwe Schmidt
> _______________________________________________
> Haskell mailing list
> Haskell at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell
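PHO's patch changes the upper bound on lead bytes routed to the three-byte decoder. In UTF-8, lead bytes 0xE0 to 0xEF start three-byte sequences, and the low four bits of the lead byte become the top four bits of the code point, so 0xEE starts U+E000 to U+EFFF and 0xEF starts U+F000 to U+FFFF. A bound of `b1 < 0xEE` therefore excludes exactly the range U+E000 to U+FFFF from the report. The sketch below is hypothetical: classifyLead and decodeThree are not the HXT functions, and it assumes a strict decoder that also rejects overlong forms and surrogates.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- Hypothetical lead-byte classification (not the HXT source).
data LeadByte = OneByte | Continuation | TwoByte | ThreeByte | FourByte | Invalid
  deriving (Eq, Show)

classifyLead :: Word8 -> LeadByte
classifyLead b
  | b < 0x80  = OneByte       -- U+0000..U+007F
  | b < 0xC0  = Continuation  -- 10xxxxxx, never a lead byte
  | b < 0xC2  = Invalid       -- 0xC0, 0xC1 could only give overlong forms
  | b < 0xE0  = TwoByte       -- U+0080..U+07FF
  | b < 0xF0  = ThreeByte     -- U+0800..U+FFFF; a bound of 0xEE would miss
                              -- lead bytes 0xEE and 0xEF, i.e. U+E000..U+FFFF
  | b < 0xF5  = FourByte      -- U+10000..U+10FFFF
  | otherwise = Invalid       -- 0xF5..0xFF never appear in UTF-8

-- Strictly decode one sequence 1110xxxx 10yyyyyy 10zzzzzz.
decodeThree :: Word8 -> Word8 -> Word8 -> Maybe Int
decodeThree b1 b2 b3
  | classifyLead b1 /= ThreeByte  = Nothing
  | not (isCont b2 && isCont b3)  = Nothing
  | cp < 0x800                    = Nothing  -- overlong encoding
  | cp >= 0xD800 && cp <= 0xDFFF  = Nothing  -- surrogates are not allowed
  | otherwise                     = Just cp
  where
    -- Continuation bytes have the form 10xxxxxx.
    isCont b = b .&. 0xC0 == 0x80
    -- 4 bits from the lead byte, 6 bits from each continuation byte.
    cp = (fromIntegral (b1 .&. 0x0F) `shiftL` 12)
           .|. (fromIntegral (b2 .&. 0x3F) `shiftL` 6)
           .|. fromIntegral (b3 .&. 0x3F)
```

With the bound at 0xF0, decodeThree 0xEE 0x80 0x80 gives Just 0xE000 and decodeThree 0xEF 0xBF 0xBD gives Just 0xFFFD, while the surrogate sequence 0xED 0xA0 0x80 (U+D800) is still rejected.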