[Haskell] Bug in the HXT: Data.Char.UTF8.decodeOne

Mon Jun 11 07:18:12 EDT 2007

The range from U+E000 to U+F8FF is the Private Use Area, so it is valid
and in actual use. There are also usable ranges from U+F900 up to U+FFFF,
and beyond the base plane up to U+10FFFF.

The only large range that is invalid in UTF-8 is the surrogate area,
U+D800 to U+DFFF. UTF-16 uses these code points to encode characters
outside the base plane.
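As a rough sketch of the rule above (the name isUnicodeScalar is made up
for illustration and is not part of HXT), a check for which code points
UTF-8 may encode could look like this:

```haskell
-- Hypothetical helper, not taken from HXT or Data.Char.UTF8.
-- True for code points that UTF-8 is allowed to encode (RFC 3629):
-- everything from U+0000 to U+10FFFF except the surrogate area.
isUnicodeScalar :: Int -> Bool
isUnicodeScalar cp
    | cp < 0         = False
    | cp < 0xD800    = True   -- U+0000 .. U+D7FF
    | cp < 0xE000    = False  -- U+D800 .. U+DFFF, surrogates (UTF-16 only)
    | cp <= 0x10FFFF = True   -- U+E000 .. U+10FFFF, includes Private Use
    | otherwise      = False
```

With this definition, isUnicodeScalar 0xE000 and isUnicodeScalar 0xFFFF
are both True, while isUnicodeScalar 0xD800 is False.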

See also http://www.ietf.org/rfc/rfc3629.txt
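To see why the patch quoted below is right: U+E000 encodes as EE 80 80
and U+FFFF as EF BF BF, so the lead bytes 0xEE and 0xEF must both be
treated as three-byte leads. Here is a minimal sketch of the lead-byte
classification from RFC 3629; sequenceLength is a hypothetical name and
this is not the HXT code itself:

```haskell
import Data.Word (Word8)

-- Hypothetical helper, not taken from HXT or Data.Char.UTF8.
-- Length of the UTF-8 sequence announced by a lead byte, following
-- the byte ranges in RFC 3629; Nothing if the byte cannot start one.
sequenceLength :: Word8 -> Maybe Int
sequenceLength b1
    | b1 < 0x80 = Just 1   -- 0x00 .. 0x7F, ASCII
    | b1 < 0xC2 = Nothing  -- continuation bytes and overlong leads C0, C1
    | b1 < 0xE0 = Just 2   -- 0xC2 .. 0xDF
    | b1 < 0xF0 = Just 3   -- 0xE0 .. 0xEF, includes 0xEE and 0xEF
    | b1 < 0xF5 = Just 4   -- 0xF0 .. 0xF4, up to U+10FFFF
    | otherwise = Nothing  -- 0xF5 .. 0xFF never appear in UTF-8
```

A guard of b1 < 0xEE stops at lead byte 0xED, which only reaches the
surrogate area, so everything from U+E000 upwards in the base plane is
rejected.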

/vidar

On 11 Jun 2007, at 11:49, Uwe Schmidt wrote:

>
> I've got a bug report concerning the UTF-8 decoding
> in HXT. I've copied the source containing the bug from the
> Haskell Internationalisation Working Group.
> I guess this source is also used in other
> projects, e.g. darcs.
>
> My question: is this really a bug, or is it a feature?
> My understanding so far was that the interval from
> E000 to FFFF is not legal in Unicode.
>
> ----------  Forwarded Message  ----------
>
> Subject: Bug in the HXT: Data.Char.UTF8.decodeOne
> Date: Sunday 10 June 2007 07:40
> From: PHO <phonohawk at ps.sakura.ne.jp>
> To: hxmltoolbox at fh-wedel.de
>
> Hello,
>
> I've found a bug in Data.Char.UTF8.decodeOne: it fails to decode
> UTF-8 characters from U+E000 to U+FFFF. Here is the patch:
>
> {
> hunk ./src/Data/Char/UTF8.hs 248
> -    | b1 < 0xEE   = decodeOne_threebyte bs
> +    | b1 < 0xF0   = decodeOne_threebyte bs
> }
> --------------------------------------------------------------
>
> here is the source
> http://darcs.fh-wedel.de/hxt/src/Data/Char/UTF8.hs
>
> Any suggestions?
>
> Uwe Schmidt