[GHC] #8730: Invalid Unicode Codepoints in Char
GHC
ghc-devs at haskell.org
Sun Nov 16 21:38:45 UTC 2014
#8730: Invalid Unicode Codepoints in Char
-------------------------------------+-------------------------------------
        Reporter: mdmenzel          | Owner: ekmett
            Type: bug               | Status: new
        Priority: low               | Milestone:
       Component: Core Libraries    | Version: 7.6.3
      Resolution:                   | Keywords: unicode
Operating System: Unknown/Multiple  | Architecture: Unknown/Multiple
 Type of failure: None/Unknown      | Difficulty: Unknown
       Test Case:                   | Blocked By:
        Blocking:                   | Related Tickets:
Differential Revisions:             |
-------------------------------------+-------------------------------------
Changes (by thomie):
* cc: batterseapower, core-libraries-committee@… (added)
* owner: => ekmett
* component: Compiler => Core Libraries
Comment:
Thank you for the report. I am just adding some references.
{{{
Prelude Data.Char> all ((==) Surrogate . generalCategory) ['\xdc80' .. '\xdfff']
True
}}}
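The same check can be run outside GHCi. The following is only a minimal sketch: `isSurrogateCodePoint`, `allSurrogatesClassified` and `surrogateAccepted` are hypothetical names introduced here for illustration, and the range is widened to the full surrogate block U+D800 through U+DFFF mentioned in the references below. It also shows the point of this ticket, namely that `chr` produces a `Char` for a surrogate code point without raising an error.

```haskell
import Data.Char (GeneralCategory (Surrogate), chr, generalCategory, ord)

-- Hypothetical helper (the name is an assumption, not an API of base):
-- True for code points in the UTF-16 surrogate range U+D800..U+DFFF.
isSurrogateCodePoint :: Char -> Bool
isSurrogateCodePoint c = n >= 0xD800 && n <= 0xDFFF
  where
    n = ord c

-- Every Char in the surrogate range is reported as category Surrogate.
allSurrogatesClassified :: Bool
allSurrogatesClassified =
  all ((== Surrogate) . generalCategory) ['\xD800' .. '\xDFFF']

-- chr only checks the upper bound 0x10FFFF, so a surrogate code point
-- is accepted as a Char value.
surrogateAccepted :: Bool
surrogateAccepted = ord (chr 0xD800) == 0xD800
```

Loading this into GHCi and evaluating `allSurrogatesClassified` and `surrogateAccepted` should give `True` for both.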
* http://www.unicode.org/versions/Unicode7.0.0/ch23.pdf
* http://tools.ietf.org/html/rfc3629
* http://en.wikipedia.org/wiki/UTF-8#Invalid_code_points:
>According to the UTF-8 definition (RFC 3629) the high and low surrogate
halves used by UTF-16 (U+D800 through U+DFFF) are not legal Unicode
values, and their UTF-8 encoding should be treated as an invalid byte
sequence.
>Whether an actual application should do this is debatable, as it makes it
impossible to store invalid UTF-16 (that is, UTF-16 with unpaired
surrogate halves) in a UTF-8 string. This is necessary to store unchecked
UTF-16 such as Windows filenames as UTF-8. It is also incompatible with
CESU encoding (described below).
In commit dc58b7398910a433259a6c0f58a0d05a48555191:
{{{
Author: Max Bolingbroke <>
Date: Sat May 14 22:50:46 2011 +0100
Big patch to improve Unicode support in GHC. Validated on OS X and Windows,
this patch series fixes #5061, #1414, #3309, #3308, #3307, #4006 and #4855.
}}}
This commit adds checks like `... if isSurrogate c then done
InvalidSequence ir ow else do ...` to GHC/IO/Encoding/UTF{8|16|32}.hs.
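To illustrate what such a check does, here is a rough sketch of a decoder step for a single three-byte UTF-8 sequence. It is not the code from the commit: the `Step` type and the `decode3` function are hypothetical and written only for this example, and `InvalidSequence` here is a local constructor that merely borrows the name seen in the snippet above. The step rejects any sequence whose decoded value falls in U+D800 through U+DFFF, as RFC 3629 requires.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical result type for this sketch only.
data Step = Decoded Char | InvalidSequence
  deriving (Eq, Show)

-- Decode one three-byte UTF-8 sequence (lead byte 1110xxxx followed by
-- two continuation bytes 10xxxxxx), rejecting overlong forms and
-- surrogate code points.
decode3 :: Word8 -> Word8 -> Word8 -> Step
decode3 b0 b1 b2
  | b0 .&. 0xF0 /= 0xE0          = InvalidSequence -- not a 3-byte lead
  | not (isCont b1 && isCont b2) = InvalidSequence -- bad continuation
  | n < 0x800                    = InvalidSequence -- overlong encoding
  | n >= 0xD800 && n <= 0xDFFF   = InvalidSequence -- surrogate
  | otherwise                    = Decoded (chr n)
  where
    isCont b = b .&. 0xC0 == 0x80
    n :: Int
    n = (fromIntegral (b0 .&. 0x0F) `shiftL` 12)
          .|. (fromIntegral (b1 .&. 0x3F) `shiftL` 6)
          .|. fromIntegral (b2 .&. 0x3F)
```

For example, the bytes `0xED 0xA0 0x80` would decode to U+D800 and are therefore rejected, while `0xE2 0x82 0xAC` decode to U+20AC.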
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8730#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler