[commit: ghc] master: Update encoding001 to test the full range of non-surrogate code points (e78841b)
git at git.haskell.org
Thu Jul 23 12:55:06 UTC 2015
Repository : ssh://git@git.haskell.org/ghc
On branch : master
Link : http://ghc.haskell.org/trac/ghc/changeset/e78841b518ee9c0b92437899c3a4a2307dfd4ac8/ghc
>---------------------------------------------------------------
commit e78841b518ee9c0b92437899c3a4a2307dfd4ac8
Author: Reid Barton <rwbarton at gmail.com>
Date: Thu Jul 23 11:43:07 2015 +0200
Update encoding001 to test the full range of non-surrogate code points
GHC has used surrogate code points for roundtripping since 7.4.
See Note [Roundtripping].
Also, improve the wording of that Note slightly.
Test Plan: validate still passes
Reviewers: austin, hvr, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1087
>---------------------------------------------------------------
e78841b518ee9c0b92437899c3a4a2307dfd4ac8
libraries/base/GHC/IO/Encoding/Failure.hs | 9 +++++----
libraries/base/tests/IO/encoding001.hs | 9 +--------
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/libraries/base/GHC/IO/Encoding/Failure.hs b/libraries/base/GHC/IO/Encoding/Failure.hs
index df5a992..3f9360d 100644
--- a/libraries/base/GHC/IO/Encoding/Failure.hs
+++ b/libraries/base/GHC/IO/Encoding/Failure.hs
@@ -74,21 +74,22 @@ data CodingFailureMode
-- unicode input that includes lone surrogate codepoints is invalid by
-- definition.
--
+--
-- When we used private-use characters there was a technical problem when it
-- came to encoding back to bytes using iconv. The iconv code will not fail when
-- it tries to encode a private-use character (as it would if trying to encode
--- a surrogate), which means that we won't get a chance to replace it
+-- a surrogate), which means that we wouldn't get a chance to replace it
-- with the byte we originally escaped.
--
-- To work around this, when filling the buffer to be encoded (in
-- writeBlocks/withEncodedCString/newEncodedCString), we replaced the
-- private-use characters with lone surrogates again! Likewise, when
--- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we have
+-- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we had
-- to do the inverse process.
--
-- The user of String would never see these lone surrogates, but it
--- ensures that iconv will throw an error when encountering them. We
--- use lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.
+-- ensured that iconv will throw an error when encountering them. We
+-- used lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.
codingFailureModeSuffix :: CodingFailureMode -> String
codingFailureModeSuffix ErrorOnCodingFailure = ""
diff --git a/libraries/base/tests/IO/encoding001.hs b/libraries/base/tests/IO/encoding001.hs
index 9480abb..c92f8a3 100644
--- a/libraries/base/tests/IO/encoding001.hs
+++ b/libraries/base/tests/IO/encoding001.hs
@@ -29,14 +29,7 @@ main = do
chr (fromIntegral (x `shiftR` 8) .&. 0xff),
chr (fromIntegral x .&. 0xff) ]
hPutStr h (concatMap expand32 [ 0, 32 .. 0xD7ff ])
- -- We avoid the private-use characters at 0xEF00..0xEFFF
- -- that reserved for GHC's PEP383 roundtripping implementation.
- --
- -- The reason is that currently normal text containing those
- -- characters will be mangled, even if we aren't using an encoding
- -- created using //ROUNDTRIP.
- hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0xEEFF ])
- hPutStr h (concatMap expand32 [ 0xF000, 0xF000+32 .. 0x10FFFF ])
+ hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0x10FFFF ])
hClose h
-- convert the UTF-32BE file into each other encoding
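
For readers following the change, here is a minimal Haskell sketch of the two ideas involved: the set of code points the updated test writes out, and the byte-to-lone-surrogate escaping described in Note [Roundtripping]. All names here (testedCodePoints, isSurrogate, escapeByte, unescapeChar) are hypothetical and are not GHC's actual functions; the escape simply follows the 0xDC00 to 0xDCFF range stated in the Note, and the real implementation in base may differ in detail.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: the code points covered by the updated test,
-- stepping by 32 and skipping the surrogate block 0xD800..0xDFFF.
testedCodePoints :: [Int]
testedCodePoints = [0, 32 .. 0xD7FF] ++ [0xE000, 0xE000 + 32 .. 0x10FFFF]

-- True for code points in the UTF-16 surrogate block.
isSurrogate :: Int -> Bool
isSurrogate c = c >= 0xD800 && c <= 0xDFFF

-- Assumed escape scheme, modelled on the Note: an undecodable byte is
-- represented as a lone surrogate in the range 0xDC00..0xDCFF.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xDC00 + fromIntegral b)

-- Inverse of escapeByte: recover the original byte from an escape
-- character, or Nothing if the character is not in the escape range.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= 0xDC00 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c
```

Because the escape characters live entirely inside the surrogate block, none of them can collide with the code points the test writes, which is why the former gap at 0xEF00..0xEFFF is no longer needed.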