[commit: ghc] master: Update encoding001 to test the full range of non-surrogate code points (e78841b)

Thu Jul 23 12:55:06 UTC 2015

Repository : ssh://git@git.haskell.org/ghc

On branch  : master
Link       : http://ghc.haskell.org/trac/ghc/changeset/e78841b518ee9c0b92437899c3a4a2307dfd4ac8/ghc

>---------------------------------------------------------------

commit e78841b518ee9c0b92437899c3a4a2307dfd4ac8
Author: Reid Barton <rwbarton at gmail.com>
Date:   Thu Jul 23 11:43:07 2015 +0200

    Update encoding001 to test the full range of non-surrogate code points
    
    GHC has used surrogate code points for roundtripping since 7.4.
    See Note [Roundtripping].
    
    Also, improve the wording of that Note slightly.
    
    Test Plan: validate still passes
    
    Reviewers: austin, hvr, bgamari
    
    Reviewed By: bgamari
    
    Subscribers: thomie
    
    Differential Revision: https://phabricator.haskell.org/D1087
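
As a rough illustration of the roundtripping scheme the message refers to,
here is a minimal sketch, not the code in GHC.IO.Encoding.Failure. The
names escapeByte and unescapeChar are hypothetical, and the mapping
(0xDC00 plus the byte value, PEP 383 style) is an assumption consistent
with the 0xDC00 to 0xDCFF range mentioned in the Note.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: turn a byte that could not be decoded into a Char.
-- Bytes below 0x80 are kept as plain ASCII; higher bytes are mapped to a
-- lone low surrogate (assumed mapping: 0xDC00 + byte value).
escapeByte :: Word8 -> Char
escapeByte b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = chr (0xDC00 + fromIntegral b)

-- Hypothetical inverse: recover the original byte from an escape
-- character, or Nothing if the Char is not one of the escapes.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c
```

Because lone surrogates never occur in valid Unicode text, an escape
produced this way cannot collide with an ordinary character, which is why
the test no longer needs to skip part of the private-use area.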


>---------------------------------------------------------------

e78841b518ee9c0b92437899c3a4a2307dfd4ac8
 libraries/base/GHC/IO/Encoding/Failure.hs | 9 +++++----
 libraries/base/tests/IO/encoding001.hs    | 9 +--------
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/libraries/base/GHC/IO/Encoding/Failure.hs b/libraries/base/GHC/IO/Encoding/Failure.hs
index df5a992..3f9360d 100644
--- a/libraries/base/GHC/IO/Encoding/Failure.hs
+++ b/libraries/base/GHC/IO/Encoding/Failure.hs
@@ -74,21 +74,22 @@ data CodingFailureMode
 -- unicode input that includes lone surrogate codepoints is invalid by
 -- definition.
 --
+--
 -- When we used private-use characters there was a technical problem when it
 -- came to encoding back to bytes using iconv. The iconv code will not fail when
 -- it tries to encode a private-use character (as it would if trying to encode
--- a surrogate), which means that we won't get a chance to replace it
+-- a surrogate), which means that we wouldn't get a chance to replace it
 -- with the byte we originally escaped.
 --
 -- To work around this, when filling the buffer to be encoded (in
 -- writeBlocks/withEncodedCString/newEncodedCString), we replaced the
 -- private-use characters with lone surrogates again! Likewise, when
--- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we have
+-- reading from a buffer (unpack/unpack_nl/peekEncodedCString) we had
 -- to do the inverse process.
 --
 -- The user of String would never see these lone surrogates, but it
--- ensures that iconv will throw an error when encountering them.  We
--- use lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.
+-- ensured that iconv will throw an error when encountering them.  We
+-- used lone surrogates in the range 0xDC00 to 0xDCFF for this purpose.
 
 codingFailureModeSuffix :: CodingFailureMode -> String
 codingFailureModeSuffix ErrorOnCodingFailure       = ""
diff --git a/libraries/base/tests/IO/encoding001.hs b/libraries/base/tests/IO/encoding001.hs
index 9480abb..c92f8a3 100644
--- a/libraries/base/tests/IO/encoding001.hs
+++ b/libraries/base/tests/IO/encoding001.hs
@@ -29,14 +29,7 @@ main = do
           chr (fromIntegral (x `shiftR` 8)  .&. 0xff),
           chr (fromIntegral x .&. 0xff) ]
   hPutStr h (concatMap expand32 [ 0, 32 .. 0xD7ff ])
-  -- We avoid the private-use characters at 0xEF00..0xEFFF
-  -- that reserved for GHC's PEP383 roundtripping implementation.
-  --
-  -- The reason is that currently normal text containing those
-  -- characters will be mangled, even if we aren't using an encoding
-  -- created using //ROUNDTRIP.
-  hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0xEEFF ])
-  hPutStr h (concatMap expand32 [ 0xF000, 0xF000+32 .. 0x10FFFF ])
+  hPutStr h (concatMap expand32 [ 0xE000, 0xE000+32 .. 0x10FFFF ])
   hClose h
 
   -- convert the UTF-32BE file into each other encoding
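
The range covered by the test after this change can be sketched in
isolation as below. This is a hedged reconstruction, not the test file
itself: only the last two elements of expand32 are visible in the hunk, so
the first two elements and the Word32 argument type are assumptions, and
testCodePoints and isSurrogate are hypothetical names.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr)
import Data.Word (Word32)

-- Split a code point into its four UTF-32BE bytes, each held in a Char
-- (assumed shape; the hunk shows only the last two elements).
expand32 :: Word32 -> [Char]
expand32 x =
  [ chr (fromIntegral (x `shiftR` 24) .&. 0xff)
  , chr (fromIntegral (x `shiftR` 16) .&. 0xff)
  , chr (fromIntegral (x `shiftR` 8)  .&. 0xff)
  , chr (fromIntegral x .&. 0xff)
  ]

-- Every 32nd code point below and above the surrogate block, with no gap
-- in the private-use area any more.
testCodePoints :: [Word32]
testCodePoints = [0, 32 .. 0xD7FF] ++ [0xE000, 0xE000 + 32 .. 0x10FFFF]

-- True for code points in the surrogate block 0xD800..0xDFFF.
isSurrogate :: Word32 -> Bool
isSurrogate x = x >= 0xD800 && x <= 0xDFFF
```

The two list ranges skip only 0xD800..0xDFFF, matching the "non-surrogate
code points" wording in the commit subject.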