<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>It would also be good to have a summary of the previous discussion

      that the OP<br>

      kindly linked [1], e.g. the comment by David Turner [2]<br>

      <br>

      > One of the most visible uses of Show is that it's how values

      are shown in<br>

      > GHCi. As mentioned earlier in this thread, if you're teaching

      in a<br>

      > non-ASCII language then the user experience is pretty poor.<br>

      > <br>

      > On the other hand, I see Show (like .ToString() in C# etc.)

      as a debugging<br>

      > tool: not for seriously robust serialisation but useful if

      you need to dump<br>

      > a value into a log message or email or similar. And in that

      situation it's<br>

      > very useful if it sticks to ASCII: non-ASCII content just

      isn't resilient<br>

      > enough to being passed around the network, truncated and

      generally<br>

      > mutilated on the way through.<br>

      > <br>

      > These are definitely two different concerns and they pull in

      opposite<br>

      > directions in this discussion. It's a matter of opinion which

      you think is<br>

      > more important. Me, I think the latter, but then I do a lot

      of logging and<br>

      > speak a language that fits into  ASCII. YMMV!<br>

      <br>

      This proposal is motivated by the first point, but doesn't mention

      debugging<br>

      other than<br>

      <br>

      > 2. This is an actual annoyance during debugging localized

      software, or<br>

           strings with emojis<br>

      <br>

      which I don't agree with.<br>

      <br>

      For example, look at the failing pandoc test case in my

      previous message.<br>

      \160 is a non-breaking space, which looks like an ordinary space when

      rendered<br>

      literally. I have had my share of bad experiences with it. So, indeed,

      YMMV.<br>

      <br>

      - Oleg<br>

      <br>

      [1]:

<a class="moz-txt-link-freetext" href="https://mail.haskell.org/pipermail/haskell-cafe/2016-February/122874.html">https://mail.haskell.org/pipermail/haskell-cafe/2016-February/122874.html</a><br>

      [2]:

<a class="moz-txt-link-freetext" href="https://mail.haskell.org/pipermail/haskell-cafe/2016-February/122899.html">https://mail.haskell.org/pipermail/haskell-cafe/2016-February/122899.html</a><br>

      <br>

    </p>
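    <p>A minimal sketch of the \160 point, assuming the Show String
      behaviour described in this thread. The names withNbsp, withSpace
      and nonAsciiCodePoints are made up for illustration and are not
      part of base or pandoc:</p>
    <pre>
```haskell
import Data.Char (ord)

-- Two strings that look the same in most terminals and fonts.
withNbsp, withSpace :: String
withNbsp  = "1\160m"  -- U+00A0 NO-BREAK SPACE between '1' and 'm'
withSpace = "1 m"     -- U+0020 SPACE between '1' and 'm'

-- Hypothetical helper: code points of every character above '\DEL'.
nonAsciiCodePoints :: String -> [Int]
nonAsciiCodePoints = map ord . filter (> '\DEL')
```
    </pre>
    <p>With escaping, show withNbsp makes the difference visible as
      "1\160m", whereas putStrLn prints both strings alike;
      nonAsciiCodePoints withNbsp is [160] and nonAsciiCodePoints
      withSpace is empty.</p>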

    <div class="moz-cite-prefix">On 8.7.2021 18.53, Oleg Grenrus wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:0f67eb49-f28d-af20-2754-e29e945b6c23@iki.fi">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <p>Here is a simple patch, which I hope is close to what<br>

        <br>

        1. Modify a few guards in GHC.Show.showLitChar to not escape

        _readable_<br>

           Unicode characters out of the range of ASCII.<br>

        <br>

        of the proposed change will look like:<br>

        <br>

            diff --git a/libraries/base/GHC/Show.hs

        b/libraries/base/GHC/Show.hs<br>

            index 84077e473b..24569168d4 100644<br>

            --- a/libraries/base/GHC/Show.hs<br>

            +++ b/libraries/base/GHC/Show.hs<br>

            @@ -364,7 +364,10 @@ showCommaSpace = showString ", "<br>

             -- > showLitChar '\n' s  =  "\\n" ++ s<br>

             --<br>

             showLitChar                :: Char -> ShowS<br>

            -showLitChar c s | c > '\DEL' =  showChar '\\'

        (protectEsc isDec (shows (ord c)) s)<br>

            +showLitChar c s | c > '\DEL' =<br>

            +    if isPrint c<br>

            +    then showChar c s<br>

            +    else  showChar '\\' (protectEsc isDec (shows (ord c))

        s)<br>

             showLitChar '\DEL'         s =  showString "\\DEL" s<br>

             showLitChar '\\'           s =  showString "\\\\" s<br>

             showLitChar c s | c >= ' '   =  showChar c s<br>

            @@ -380,6 +383,13 @@ showLitChar c              s = 

        showString ('\\' : asciiTab!!ord c) s<br>

                     -- I've done manual eta-expansion here, because

        otherwise it's<br>

                     -- impossible to stop (asciiTab!!ord) getting

        floated out as an MFE<br>

             <br>

            +-- Local definition of isPrint to avoid fighting with

        cycles for now.<br>

            +isPrint                 :: Char -> Bool<br>

            +isPrint    c = iswprint (ord c) /= 0<br>

            +<br>

            +foreign import ccall unsafe "u_iswprint"<br>

            +  iswprint :: Int -> Int<br>

            +<br>

             showLitString :: String -> ShowS<br>

             -- | Same as 'showLitChar', but for strings<br>

             -- It converts the string to a string using Haskell escape

        conventions<br>

        <br>

        I applied it to the ghc-8.10 branch:<br>

        <br>

            % _build/stage1/bin/ghc --interactive<br>

            GHCi, version 8.10.5: <a class="moz-txt-link-freetext"

          href="https://www.haskell.org/ghc/" moz-do-not-send="true">https://www.haskell.org/ghc/</a> 

        :? for help<br>

            Prelude> "äiti"<br>

            "äiti"<br>

            Prelude> "мир"<br>

            "мир"<br>

            Prelude> print "мир"<br>

            "мир"<br>

            Prelude> "😀"<br>

            "😀"<br>

        <br>

        And then I ran the test-suites of aeson, dhall and pandoc.<br>

        <br>

        The aeson test-suite passed.<br>

        The dhall test-suites passed too.<br>

        However, the pandoc test-suite failed:<br>

        <br>

        78 out of 2819 tests failed (35.88s)<br>

        <br>

        An example failure is:<br>

        <br>

            3587.md<br>

        #1:                                                           

        FAIL (0.01s)<br>

                --- test/command/3587.md<br>

                +++ pandoc -f latex -t native<br>

                +   1 [Para [Str "1 m",Space,Str "is",Space,Str

        "equal",Space,Str "to",Space,Str "1000 mm"]]<br>

                -   1 [Para [Str "1\160m",Space,Str "is",Space,Str

        "equal",Space,Str "to",Space,Str "1000\160mm"]]<br>

        <br>

        Str is a constructor of Inline type, and takes Text: data Inline

        = Str Text | ...<br>

        As discussed on the GHC issue [1], the Text and ByteString Show

        instances piggyback on the<br>

        String instance. Bodigrim said that Text will eventually migrate<br>

        to do the same as the new Show String [2], so this issue will

        resurface.<br>

        <br>

        Please explain the compatibility story: how should library writers

        write<br>

        code (in test-suites) that relies on Show String or Show

        Text, such<br>

        that it supports base (and/or text) versions<br>

        on both sides of this breaking change?<br>

        <br>

        I agree with Julian that the migration engineering effort required

        across<br>

        (even just the open source) ecosystem is non-trivial.<br>

        Having a good plan would hopefully make it easier to accept that

        cost.<br>

        <br>

        The fact that this change is not detectable at compile time<br>

        makes me very anxious about it, even though I don't disagree with the

        motivation.<br>

        I have very little idea if and where I depend on Show String

        behavior.<br>

        <br>

        It would also be interesting to see the test-suite results for

        all of Stackage, but I leave that for someone else to do.<br>

        <br>

        - Oleg<br>

        <br>

        [1]: <a class="moz-txt-link-freetext"

          href="https://gitlab.haskell.org/ghc/ghc/-/issues/20027"

          moz-do-not-send="true">https://gitlab.haskell.org/ghc/ghc/-/issues/20027</a><br>

        [2]: <a class="moz-txt-link-freetext"

          href="https://gitlab.haskell.org/ghc/ghc/-/issues/20027#note_363519"

          moz-do-not-send="true">https://gitlab.haskell.org/ghc/ghc/-/issues/20027#note_363519</a><br>

      </p>
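      <p>One possible shape for the compatibility story asked about
        above, as a sketch only and not something agreed on in this
        thread: a test-suite could stop depending on Show String and
        render expected strings with its own escaping function, so that
        golden output stays the same on both sides of the change. The
        name showAscii is hypothetical; showLitChar is only used for
        characters up to '\DEL', whose rendering the patch above leaves
        untouched:</p>
      <pre>
```haskell
import Data.Char (isDigit, ord, showLitChar)

-- | Hypothetical helper: render a String as a Haskell literal and always
-- escape characters above '\DEL' numerically, independent of how the
-- Show String instance of the installed base behaves.
showAscii :: String -> String
showAscii str = '"' : go str "\""
  where
    go :: String -> ShowS
    go []         = id
    go ('"' : cs) = showString "\\\"" . go cs
    go (c : cs)
      | c > '\DEL' = showChar '\\' . shows (ord c) . protect (go cs)
      | otherwise  = showLitChar c . go cs

    -- Insert "\&" when the text after a numeric escape starts with a digit,
    -- so that the escape is not read back as a longer number.
    protect :: ShowS -> ShowS
    protect f rest = case f rest of
      out@(d : _) | isDigit d -> "\\&" ++ out
      out                     -> out
```
      </pre>
      <p>For example, showAscii "1\160m" yields "1\160m" with the
        escape kept, and read (showAscii s) gives back s. The cost is
        that every comparison in the test-suite has to be routed through
        such a helper, which is part of the migration effort discussed
        here.</p>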

      <div class="moz-cite-prefix">On 8.7.2021 15.25, Julian Ospald

        wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:E738A855-636D-4ABB-8FA3-EA070BB4A836@posteo.de">

        <meta http-equiv="content-type" content="text/html;

          charset=UTF-8">

        Hi,<br>

        <br>

        I think most seemed to agree on the motivation, but would it be

        a lot of work to ping a few large open source/industry projects

        about this and get a feel for what they think or how much

        effort they expect a migration to take? I'm afraid that we might

        take this too lightly and possibly cause a lot of engineering

        effort here. Our expectations of how or how often people use "show"

        might or might not be accurate.<br>

        <br>

        I'm aware of e.g. the cardano wallet test suite (open source)

        and other cardano projects that are very large open source

        codebases and may be affected. <br>

        <br>

        CCing duncan<br>

        <br>

        <div class="gmail_quote">On July 8, 2021 10:11:28 AM UTC, Kai Ma

          <a class="moz-txt-link-rfc2396E"

            href="mailto:justksqsf@gmail.com" moz-do-not-send="true">&lt;justksqsf@gmail.com&gt;</a>

          wrote:

          <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt

            0.8ex; border-left: 1px solid rgb(204, 204, 204);

            padding-left: 1ex;">

            <pre class="k9mail">Hi all

Two weeks ago, I proposed “Support Unicode characters in instance Show

String” [0] in the GHC issue tracker, and chessai asked me to post it

here for wider feedback.  The proposal posted here is edited to reflect

new ideas proposed and insights accumulated over the days:

1. (Proposal) The proposal itself is now modeled after Python.

2. (Alternative Options) Alternative 2 is the original proposal.

3. (Downsides) New.  About breakage.

4. (Prior Art) New.

5. (Unresolved Problems) New.  Included for discussion.

Even though I wanted to summarize everything here, some insightful

comments are perhaps not included or misunderstood.  These original

comments can be found at the original feature request.

[0] <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/20027" moz-do-not-send="true">https://gitlab.haskell.org/ghc/ghc/-/issues/20027</a>

Motivation<hr>Unicode has been widely adopted and people around the world rely on

Unicode to write in their native languages. Haskell, however, has been

stuck in ASCII, and escapes all non-ASCII characters in String's

instance of the Show class, despite the fact that each element of a

String is typically a Unicode code point, and putStrLn actually works as

expected. Consider the following examples:

    ghci> print "Hello, 世界"

    "Hello, \19990\30028"

    ghci> print "Hello, мир"

    "Hello, \1084\1080\1088"

    ghci> print "Hello, κόσμος"

    "Hello, \954\972\963\956\959\962"

    ghci> "Hello, 世界"       -- ghci calls `show`, so string literals are also escaped

    "Hello, \19990\30028"

    ghci> "😀"  -- Not only human scripts, but also emojis!

    "\128512"

This status quo is unsatisfactory for a number of reasons:

1. Even though it's small, it somehow creates an unwelcoming atmosphere

   for native speakers of languages whose scripts are not representable

   in ASCII.

2. This is an actual annoyance during debugging localized software, or

   strings with emojis.

3. Following 1, Haskell teachers are forced to use other languages

   instead of the students' mother tongues, or to rely on I/O functions

   like putStrLn, creating a rather unnecessary burden.

4. Other string types, like Text [1], rely on this Show instance.

Moreover, `read` can already handle Unicode strings today, so relaxing

constraints on `show` doesn't affect `read . show == id`.
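
The round-trip claim can be checked with a small sketch like the one
below.  The names readsUnescaped and roundTrips are made up for
illustration only.

```haskell
-- `read` accepts unescaped non-ASCII characters inside a string literal.
readsUnescaped :: Bool
readsUnescaped = (read "\"Hello, мир\"" :: String) == "Hello, \1084\1080\1088"

-- Whatever `show` produces for a String reads back to the original value.
roundTrips :: String -> Bool
roundTrips s = read (show s) == s
```

Since `read` accepts both the escaped and the unescaped spelling, the
round trip does not depend on which of the two `show` emits.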

Proposal<hr>It's proposed here to change the Show instance of String, to achieve the following output:

    ghci> print "Hello, 世界"

    "Hello, 世界"

    ghci> print "Hello, мир"

    "Hello, мир"

    ghci> print "Hello, κόσμος"

    "Hello, κόσμος"

    ghci> "Hello, 世界"

    "Hello, 世界"

    ghci> "😀"

    "😀"

More concretely, it means:

1. Modify a few guards in GHC.Show.showLitChar to not escape _readable_

   Unicode characters out of the range of ASCII.

2. Provide a function showEscaped or newtype Escaped = Escaped String to

   obtain the current escaping behavior, in case anyone wants the

   current behavior back.

This proposal isn't about unescaping everything, but only readable

Unicode characters.  u_iswprint (GHC.Unicode.isPrint) seems to do the

job, and indeed, there was a similar proposal before [2].  In summary,

the behavior is similar to what Python `repr` does.
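
As a rough sketch of item 1, assuming Data.Char.isPrint is an
acceptable stand-in for "readable": the function below behaves like
`show` on a String except that printable characters above '\DEL' are
emitted as they are.  The name showUnicode is hypothetical and this is
not the actual change to GHC.Show.

```haskell
import Data.Char (isPrint, showLitChar)

-- | Hypothetical: render a String as a Haskell literal, leaving printable
-- non-ASCII characters unescaped and escaping everything else as before.
showUnicode :: String -> String
showUnicode str = '"' : go str "\""
  where
    go :: String -> ShowS
    go []         = id
    go ('"' : cs) = showString "\\\"" . go cs
    go (c : cs)
      | c > '\DEL' && isPrint c = showChar c . go cs
      | otherwise               = showLitChar c . go cs
```

Control characters, the double quote and non-printable code points are
still escaped, so `read (showUnicode s)` gives back `s`.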

Alternative Options<hr>1. Always use putStrLn.

   This is viable today but unsatisfactory as it requires stdout.  In

   some cases, stdout is not accessible, e.g. Telegram or Discord bots.

2. Don't escape anything.

   `show` itself refrains from escaping most of the characters, and lets

   ghci do the job instead.

3. Customize ghci instead.

   ghci intercepts output strings and checks if they can be converted

   back to readable characters.  This potentially allows for better

   compatibility with a variety of strangely behaving terminals, and

   finer-grained user control.

   Tom Ellis proposed `-interactive-print`-based solutions in the

   comment section.

4. A new language extension, e.g. ShowStringUnicode.

   Proposed by Julian Ospald.  When enabled, readable Unicode characters

   are not escaped, and this is enabled by default by ghci.  There are

   concerns about how this would affect cross-module behavior.

Downsides<hr>This is definitely a breaking change, but the breakage, to our current

understanding, is limited.

First, use of `show` in production code is discouraged.  Even if someone

really does that, the breakage only happens when one tries to send the

"serialized" data over the wire:

Suppose Machine A `show`-ed a string and saved it into a UTF-8-encoded

file, and sends it to Machine B, which expects another encoding.  This

would be surprising for those who are used to the old behavior.

Second, though the breakage is not likely to be catastrophic for correct

production code, test suites could be badly affected, as pointed out by

Oleg Grenrus and vdukhovni in the comment section.  Some test suites

compare `show` results with expected results.  vdukhovni further

commented that Haskell escapes are not universally supported by

non-Haskell tools, so the impact would be confined to Haskell.

Prior Art<hr>Python has supported Unicode natively since version 3.  Python's approach is

intuitive and capable.  Its `repr`, which is equivalent to Haskell's

`show`, automatically escapes unreadable characters, but leaves readable

characters unescaped.  The criteria for "readable" can be found in

CPython's code [3].  If we were to realize this proposal, Python could

be a source of inspiration.

Unresolved Problems<hr>There are some currently unresolved (not discussed enough) issues.

+ Locales.

  What if the specified locale does not support Unicode?  Hécate

  Moonlight pointed out PEP-538 [4] could be a reference.

+ Unicode versions.

  Javran Cheng pointed out u_iswprint is generated from a Unicode table,

  which is manually updated.  This raises a concern that the definition

  of "printable" characters could change from version to version.

+ Definition of "readable".

  Unicode already defines "printability".  It's good, but it is not

  necessarily what we want here.

  - Should we support RTL?

  - Should we design a Haskell-specific definition of readability, to

    avoid Unicode versions silently introducing breakage?

(More?)

Some issues here perhaps require better answers to: What is our

expectation of Show?  Where should it be used?  Should we expect it to

break on every Unicode update?

[1] <a href="https://hackage.haskell.org/package/text-1.2.4.1/docs/src/Data.Text.Show.html#line-37" moz-do-not-send="true">https://hackage.haskell.org/package/text-1.2.4.1/docs/src/Data.Text.Show.html#line-37</a>

[2] <a href="https://mail.haskell.org/pipermail/haskell-cafe/2016-February/122874.html" moz-do-not-send="true">https://mail.haskell.org/pipermail/haskell-cafe/2016-February/122874.html</a>

[3] <a href="https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Objects/unicodectype.c#L147" moz-do-not-send="true">https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Objects/unicodectype.c#L147</a>

[4] <a href="https://www.python.org/dev/peps/pep-0538/" moz-do-not-send="true">https://www.python.org/dev/peps/pep-0538/</a><hr>Libraries mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Libraries@haskell.org" moz-do-not-send="true">Libraries@haskell.org</a>

<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries" moz-do-not-send="true">http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries</a>

</pre>

          </blockquote>

        </div>

        <br>


      </blockquote>

      <br>


    </blockquote>

  </body>

</html>