[RFC] Support Unicode characters in instance Show String

Thu Jul 8 18:41:41 UTC 2021

On Thu, Jul 08, 2021 at 06:53:38PM +0300, Oleg Grenrus wrote:

> An example failure is:
> 
>     3587.md
>       #1: FAIL (0.01s)
>         --- test/command/3587.md
>         +++ pandoc -f latex -t native
>         +   1 [Para [Str "1 m",Space,Str "is",Space,Str "equal",Space,Str "to",Space,Str "1000 mm"]]
>         -   1 [Para [Str "1\160m",Space,Str "is",Space,Str "equal",Space,Str "to",Space,Str "1000\160mm"]]
> 
> Str is a constructor of Inline type, and takes Text: data Inline = Str Text | ...
> As discussed on the GHC issue [1], Text and ByteString Show Instances piggyback on
> String instance. Bodigrim said that Text will eventually migrate
> to do the same as new Show String [2], so this issue will resurface.
> 
> Please explain the compatibility story. How should library writers write
> their code (in test-suites) which relies on Show String or Show Text, such
> that they can support GHC base (and/or text) versions
> on both sides of this breaking change?

One possible approach is to apply "show @String . read @String" to
normalise expected literal strings or text fragments.  This is easiest
when the expected result is *code* in the test, since the transformation
can be applied to the code that encapsulates the expected value.
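
A minimal sketch of that normalisation, assuming the expected value is
assembled in the test code.  The names `normalise` and `expectedStr` are
hypothetical, not taken from pandoc or any library:

```haskell
{-# LANGUAGE TypeApplications #-}

-- | Parse a String literal (written in any escaping style that 'read'
-- accepts) and render it again with the 'show' of the base in use.
-- Hypothetical helper, not part of any library.
normalise :: String -> String
normalise = show @String . read @String

-- | Expected rendering of one fragment from the failing test above.
-- The literal is written in its escaped form; 'normalise' turns it into
-- whatever the current Show String instance would print.
expectedStr :: String
expectedStr = "Str " ++ normalise "\"1\\160m\""
```

Since `read` accepts both the escaped form and a literal non-breaking
space between the quotes, the same test source should produce the right
expectation on either side of the change.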

When there are test *files* that hold the "expected output" of `show`
for particular inputs, clearly if `show` changes there can't be a single
fixed file that determines the success of the test case.  So for
reproducible tests, one might have to generate the "expected output"
file by normalising appropriate fragments as above.
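
As a rough illustration of generating such a file, the sketch below
re-renders every string literal it finds in a checked-in template and
writes the result as the expected output.  The names `normaliseLiterals`
and `regenerate` are hypothetical, and the scan is naive: it assumes any
double quote in the template opens a Haskell string literal.

```haskell
-- | 'reads' fixed at String, so the parse result type is unambiguous.
readsString :: ReadS String
readsString = reads

-- | Re-render each string literal in the input with the current
-- 'show' for String; everything else is copied through unchanged.
normaliseLiterals :: String -> String
normaliseLiterals [] = []
normaliseLiterals input@(c : cs)
  | c == '"', (s, rest) : _ <- readsString input
      = show s ++ normaliseLiterals rest
  | otherwise = c : normaliseLiterals cs

-- | Read a template file and write the normalised expected output.
-- Both files are read and written in the locale's default encoding.
regenerate :: FilePath -> FilePath -> IO ()
regenerate template expected =
  readFile template >>= writeFile expected . normaliseLiterals
```

The test would then compare actual output against the regenerated file
rather than against a file fixed at commit time.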

Likely a QuasiQuoter can be defined to simplify the task.
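
One way such a QuasiQuoter might look, as a sketch only: the quoted text
is taken as the body of a string literal and expands to whatever the
current `show` prints for it.  The name `shownStr` is hypothetical, and
the template-haskell package is assumed to be available.

```haskell
import Language.Haskell.TH
import Language.Haskell.TH.Quote (QuasiQuoter (..))
import Text.Read (readMaybe)

-- | Hypothetical quasi-quoter: [shownStr|1\160m|] expands to a String
-- holding the current 'show' rendering of the literal "1\160m".
shownStr :: QuasiQuoter
shownStr = QuasiQuoter
  { quoteExp  = rerender
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }
  where
    -- Wrap the body in quotes, parse it as a String, then show it again.
    rerender :: String -> Q Exp
    rerender body =
      case readMaybe ("\"" ++ body ++ "\"") :: Maybe String of
        Nothing -> fail ("shownStr: not a valid string literal body: " ++ body)
        Just s  -> litE (stringL (show s))

    -- Only expression contexts make sense for this quoter.
    unsupported :: String -> String -> Q a
    unsupported ctx _ = fail ("shownStr: not usable in a " ++ ctx ++ " context")
```

In a separate module with QuasiQuotes enabled, an expectation could then
be written as `"Str " ++ [shownStr|1\160m|]`.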

-- 
    Viktor.