default instance for IsString

Wed Apr 25 04:11:39 CEST 2012

On 4/24/12 3:35 PM, Markus Läll wrote:
> From what I understand, and putting words in his mouth, he wants to
> write `"<something=illegal>" :: XML' and have the compiler tell him at
> compile-time that this is not valid XML (if it actually is, imagine
> that there's something invalid between the double quotes). I.e., he
> wants to parse the string at compile-time and have the compilation
> fail if the parse fails, or have the string literal be replaced by the
> syntax tree of that XML if it succeeds.*
>
> This example is meta-programming par excellence, which is what
> Template Haskell is for -- use it.

Indeed. Asking that "illegal" string literals be caught at compile time 
is, in effect, updating the syntax of Haskell itself. As it stands, 
Haskell has a definition of what a string literal is (see the Report), 
and whether or not that literal can be successfully coerced into a given 
type is neither here nor there; just as for numeric literals.
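To make this concrete, here is a minimal sketch. It assumes a hypothetical `Ident` type, which does not come from the thread, whose literals must be non-empty and alphanumeric. Because `fromString` is an ordinary function of type `String -> a`, an `IsString` instance has no way to reject a literal during compilation. The best it can do is fail when the value is forced at run time.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (ErrorCall, evaluate, try)
import Data.Char (isAlphaNum)
import Data.String (IsString (..))

-- Hypothetical type: an identifier that must be non-empty and alphanumeric.
newtype Ident = Ident String
  deriving (Eq, Show)

-- fromString :: String -> Ident is an ordinary function, so it cannot
-- reject a literal at compile time; it can only fail once forced.
instance IsString Ident where
  fromString s
    | not (null s) && all isAlphaNum s = Ident s
    | otherwise = error ("invalid Ident literal: " ++ show s)

good :: Ident
good = "foo42"

-- This type-checks and compiles; the error only surfaces at run time.
bad :: Ident
bad = "foo bar"

-- Force a value and report whether fromString rejected it.
isRejected :: Ident -> IO Bool
isRejected x = do
  r <- try (evaluate x) :: IO (Either ErrorCall Ident)
  return (either (const True) (const False) r)
```

Both `good` and `bad` are accepted by the compiler; only forcing `bad` (here via `isRejected`) triggers the error.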

I'm all for static checking. (Even more so with every passing year.) But 
if you want to make up new sorts of literals and have them checked for 
validity, that's exactly what quasiquotes are there for. Since you are 
altering the syntax of Haskell, rather than accepting what Haskell calls 
strings, this is metaprogramming, and so you're going to need TH, QQ, 
or some similar metaprogramming facility. For ByteString and Text, by 
contrast, the goal is specifically to serve as an efficient/correct 
replacement for String; thus, overloading string literals to support 
those types is _not_ asking to change the syntax of Haskell.
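A rough sketch of the quasiquote route follows. It assumes a hypothetical rule that an identifier literal must be non-empty and alphanumeric, and the names `ident`, `validIdent`, `checkIdent`, and `checkIdentIO` are illustrative only. The difference from an `IsString` instance is that the check runs in the `Q` monad while the splice is being expanded, so calling `fail` there aborts compilation instead of producing a run-time error.

```haskell
import Data.Char (isAlphaNum)
import Language.Haskell.TH (Exp, Q, runQ)
import Language.Haskell.TH.Quote (QuasiQuoter (..))
import Language.Haskell.TH.Syntax (lift)

-- Hypothetical validity rule for identifier literals.
validIdent :: String -> Bool
validIdent s = not (null s) && all isAlphaNum s

-- [ident|foo42|] expands to the string literal "foo42";
-- [ident|foo bar|] makes compilation fail at the splice site.
ident :: QuasiQuoter
ident = QuasiQuoter
  { quoteExp = checkIdent
  , quotePat = unsupported "patterns"
  , quoteType = unsupported "types"
  , quoteDec = unsupported "declarations"
  }
  where
    unsupported :: String -> String -> Q a
    unsupported ctx _ = fail ("ident: not supported in " ++ ctx)

-- Validate at splice time; lift turns the String into an expression.
checkIdent :: String -> Q Exp
checkIdent s
  | validIdent s = lift s
  | otherwise = fail ("invalid identifier literal: " ++ show s)

-- Run the check outside of a splice (e.g. from GHCi) for quick testing.
checkIdentIO :: String -> IO Exp
checkIdentIO = runQ . checkIdent
```

Template Haskell's stage restriction means `ident` has to be used from a different module than the one defining it (give this one a module header and export `ident`). The using module enables `QuasiQuotes` and writes `[ident|foo42|]`.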

To the extent that ByteString's instance runs into issues with high 
code points (it silently truncates each Char to its low 8 bits), that 
strikes me as a bug born of poor foresight. 
Consider, for instance, the distinction between integral and 
non-integral numeric literals. We recognize that (0.1 :: Int) is 
invalid, and so we define the Haskell syntax a priori to recognize two 
different sorts of "numbers". It seems that we should do the same thing 
for strings. 'String' literals of raw binary goop (subject to escape 
mechanisms for detecting the end of the string) are different from string 
literals which are valid Unicode sequences. This, I think, is fair game 
to be expressed directly in the specification of overloaded string 
literals, just as we distinguish classes of overloaded numeric literals. 
Unfortunately, for numeric literals we have a nice syntactic distinction 
between integral and non-integral, which seems to suggest that we'd need 
a similar syntactic distinction to recognize the different sorts of 
string literals.

-- 
Live well,
~wren