Robustness of instance Read Char

Simon Peyton-Jones
Fri, 2 Nov 2001 08:02:49 -0800

I do agree with you that it would be better for the Read class
to use a Maybe result rather than a list of parses.  But I'm not
sure your problem can be solved simply by making the Char
instance of Read better.   The point is that the parser has to read
the *whole* string before it can be sure that it is syntactically well-formed
(e.g. no duff escape sequence in it), and hence it can't produce the
string until it's sure that it can parse it.  So it gets tummy ache.
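
To make the all-or-nothing behaviour concrete, here is a small sketch using
the standard `reads` at type String. The three names are made up for
illustration. With GHC's base library the first evaluates to [("abc","")]
and the other two to []: a problem at the very end of the input (or anywhere
in the middle) means that no part of the string can be handed back.

```haskell
-- Illustrative names only. Each call yields either the complete
-- string or no parse at all; there is no partial result.
wellFormed, unterminated, badEscape :: [(String, String)]
wellFormed   = reads "\"abc\""      -- closing quote present
unterminated = reads "\"abc"        -- closing quote missing
badEscape    = reads "\"ab\\qc\""   -- \q is not a valid escape sequence
```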

Better perhaps to roll your own Read class which produces output
earlier.  For that it would help if I finished up the generics support
in GHC so that you could do something like "deriving Read" for your
own new class.
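
A minimal sketch of what "producing output earlier" could look like for
string literals, following the shape of the readListChar proposal quoted
below. The name `lazyReadString` is hypothetical and not part of any
library. It assumes input in the form that show produces (no string gaps),
and it gives up the list-of-parses result: malformed input raises an error
at the point where the bad part is demanded, instead of yielding [] up front.

```haskell
import Data.Char (isSpace, readLitChar)

-- Hypothetical helper: decode a string literal lazily, returning the
-- decoded string together with the unread remainder of the input.
lazyReadString :: String -> (String, String)
lazyReadString input =
  case dropWhile isSpace input of
    '"' : rest -> go rest
    _          -> error "lazyReadString: expected opening quote"
  where
    go ('"' : rest)        = ("", rest)   -- closing quote: done
    go ('\\' : '&' : rest) = go rest      -- \& separator emitted by show
    go []                  = error "lazyReadString: unterminated string literal"
    go s =
      case readLitChar s of
        (c, s') : _ ->
          -- Lazy pattern binding: c is available to the consumer
          -- before the tail of the literal has been parsed.
          let (cs, rest) = go s'
          in  (c : cs, rest)
        []          -> error "lazyReadString: bad escape sequence"
```

Because each character is returned before the tail is examined, a consumer
such as `take 3 (fst (lazyReadString ('"' : cycle "ab")))` gets an answer
even though the literal never ends.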


| -----Original Message-----
| From: Peter Thiemann []
| Sent: 15 October 2001 11:45
| Subject: Robustness of instance Read Char
| Folks,
| my code has unwillingly been forced to read a large string
| generated by show. This turned out to be a robustness test
| because the effect is a stack overflow (with Hugs as well as
| with GHC) and, of course, this error happened in a CGI script.
| If you want to try the effect yourself, just take a file
| "foo" of, say, 150k and type this into your hungry Hugs prompt:
| readFile "foo" >>= \s -> putStr (read (show s))
| Digging down into the prelude code (taken from Hugs's prelude
| file), you find this:
| > instance Read Char where
| >   readsPrec p = readParen False
| >                   (\r -> [(c,t) | ('\'':s,t) <- lex r,
| >                                   (c,"\'")   <- readLitChar s ])
| >   readList = readParen False (\r -> [(l,t) | ('"':s, t) <- lex r,
| >                                              (l,_)      <- readl s ])
| >     where readl ('"':s)      = [("",s)]
| >           readl ('\\':'&':s) = readl s
| >           readl s            = [(c:cs,u) | (c ,t) <- readLitChar s,
| >                                            (cs,u) <- readl t ]
| which means that the parser reading this string has the
| ability to fail and to backtrack *at every single character*.
| While this might be useful in the general case, it certainly
| causes our little one-line program to die.
| Unfortunately, in my real program, the String is embedded in
| a data type which is deriving Read, so that writing the
| specific instance of read is a major pain. Two things would
| help me in this situation:
| 1. some kind-hearted maintainer of a particularly well-behaved
|    Haskell implementation might put in a more efficient definition
|    in the instance Read Char (or convince me that backtracking inside
|    of reading a String is a useful gadget). The following code will do:
| readListChar :: String -> [(String, String)]
| readListChar =
|   return . readListChar' . dropWhile isSpace
| readListChar' ('\"':rest) =
|   readListChar'' rest
| readListChar'' ('\"':rest) =
|   ("",rest)
| readListChar'' rest =
|   let (c, s') = head (readLitChar rest)
|       (s, s'') = readListChar'' s'
|   in  (c:s, s'')
| {- clearly, taking the head should be guarded and a proper
|    error message generated -}
| 2. provide a way of locally replacing the offending instance of Read
|    with something else. [urgh, a language extension]
| Any suggestions or comments?
| -Peter