Robustness of instance Read Char

Peter Thiemann thiemann@informatik.uni-freiburg.de
15 Oct 2001 12:45:16 +0200


Folks,

my code has unwillingly been forced to read a large string generated
by show. This turned out to be a robustness test because the effect is
a stack overflow (with Hugs as well as with GHC) and, of course, this
error happened in a CGI script. 

If you want to try the effect yourself, just take a file "foo" of,
say, 150k and type this into you hungry Hugs prompt:

readFile "foo" >>= \s -> putStr (read (show foo))

Digging down into the prelude code (taken from Hugs's prelude file),
you find this: 

> instance Read Char where
>   readsPrec p      = readParen False
> 			    (\r -> [(c,t) | ('\'':s,t) <- lex r,
> 					    (c,"\'")   <- readLitChar s ])
>   readList = readParen False (\r -> [(l,t) | ('"':s, t) <- lex r,
> 					       (l,_)      <- readl s ])
> 	       where readl ('"':s)      = [("",s)]
> 		     readl ('\\':'&':s) = readl s
> 		     readl s            = [(c:cs,u) | (c ,t) <- readLitChar s,
> 						      (cs,u) <- readl t ]

which means that the parser reading this string has the ability to
fail and to backtrack *at every single character*. While this might be 
useful in the general case, it certainly causes our little one-line
program to die. 

Unfortunately, in my real program, the String is embedded in a data
type which is deriving Read, so that writing the specific instance of
read is a major pain. Two things would help me in this situation:

1. some kind-hearted maintainer of a particularly well-behaved Haskell 
   implementation might put in a more efficient definition in the
   instance Read Char (or convince me that backtracking inside of
   reading a String is a useful gadget). The following code will do:

readListChar :: String -> [(String, String)]
readListChar =
  return . readListChar' . dropWhile isSpace

readListChar' ('\"':rest) =
  readListChar'' rest

readListChar'' ('\"':rest) =
  ("",rest)
readListChar'' rest = 
  let (c, s') = head (readLitChar rest) 
      (s, s'') = readListChar'' s'
  in  (c:s, s'')

{- clearly, taking the head should be guarded and a proper error
message generated -}

2. provide a way of locally replacing the offending instance of Read
   with something else. [urgh, a language extension]

Any suggestions or comments?
-Peter