[Haskell-cafe] A very nontrivial parser

Wed Jul 4 16:41:59 EDT 2007

Well, I eventually got it to work correctly... (!)

My goal is to be able to stack multiple parsers one on top of the other 
- but be able to *change* the stack half way through parsing if needed. 
This I eventually succeeded in doing. The external interface is fairly 
simple, but the type signatures are NOT. (!!)

My basic idea was to abstract the data source that a parser gets its 
data from:

  class Source s where
    empty :: s x -> Bool
    fetch :: s x -> (x, s x)

  instance Source [] where   -- Nice syntax... :-S
    empty = null
    fetch xs = (head xs, tail xs)

Now I can define a parser type. But... uh... there's a slight glitch. 
What I *want* to say is

  Parser state in out = ...

But what I ended up with is

  newtype Parser state src x y = Parser ((state, src x) -> (state, src 
x, y))

I then make Parser a monad, write some functions to get/set the state 
parameter, and

  token_get :: (Source src) => Parser state src x x
  token_get = Parser (\(state, tokens) -> let (t,ts) = fetch tokens in 
(state, ts, t))

Anyway, all of that more or less works. Then I begin the utterly 
psychopathic stuff:

  data Stack state0 src0 t0 t1 = ...

  instance Source (Stack state0 src0 t0) where ...

  stacked :: st0 -> Parser st0 src0 t0 [t1] -> st1 -> Parser st1 (Stack 
st0 src0 t0) t1 x -> Parser st9 src0 t0 x

By this point, my brain is in total agony! >_<

But, almost unbelievably, all this psychotic code actually *works*... 
(Well, there were a few bugs, they're fixed now.)

Essentially, I have the "stacked" function, where if I do

  x <- stacked foo parser1 bar parser2
  y <- parser3

then it runs parser2, but it uses parser1 to transform the data first. 
Which is what I actually wanted in the first place... Most critically, 
when parser2 *stops* demanding tokens, parser3 is run, picking up from 
where parser1 left off. (Confused yet? Wait til you see the code to 
implement this insanity!)

One problem remains... That pesky source type. Every time I mention a 
parser, I have to say what kind of course object it reads from - even 
though all parsers work with *any* source object! (That's the whole 
point of the Source class.) I really want to get rid of this. (See, for 
example, the type signature for "stacked". Yuck!) Also, every time I 
write a simple parser, I get a compile-time error saying something about 
a "monomorphism restriction" or something... If I add an explicit type 
it goes away, but it's very annoying to keep typing things like

  test7 :: (Source src) => Parser state src Int Int

and so forth. And I can't help thinking if I could just get *rid* of 
that stupid source type in the signature, there wouldn't be a problem...

Anybody have a solution to this?