[Haskell-cafe] Re: Lazy Parsing
Gü?nther Schmidt
gue.schmidt at web.de
Sun May 31 08:38:00 EDT 2009
Dear Malcom,
thanks for helping.
I had actually come to Haskell originally because of a parsing problem.
I had been using Smalltalk until I started a project which required
parsing files. Until then I had not done any RW parsing.
Well the route was more a Parsec -> Haskell, wtf is Haskell? Anyway
eventually I dropped Smalltalk and got addicted to Haskell. And managed
familiarize myself with Haskell and Parsec, the latter as it turned
out I didn't even need to solve my original problem.
Anyway polyparse certainly is an option, but there are a few things that
despite my "list of failures" to use it give uu-parsinglib a special
appeal, the breadth-first approach with choice, I find that terrible
elegant. Due to some kicks in my behind it seems that I might be able to
use Doaitse's combinators now, some more details on that are in another
post.
Günther
Malcolm Wallace schrieb:
>> It is my pleasure to announce that after 5 days of experimenting with
>> uu-parsinglib I have absolutely no clue, whatsoever, on how to use it.
>>
>> I do not even manage to write a parser for even a mere digit or a
>> simple character.
>
> I don't know whether you will be willing to change over to polyparse
> library, but here are some hints about how you might use it.
>
> Given that you want the input to be a simple character stream, rather
> than use a more elaborate lexer, the first thing to do is to specialise
> the parser type for your purposes:
>
> > type TextParser a = Parser Char a
>
> Now, to recognise a "mere digit",
>
> > digit :: TextParser Char
> > digit = satisfy Char.isDigit
>
> and for a sequence of digits forming an unsigned integer:
>
> > integer :: TextParser Integer
> > integer = do ds <- many1 digit
> > return (foldl1 (\n d-> n*10+d)
> > (map (fromIntegral.digitToInt) ds))
> > `adjustErr` (++("expected one or more digits"))
>
>> I mean I'd like to be able to turn "12.05.2009" into something like
>> (12, 5, 2009) and got no clue what the code would have to look like. I
>> do know almost every variation what the code must not look like :).
>
> > date = do a <- integer
> > satisfy (=='.')
> > b <- integer
> > satisfy (=='.')
> > c <- integer
> > return (a,b,c)
>
> Of course, that is just the standard (strict) monadic interface used by
> many combinator libraries. Your original desire was for lazy parsing,
> and to achieve that, you must move over to the applicative interface.
> The key difference is that you cannot name intermediate values, but must
> construct larger values directly from smaller ones by something like
> function application.
>
> > lazydate = return (,,) `apply` integer `discard` dot
> > `apply` integer `discard` dot
> > `apply` integer
> > where dot = satisfy (=='.')
>
> The (,,) is the constructor function for triples. The `discard`
> combinator ensures that its second argument parses OK, but throws away
> its result, keeping only the result of its first argument.
>
> Apart from lazy space behaviour, the main observable difference between
> "date" and "lazydate" is when errors are reported on incorrect input.
> For instance:
>
> > fst $ runParser date "12.05..2009"
> *** Exception: In a sequence:
> Parse.satisfy: failed
> expected one or more digits
>
> > fst $ runParser lazydate "12.05..2009"
> (12,5,*** Exception: In a sequence:
> Parse.satisfy: failed
> expected one or more digits
>
> Notice how the lazy parser managed to build the first two elements of
> the triple, whilst the strict parser gave no value at all.
>
> I know that the error messages shown here are not entirely satisfactory,
> but they can be improved significantly just by making greater use of the
> `adjustErr` combinator in lots more places (it is rather like Parsec's
> <?>). Errors containing positional information about the input can be
> constructed by introducing a separate lexical tokenizer, which is also
> not difficult.
>
> Regards,
> Malcolm
More information about the Haskell-Cafe
mailing list