[Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)
Tom Hawkins
tomahawkins at gmail.com
Tue Apr 27 07:23:10 EDT 2010
I had been using Parsec to parse VCD files, but needed to lazily parse
streaming data. After stumbling on this thread below, I switch to
polyparse.
What a great library! I was able to migrate from a strict to a
semi-lazy parser and many of my parse reductions didn't even need to
change. Thanks Malcolm!
In addition to lazy VCD parsing, this version of vcd [1] also includes
step', which forces a step regardless if variables have changed or not
-- helpful for realtime simulation.
(BTW, parsec is a great library too.)
-Tom
[1] http://hackage.haskell.org/package/vcd-0.1.4
On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
<malcolm.wallace at cs.york.ac.uk> wrote:
>
> I don't know whether you will be willing to change over to polyparse
> library, but here are some hints about how you might use it.
>
> Given that you want the input to be a simple character stream, rather than
> use a more elaborate lexer, the first thing to do is to specialise the
> parser type for your purposes:
>
>> type TextParser a = Parser Char a
>
> Now, to recognise a "mere digit",
>
>> digit :: TextParser Char
>> digit = satisfy Char.isDigit
>
> and for a sequence of digits forming an unsigned integer:
>
>> integer :: TextParser Integer
>> integer = do ds <- many1 digit
>> return (foldl1 (\n d-> n*10+d)
>> (map (fromIntegral.digitToInt) ds))
>> `adjustErr` (++("expected one or more digits"))
>
>> I mean I'd like to be able to turn "12.05.2009" into something like (12,
>> 5, 2009) and got no clue what the code would have to look like. I do know
>> almost every variation what the code must not look like :).
>
>> date = do a <- integer
>> satisfy (=='.')
>> b <- integer
>> satisfy (=='.')
>> c <- integer
>> return (a,b,c)
>
> Of course, that is just the standard (strict) monadic interface used by many
> combinator libraries. Your original desire was for lazy parsing, and to
> achieve that, you must move over to the applicative interface. The key
> difference is that you cannot name intermediate values, but must construct
> larger values directly from smaller ones by something like function
> application.
>
>> lazydate = return (,,) `apply` integer `discard` dot
>> `apply` integer `discard` dot
>> `apply` integer
>> where dot = satisfy (=='.')
>
> The (,,) is the constructor function for triples. The `discard` combinator
> ensures that its second argument parses OK, but throws away its result,
> keeping only the result of its first argument.
>
> Apart from lazy space behaviour, the main observable difference between
> "date" and "lazydate" is when errors are reported on incorrect input. For
> instance:
>
> > fst $ runParser date "12.05..2009"
> *** Exception: In a sequence:
> Parse.satisfy: failed
> expected one or more digits
>
> > fst $ runParser lazydate "12.05..2009"
> (12,5,*** Exception: In a sequence:
> Parse.satisfy: failed
> expected one or more digits
>
> Notice how the lazy parser managed to build the first two elements of the
> triple, whilst the strict parser gave no value at all.
>
> I know that the error messages shown here are not entirely satisfactory, but
> they can be improved significantly just by making greater use of the
> `adjustErr` combinator in lots more places (it is rather like Parsec's <?>).
> Errors containing positional information about the input can be constructed
> by introducing a separate lexical tokenizer, which is also not difficult.
>
> Regards,
> Malcolm
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
More information about the Haskell-Cafe
mailing list