[Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

S. Doaitse Swierstra doaitse at swierstra.net
Tue Apr 27 14:49:18 EDT 2010


How about:

import Text.ParserCombinators.UU.Parsing
import Text.ParserCombinators.UU.Examples


pDate :: Pars (Int,Int,Int)
pDate = (,,) <$> pNatural <* pDot <*> pNatural <* pDot <*> pNatural
        where pDot = pSym '.'

and then:

*Main> test pDate "3.4.5"
Loading package syb-0.1.0.2 ... linking ... done.
Loading package base-3.0.3.2 ... linking ... done.
Loading package array-0.3.0.0 ... linking ... done.
Loading package filepath-1.1.0.3 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.3 ... linking ... done.
Loading package unix-2.4.0.0 ... linking ... done.
Loading package directory-1.0.1.0 ... linking ... done.
Loading package process-1.0.1.2 ... linking ... done.
Loading package time-1.1.4 ... linking ... done.
Loading package random-1.0.0.2 ... linking ... done.
Loading package haskell98 ... linking ... done.
Loading package uu-parsinglib-2.3.1 ... linking ... done.
((3,4,5),[])
*Main> test pDate "3..7"
((3,0,7),[
Inserted '0' at position 2 expecting '0'..'9'])
*Main> test pDate ""
((0,0,0),[
Inserted '0' at position 0 expecting '0'..'9',
Inserted '.' at position 0 expecting one of ['0'..'9', '.'],
Inserted '0' at position 0 expecting '0'..'9',
Inserted '.' at position 0 expecting one of ['0'..'9', '.'],
Inserted '0' at position 0 expecting '0'..'9'])
*Main> test pDate "3.4.2010"
((3,4,2010),[])
*Main>

Doaitse


On 27 apr 2010, at 13:23, Tom Hawkins wrote:

> I had been using Parsec to parse VCD files, but needed to lazily parse
> streaming data.  After stumbling on this thread below, I switch to
> polyparse.
> 
> What a great library!  I was able to migrate from a strict to a
> semi-lazy parser and many of my parse reductions didn't even need to
> change.  Thanks Malcolm!
> 
> In addition to lazy VCD parsing, this version of vcd [1] also includes
> step', which forces a step regardless if variables have changed or not
> -- helpful for realtime simulation.
> 
> (BTW, parsec is a great library too.)
> 
> -Tom
> 
> [1] http://hackage.haskell.org/package/vcd-0.1.4
> 
> 
> 
> On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
> <malcolm.wallace at cs.york.ac.uk> wrote:
>> 
>> I don't know whether you will be willing to change over to polyparse
>> library, but here are some hints about how you might use it.
>> 
>> Given that you want the input to be a simple character stream, rather than
>> use a more elaborate lexer, the first thing to do is to specialise the
>> parser type for your purposes:
>> 
>>> type TextParser a = Parser Char a
>> 
>> Now, to recognise a "mere digit",
>> 
>>> digit :: TextParser Char
>>> digit = satisfy Char.isDigit
>> 
>> and for a sequence of digits forming an unsigned integer:
>> 
>>> integer :: TextParser Integer
>>> integer = do ds <- many1 digit
>>>              return (foldl1 (\n d-> n*10+d)
>>>                             (map (fromIntegral.digitToInt) ds))
>>>           `adjustErr` (++("expected one or more digits"))
>> 
>>> I mean I'd like to be able to turn "12.05.2009" into something like (12,
>>> 5, 2009) and got no clue what the code would have to look like. I do know
>>> almost every variation what the code must not look like :).
>> 
>>> date = do a <- integer
>>>           satisfy (=='.')
>>>           b <- integer
>>>           satisfy (=='.')
>>>           c <- integer
>>>           return (a,b,c)
>> 
>> Of course, that is just the standard (strict) monadic interface used by many
>> combinator libraries.  Your original desire was for lazy parsing, and to
>> achieve that, you must move over to the applicative interface.  The key
>> difference is that you cannot name intermediate values, but must construct
>> larger values directly from smaller ones by something like function
>> application.
>> 
>>> lazydate = return (,,) `apply` integer `discard` dot
>>>                        `apply` integer `discard` dot
>>>                        `apply` integer
>>>    where dot = satisfy (=='.')
>> 
>> The (,,) is the constructor function for triples.  The `discard` combinator
>> ensures that its second argument parses OK, but throws away its result,
>> keeping only the result of its first argument.
>> 
>> Apart from lazy space behaviour, the main observable difference between
>> "date" and "lazydate" is when errors are reported on incorrect input.  For
>> instance:
>> 
>>  > fst $ runParser date "12.05..2009"
>>  *** Exception: In a sequence:
>>  Parse.satisfy: failed
>>  expected one or more digits
>> 
>>  > fst $ runParser lazydate "12.05..2009"
>>  (12,5,*** Exception: In a sequence:
>>  Parse.satisfy: failed
>>  expected one or more digits
>> 
>> Notice how the lazy parser managed to build the first two elements of the
>> triple, whilst the strict parser gave no value at all.
>> 
>> I know that the error messages shown here are not entirely satisfactory, but
>> they can be improved significantly just by making greater use of the
>> `adjustErr` combinator in lots more places (it is rather like Parsec's <?>).
>>  Errors containing positional information about the input can be constructed
>> by introducing a separate lexical tokenizer, which is also not difficult.
>> 
>> Regards,
>>    Malcolm
>> 
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list