[Haskell-cafe] Lazy Parsing
S. Doaitse Swierstra
doaitse at swierstra.net
Fri May 29 04:20:04 EDT 2009
Lazy parsing has been the default for the last ten years in uulib, and
is now available in the simple uu-parsinglib (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uu-parsinglib
). The whole design of the latter in described in a technical report
to which references are given on the web page. It provides also error
correction, the ability to use several different kinds of input
tokens, and (with some help) ambiguities. If speed is an issue you can
insert extra hints which locally change the breadth-first parsing
process locally into a somewhat more depth-first form. When compared
with Parsec the good news is that usually you do not have to put
annotations to get nice results.
The older uulib version also performs an abstract interpretation which
basically changes the search for which alternative to take from a
linear to a logarithmic complexity, but does not provide a monadic
structure, in which you use results recognised thus far to construct
new parsers.
Both the old uulib version and the new version have always had an
applicative interface.
In the near future elements of the abstract interpretation of the old
uulib version will migrate into the new version. It is the advent of
GADT's which made this new version feasable.
An example of the error correction at work at the following example
code:
pa, pb, paz :: P_m (Str Char) [Char]
pa = lift <$> pSym 'a'
pb = lift <$> pSym 'b'
p <++> q = (++) <$> p <*> q
pa2 = pa <++> pa
pa3 = pa <++> pa2
pCount p = (\ a b -> b+1) <$> p <*> pCount p <<|> pReturn 0
pExact 0 p = pReturn []
pExact n p = (:) <$> p <*> pExact (n-1) p
paz = pMany (pSym ('a', 'z'))
paz' = pSym (\t -> 'a' <= t && t <= 'z', "a .. z", 'k')
main :: IO ()
main = do print (test pa "a")
print (test pa "b")
print (test pa2 "bbab")
print (test pa "ba")
print (test pa "aa")
print (test (do l <- pCount pa
pExact l pb) "aaacabbb")
print (test (amb ( (++) <$> pa2 <*> pa3 <|> (++) <$> pa3
<*> pa2)) "aaabaa")
print (test paz "ab1z7")
print (test paz' "m")
print (test paz' "")
is
loeki:~ doaitse$ ghci -package uu-parsinglib
GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Loading package syb ... linking ... done.
Loading package array-0.2.0.0 ... linking ... done.
Loading package filepath-1.1.0.1 ... linking ... done.
Loading package old-locale-1.0.0.1 ... linking ... done.
Loading package old-time-1.0.0.1 ... linking ... done.
Loading package unix-2.3.1.0 ... linking ... done.
Loading package directory-1.0.0.2 ... linking ... done.
Loading package process-1.0.1.1 ... linking ... done.
Loading package random-1.0.0.1 ... linking ... done.
Loading package haskell98 ... linking ... done.
Loading package uu-parsinglib-2.0.0 ... linking ... done.
Prelude> :m Text.ParserCombinators.UU.Examples
Prelude Text.ParserCombinators.UU.Examples> main
("a",[])
("a",[
Deleted 'b' at position 0 expecting one of ["'a'"],
Inserted 'a' at position 1 expecting one of ["'a'"]])
("aa",[
Deleted 'b' at position 0 expecting one of ["'a'"],
Deleted 'b' at position 1 expecting one of ["'a'"],
Deleted 'b' at position 3 expecting one of ["'a'"],
Inserted 'a' at position 4 expecting one of ["'a'"]])
("a",[
Deleted 'b' at position 0 expecting one of ["'a'"]])
("a",[
The token 'a'was not consumed by the parsing process.])
(["b","b","b","b"],[
Deleted 'c' at position 3 expecting one of ["'a'","'b'"],
Inserted 'b' at position 8 expecting one of ["'b'"]])
(["aaaaa"],[
Deleted 'b' at position 3 expecting one of ["'a'","'a'"]])
("abz",[
Deleted '1' at position 2 expecting one of ["'a'..'z'"],
The token '7'was not consumed by the parsing process.])
('m',[])
('k',[
Inserted 'k' at position 0 expecting one of ["a .. z"]])
Prelude Text.ParserCombinators.UU.Examples>
Doaitse Swierstra
On 27 mei 2009, at 01:52, Günther Schmidt wrote:
> Hi all,
>
> is it possible to do lazy parsing with Parsec? I understand that one
> can do that with polyparse, don't know about uulib, but I happen to
> be already somewhat familiar with Parsec, so before I do switch to
> polyparse I rather make sure I actually have to.
>
> The files it has to parse is anywhere from 500 MB to 5 GB.
>
>
> Günther
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list