[Haskell-cafe] Lazy Parsing

S. Doaitse Swierstra doaitse at swierstra.net
Fri May 29 04:20:04 EDT 2009


Lazy parsing has been the default for the last ten years in uulib, and  
is now available in the simple uu-parsinglib (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uu-parsinglib 
). The whole design of the latter in described in a technical report  
to which references are given on the web page. It provides also error  
correction, the ability to use several different kinds of input  
tokens, and (with some help) ambiguities. If speed is an issue you can  
insert extra hints which locally change the breadth-first parsing  
process locally into a somewhat more depth-first form. When compared  
with Parsec the good news is that usually you do not have to put  
annotations to get nice results.

The older uulib version also performs an abstract interpretation which  
basically changes the search for which alternative to take from a  
linear to a logarithmic complexity, but does not provide a monadic  
structure, in which you use results recognised thus far to construct  
new parsers.

Both the old uulib version and the new version have always had an  
applicative interface.

In the near future elements of the abstract interpretation of the old  
uulib version will migrate into the new version. It is the advent of  
GADT's which made this new version feasable.

An example of the error correction at work at the following example  
code:

pa, pb, paz :: P_m (Str  Char) [Char]
pa = lift <$> pSym 'a'
pb = lift <$> pSym 'b'
p <++> q = (++) <$> p <*> q
pa2 =   pa <++> pa
pa3 =   pa <++> pa2

pCount p = (\ a b -> b+1) <$> p <*> pCount p <<|> pReturn 0
pExact 0 p = pReturn []
pExact n p = (:) <$> p <*> pExact (n-1) p

paz = pMany (pSym ('a', 'z'))

paz' = pSym (\t -> 'a' <= t && t <= 'z', "a .. z", 'k')

main :: IO ()
main = do print (test pa "a")
           print (test pa "b")
           print (test pa2 "bbab")
           print (test pa "ba")
           print (test pa "aa")
           print (test  (do  l <- pCount pa
                             pExact l pb) "aaacabbb")
           print (test (amb ( (++) <$> pa2 <*> pa3 <|> (++) <$> pa3  
<*> pa2))  "aaabaa")
           print (test paz "ab1z7")
           print (test paz' "m")
           print (test paz' "")


is

loeki:~ doaitse$ ghci -package uu-parsinglib
GHCi, version 6.10.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Loading package syb ... linking ... done.
Loading package array-0.2.0.0 ... linking ... done.
Loading package filepath-1.1.0.1 ... linking ... done.
Loading package old-locale-1.0.0.1 ... linking ... done.
Loading package old-time-1.0.0.1 ... linking ... done.
Loading package unix-2.3.1.0 ... linking ... done.
Loading package directory-1.0.0.2 ... linking ... done.
Loading package process-1.0.1.1 ... linking ... done.
Loading package random-1.0.0.1 ... linking ... done.
Loading package haskell98 ... linking ... done.
Loading package uu-parsinglib-2.0.0 ... linking ... done.
Prelude> :m Text.ParserCombinators.UU.Examples
Prelude Text.ParserCombinators.UU.Examples> main
("a",[])
("a",[
Deleted  'b' at position 0 expecting one of ["'a'"],
Inserted 'a' at position 1 expecting one of ["'a'"]])
("aa",[
Deleted  'b' at position 0 expecting one of ["'a'"],
Deleted  'b' at position 1 expecting one of ["'a'"],
Deleted  'b' at position 3 expecting one of ["'a'"],
Inserted 'a' at position 4 expecting one of ["'a'"]])
("a",[
Deleted  'b' at position 0 expecting one of ["'a'"]])
("a",[
The token 'a'was not consumed by the parsing process.])
(["b","b","b","b"],[
Deleted  'c' at position 3 expecting one of ["'a'","'b'"],
Inserted 'b' at position 8 expecting one of ["'b'"]])
(["aaaaa"],[
Deleted  'b' at position 3 expecting one of ["'a'","'a'"]])
("abz",[
Deleted  '1' at position 2 expecting one of ["'a'..'z'"],
The token '7'was not consumed by the parsing process.])
('m',[])
('k',[
Inserted 'k' at position 0 expecting one of ["a .. z"]])
Prelude Text.ParserCombinators.UU.Examples>

Doaitse Swierstra





On 27 mei 2009, at 01:52, GüŸnther Schmidt wrote:

> Hi all,
>
> is it possible to do lazy parsing with Parsec? I understand that one  
> can do that with polyparse, don't know about uulib, but I happen to  
> be already somewhat familiar with Parsec, so before I do switch to  
> polyparse I rather make sure I actually have to.
>
> The files it has to parse is anywhere from 500 MB to 5 GB.
>
>
> Günther
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list