[Haskell-cafe] Newbie: a parser for a list of objects?

Daniel Fischer daniel.is.fischer at web.de
Wed Mar 28 09:08:40 EDT 2007


Am Mittwoch, 28. März 2007 11:57 schrieb Dmitri O.Kondratiev:
> Daniel,
> I am still trying to figure out the order of function applications in the
> parser returning list of objects (I attached again the code to the end of
> this message for convenience).
>
> You wrote:
> (>*>) associates to the right, hence
> p >*> (p >*> (p >*> (... >*> (p >*> succeed [])...)))
>
> I don't understand where (p >*> succeed []) comes from?

The final 'succeed []' comes from a) the definition of pList p as
pList p = succeed [] `alt` ((p >*> pList p) `build` (uncurry (:)))
plus b) the assumption that p should be a parser which doesn't succed on an 
empty input and that the input is finite (though the second point is not 
necessary).

Let us unfold a little:

pList dig "12ab"
	=== succeed [] "12ab" ++ (((dig >*> pList dig) `build` (uncurry (:))) "12ab")
	=== [([],"12ab")] ++ [('1' : ds,rem) | (ds,rem) <- pList dig "2ab"]
	    -- since dig "12ab" = [('1',"2ab")]
	=== [([],"12ab")] ++ [('1' : ds,rem) | (ds,rem) <- (succed [] `alt` 
						(((dig >*> pList dig) `build` (uncurry (:))))) "2ab"]
	=== [([],"12ab")] ++ [('1' : ds,rem) | (ds,rem) <- ([([],"2ab")] ++ 
				[('2' : ds2,rem2) | (ds2,rem2) <- pList dig "ab"])]
	=== [([],"12ab"),("1","2ab")] ++
	    [('1' : '2' : ds2,rem2) | (ds2,rem2) <- (succeed [] `alt` 
		(((dig >*> pList dig) `build` (uncurry (:))))) "ab"]
	=== [([],"12ab"),("1","2ab")] ++
	    [('1' : '2' : ds2,rem2) | (ds2,rem2) <- ([([],"ab")] ++ 
		    (((dig >*> pList dig) `build` (uncurry (:))) "ab"))]
	    -- now 'dig' and hence 'dig >*> pList dig' fail on the input "ab", thus
	=== [([],"12ab"),("1","2ab"),("12","ab")]

Hum, I find that a bit hard to read myself, so let's introduce an alias for 
'alt', call it (<+>) and a new combinator which joins (>*>) and 
'build (uncurry (:))' :
(<:>) :: Parser a b -> Parser a [b] -> Parser a [b]
p1 <:> p2 = \inp -> [(x:ys,rem2)  | (x,rem1) <- p1 inp, (ys,rem2) <- p2 rem1]
	-- or p1 <:> p2 = build (p1 >*> p2) (uncurry (:))

Then we have (because p1 <:> (p2 <+> p3) === (p1 <:> p2) <+> (p1 <:> p3))
pList p
	=== succeed [] <+> (p <:> pList p)
	=== succeed [] <+> (p <:> (succeed [] <+> (p <:> pList p)))
	=== succeed [] <+> (p <:> succeed []) <+> (p <:> (p <:> pList p))
	=== succeed [] <+> (p <:> succeed []) <+> (p <:> (p <:> (succeed [] <+> 
					(p <:> pList p))))
	=== succeed []
	<+> (p <:> succeed [])
	<+> (p <:> (p <:> succeed []))
	<+> (p <:> (p <:> (p <:> succeed [])))
	<+> (p <:> (p <:> (p <:> (p <:> pList p))))
and so on.
And when we request more p's than the input provides, pList p isn't reached 
anymore and recursion stops (e.g. with p = dig and input "123" or "123a45",
the last line will fail because it demands four digits from the beginning of 
the input, but there are only three).
If p would succeed on an empty input, e.g. p = succeed 1 or the input is an 
infinite list of successes for p, e.g. p = dig and input = cycle "123", the 
unfolding would never stop, producing an infinite list of results, but each 
of these results wolud come from a finite chain of p's ended by a 
'succeed []'.

So the order of evaluation of 
pList p input = (succeed [] <+> (p <:> pList p)) input
	      = succeed [] input ++ (p <:> pList p) input
is
1. evaluate the first argument of (++), succeed [] input == [([],input)]
Since this is not _|_, we need also the second argument of (++), so
2. evaluate (p <:> pList p) input (to whnf first, more if required)
3. evaluate (++) as far as needed

2. is further split,
2.1. evaluate p input, giving a list of (obj,rem) pairs -- if that's empty, 
we're done, also if that produces _|_
2.2. (partially) evaluate pList p rem (goto 1.) giving a list of 
(objlist,rem2); [([],rem),([obj2],rem'),([obj2,obj3],rem'')...]
2.3. return the list of (obj:objlist,rem2) pairs

>
> Yet, if the order is as you describe, everything goes well, for example:
>
> comp1 = dig >*> dig   has type  - Parser char (char, char)
> comp2 = dig >*> (succeed [])  has type  - Parser char (char, [a])
> pl1 = comp2 `build` (uncurry (:)) has type - Parser char (char, [char])

pl1 has type Parser Char [Char] because 'uncurry (:)' has type (a,[a]) -> [a]

>
> At first run
> (succeed []) `alt` ((p >*> pList p) `build` (uncurry (:)))
>
> should be:
> [] ++  ((p >*> pList p) `build` (uncurry (:)))

(succeed [] `alt` ((p >*> pList p) `build` (uncurry (:)))) input
gives
[([],input)] ++ ((p >*> pList p) `build` (uncurry (:))) input
>
> so how we get:
> (p >*> succeed []) ?
>
> Thanks,
> Dima
>
Anytime,
Daniel



More information about the Haskell-Cafe mailing list