[Haskell-cafe] uu-parsinglib - Greedy Parser

Mario Blažević mblazevic at stilo.com
Tue Jan 13 15:48:59 UTC 2015


On 15-01-13 08:25 AM, Marco Vassena wrote:
> Unfortunately in html there are also empty tags, which don't need to
> be closed. For instance the line-break tag <br>: <h1> Line break tags
> are <br> not closed </h1>
>
> The bigger picture is that I am trying to figure out what are the
> core constructs needed to define a parser, therefore I want to have a
> rather abstract interface. In my set of core constructs there are:
> <$> : (a -> b) -> f a -> f b
> <*> : f (a -> b) -> f a -> f b
> <|> : f a -> f a -> f a  -- (symmetric choice)
> pure : a -> f a
> fail : f a
> pToken : f Char
>
> Is it possible to define a parser that applies the longest matching
> rule using these constructs only? Or is it necessary to extend it
> with another primitive, for instance greedy choice <<|> ? (Note that
> f is abstract and it is not necessarily uu-parsinglib parsers).

	You can parse HTML with no ambiguous results if you allow monadic bind 
(>>=) as well:

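pTag = pElement <|> pCommentTag <|> pContent
pElement = do elemName <- pOpenTag
              elemContent <- pTag `manyTill` endElement elemName
              -- Element is an assumed constructor, by analogy with Content
              return (Element elemName elemContent)
-- Either the matching close tag, or (without consuming it) any other
-- close tag, which implicitly ends an empty element such as <br>.
endElement elemName = string "</" *> string elemName *> string ">"
                      <|> lookahead (string "</"
                                     *> some (satisfy (/= '>'))
                                     *> string ">")
pContent = Content <$> some (satisfy (/= '<'))
pHtml = some pTag
-- pOpenTag and pCommentTag are not defined here.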

	Mind you, this code would not give you exactly the same parse tree as 
an HTML 5 browser would. That spec is a nightmare.
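
For anyone who wants to run the idea, here is a minimal self-contained sketch of the grammar above. It makes several assumptions that are not in the original: it uses ReadP from base as the parser type (not uu-parsinglib), the Node type and its Element constructor are hypothetical, pOpenTag is a simplified guess that ignores attributes, and comments are left out. ReadP's <|> is symmetric, so the sketch uses ReadP's greedy munch1 for text runs. With some (satisfy ...), ReadP would return every way of splitting the text. lookahead is written with monadic bind on top of ReadP's look.

```haskell
import Control.Applicative ((<|>))
import Text.ParserCombinators.ReadP

-- Hypothetical tree type; only Content appears in the grammar above.
data Node = Element String [Node] | Content String
  deriving (Eq, Show)

-- Succeed if p matches at the current position, consuming nothing.
-- Written with monadic bind: inspect the remaining input, then decide.
lookahead :: ReadP a -> ReadP a
lookahead p = do
  rest <- look
  case readP_to_S p rest of
    []           -> pfail
    ((x, _) : _) -> return x

-- Simplified open tag: everything between '<' and '>' is the name,
-- so attributes are not handled.
pOpenTag :: ReadP String
pOpenTag = char '<' *> munch1 (\c -> c /= '>' && c /= '/') <* char '>'

-- The matching close tag (consumed), or any close tag (not consumed).
endElement :: String -> ReadP String
endElement name =
      (string "</" *> string name *> string ">")
  <|> lookahead (string "</" *> munch1 (/= '>') *> string ">")

pElement :: ReadP Node
pElement = do
  name <- pOpenTag
  body <- pTag `manyTill` endElement name
  return (Element name body)

-- Greedy text run (munch1 is a ReadP primitive, an assumption here).
pContent :: ReadP Node
pContent = Content <$> munch1 (/= '<')

pTag :: ReadP Node
pTag = pElement <|> pContent

-- Keep only the parses that consume the whole input.
parseHtml :: String -> [[Node]]
parseHtml s = [nodes | (nodes, "") <- readP_to_S (many1 pTag <* eof) s]
```

On the input "<h1>a<br>b</h1>" this yields a single parse, in which the br element absorbs the text that follows it up to the next close tag. That is one concrete way the tree differs from what a browser would build.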

