[Haskell-cafe] uu-parsinglib - Greedy Parser
Mario Blažević
mblazevic at stilo.com
Tue Jan 13 15:48:59 UTC 2015
On 15-01-13 08:25 AM, Marco Vassena wrote:
> Unfortunately in html there are also empty tags, which don't need to
> be closed. For instance the line-break tag <br>: <h1> Line break tags
> are <br> not closed </h1>
>
> The bigger picture is that I am trying to figure out what are the
> core constructs needed to define a parser, therefore I want to have a
> rather abstract interface. In my set of core constructs there are:
> <$> : (a -> b) -> f a -> f b
> <*> : f (a -> b) -> f a -> f b
> <|> : f a -> f a -> f a -- (symmetric choice)
> pure : a -> f a
> fail : f a
> pToken : f Char
>
> Is it possible to define a parser that applies the longest matching
> rule using these constructs only? Or is it necessary to extend it
> with another primitive, for instance greedy choice <<|> ? (Note that
> f is abstract and it is not necessarily uu-parsinglib parsers).
You can parse HTML with no ambiguous results if you allow monadic bind
(>>=) as well, so that the parser for an element's end can depend on the
name read from its opening tag:
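pTag = pElement <|> pCommentTag <|> pContent

pElement = do elemName <- pOpenTag
              elemContent <- pTag `manyTill` endElement elemName
              -- a do block must end in an expression; Element is an
              -- assumed constructor, by analogy with Content below
              return (Element elemName elemContent)

-- An element ends at its own closing tag, which is consumed, or just
-- before any other closing tag, which is left for an enclosing element.
endElement elemName = string "</" *> string elemName *> string ">"
                      <|> lookahead (string "</"
                                     *> some (satisfy (/= '>'))
                                     *> string ">")

pContent = Content <$> some (satisfy (/= '<'))

pHtml = some pTag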
Mind you, this code would not give you exactly the same parse tree as
an HTML 5 browser would. That spec is a nightmare.
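
For anyone who wants to experiment, here is a minimal self-contained
sketch of the same idea using only ReadP from base. Several things in
it are assumptions rather than part of the code above: the Node type,
the definitions of pOpenTag and pCommentTag (tag names are plain
alphanumerics, no attributes), the lookahead helper built from look,
and the parseHtml wrapper. It also leans on ReadP's greedy munch1 and
left-biased <++ to keep the result unique, so it says nothing about
whether symmetric choice alone would be enough.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP

-- Hypothetical parse tree; only Content appears in the original code.
data Node
  = Element String [Node]
  | Comment String
  | Content String
  deriving (Eq, Show)

-- Succeed if p matches the remaining input, without consuming anything.
-- ReadP has no such combinator, so it is built from look here.
lookahead :: ReadP a -> ReadP ()
lookahead p = do
  rest <- look
  case readP_to_S p rest of
    [] -> pfail
    _  -> return ()

-- Assumed shape of an opening tag: <name>, with no attributes.
pOpenTag :: ReadP String
pOpenTag = char '<' *> munch1 isAlphaNum <* char '>'

-- Assumed shape of a comment: <!-- ... -->
pCommentTag :: ReadP Node
pCommentTag = Comment <$> (string "<!--" *> manyTill get (string "-->"))

pTag :: ReadP Node
pTag = pElement <|> pCommentTag <|> pContent

-- Monadic bind: the end-of-element parser depends on the parsed name.
pElement :: ReadP Node
pElement = do
  name <- pOpenTag
  children <- pTag `manyTill` endElement name
  return (Element name children)

-- Prefer the matching closing tag (consumed); otherwise stop in front
-- of any other closing tag and leave it for an enclosing element.
endElement :: String -> ReadP ()
endElement name = matching <++ implied
  where
    matching = string "</" *> string name *> string ">" *> pure ()
    implied  = lookahead (string "</" *> munch1 (/= '>') *> string ">")

-- munch1 is greedy, so a run of text becomes a single Content node.
pContent :: ReadP Node
pContent = Content <$> munch1 (/= '<')

pHtml :: ReadP [Node]
pHtml = many1 pTag <* eof

-- Just the tree if there is exactly one complete parse, else Nothing.
parseHtml :: String -> Maybe [Node]
parseHtml s = case readP_to_S pHtml s of
  [(nodes, "")] -> Just nodes
  _             -> Nothing
```

In GHCi, parseHtml "<h1> Line break tags are <br> not closed </h1>"
gives Just [Element "h1" [Content " Line break tags are ",
Element "br" [Content " not closed "]]]. Note that the text after <br>
ends up inside the br element, which is one of the ways this differs
from what a browser would build.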