[Haskell-cafe] Parsing a bunch of strings without backtracking
Ben Franksen
ben.franksen at online.de
Thu Mar 16 21:50:51 UTC 2017
Am 09.03.2017 um 01:21 schrieb Javran Cheng:
> I come across an idea about parsing a bunch of strings,
> just want to share this idea and looking for some comments and maybe
> pointing me to some related articles.
>
> The idea is when you have:
>
> 1 <$ string "aabb" <|> 2 <$ string "aacc"
>
> you can rewrite it as:
>
> string "aa" >> (1 <$ string "bb" <|> 2 <$ string "cc")
>
> so that you don't have to deal with the common prefix "aa" twice.
In the parsing literature this is called "left factoring (a grammar)".
Left factoring is known to improve efficiency and there are published
algorithms to do it automatically; but -- as you observed -- to perform
it you need an explicit representation of the grammar, so you can
transform it before turning it into a parser; some would call this a
"deep embedding".
> Then I further extend this idea: if we have a set of strings (s1, s2, ...),
> whose element is not a prefix of any other set member,
> why don't we turn things like (v1 <$ string s1 <|> v2 <$ string s2 <|> ...)
> into a "tree-structure" so we don't need to deal with
> any common prefix more than once?
The problem is how to represent the grammar in a statically typed way
and then to transform it preserving the well-typing. This can be done,
but the price is high. Look for packages (and papers) named
'christmastree' and 'tttas' to get an idea. Even the authors admit that
their approach is extremely involved and that offline methods (such as
TH which you mentioned) may be more practical.
Cheers
Ben
More information about the Haskell-Cafe
mailing list