[Haskell-cafe] Parsing a bunch of strings without backtracking

Jaro noughtmare at openmailbox.org
Thu Mar 9 00:48:50 UTC 2017


Have you seen uu-parsinglib (http://foswiki.cs.uu.nl/foswiki/HUT/ParserCombinators)? It implements a non-backtracking parser that works similarly to what you describe.

On 9 March 2017 01:21:09 GMT+01:00, Javran Cheng <javran.c at gmail.com> wrote:
>Hi all,
>
>I come across an idea about parsing a bunch of strings,
>just want to share this idea and looking for some comments and maybe
>pointing me to some related articles.
>
>The idea is when you have:
>
>1 <$ string "aabb" <|> 2 <$ string "aacc"
>
>you can rewrite it as:
>
>string "aa" >> (1 <$ string "bb" <|> 2 <$ string "cc")
>
>so that you don't have to deal with the common prefix "aa" twice.
>
>Then I further extend this idea: if we have a set of strings (s1, s2,
>...),
>whose element is not a prefix of any other set member,
>why don't we turn things like (v1 <$ string s1 <|> v2 <$ string s2 <|>
>...)
>into a "tree-structure" so we don't need to deal with
>any common prefix more than once?
>
>So this motivates my little toy implementation at
>https://gist.github.com/Javran/1cbfe9897d9a5c8fae5d20a33fd83813
>
>basically given a list [(s1,v1),(s2,v2)...], we want to generate a
>parser
>equivalent to (v1 <$ string s1 <|> v2 <$ string s2 <|> ...)
>but perhaps with less backtracking. My code does this by turning the
>set of
>strings into a tree structure (see function "compact") and
>using this "Compact" representation as a driver to generate the parser.
>I used ReadP to have a concrete parser for testing, but this should
>work
>with any parsing library supporting MonadPlus.
>
>I think they are bunch of things that can be improved:
>
>- for now every intermediate node of "Compact" contains just a char, so
>this will end up generating a parser that has branch `char 'f' >> char
>'o'
>>> 'o'` in it
>    instead of the slightly more efficient version `string "foo"`
>- we could further attach a weight to each string, this allows
>rearranging
>(<|>) chains so that a more frequent string parser appears earlier.
>- if the input [(s1,v1),(s2,v2)...] are all known constants at compile
>time, probably TemplateHaskell can be used for generating the parser at
>compile time (as a Haskell expression).
>I'm not sure how difficult this would be though, as I have very limited
>understanding of TH.
>
>but before I explore further, I want to collect some comments as I said
>in
>the beginning.
>
>Best,
>-- 
>Javran (Fang) Cheng

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20170309/0d5e9799/attachment.html>


More information about the Haskell-Cafe mailing list