[Haskell-cafe] Parallel parsing & multicore
ekmett at gmail.com
Wed Sep 9 12:14:23 EDT 2009
Nice to see someone else working in this space.
I have also been working on a set of parallel parsing techniques, which can
use small Parsec parsers for local context sensitivity.
See the second set of slides in
http://comonad.com/reader/2009/iteratees-parsec-and-monoid/ for an overview
of how I'm doing something similar to feed Parsec independent chunks. Note
that this approach bypasses the need for a separate sequential scan, which
otherwise floods your cache, and lets you get closer to the performance
limit imposed by Amdahl's law.
The code in the second set of slides can be adapted to your case: load
everything into a lazy bytestring or fingertree of strict bytestrings, then
for each strict bytestring chunk in parallel, scan it for the first newline,
and then start an iteratee based parsec parser from that point. I use the
iteratee based parsec parsers so that when I glue the partial parses
together I can feed the unparsed data on the left side of the first newline
in each chunk to the parser I'm joining on the left. I provide a monoid for
the purpose of gluing together these partial parses, which encapsulates this
The fingertree case is particularly nice, because it can be used to do
cheap incremental reparsing using the same machinery based on out of band
updates to the input.
This approach is sufficient to parse a lot of interesting languages. As you
have noted with Makefiles it can handle indentation based control
structures, and with a variation on a Dyck language monoid it can be
extended to Haskell-style layout or parenthesis matching/lisp parsing by
applying the same techniques at a higher level.
On Wed, Sep 9, 2009 at 10:42 AM, Anakim Border <akborder at gmail.com> wrote:
> > Very interesting idea!
> > I think the big thing would be to measure it with GHC HEAD so you can
> > see how effectively the sparks are being converted into threads.
> > Is there a package and test case somewhere we can try out?
> At this point the parser is just a proof of concept. For those brave
> enough, however, I've put the code on github:
> The "test.mk" file provides some test cases. To get an input big
> enough to measure multiple thread performances, you can concatenate
> that file a few thousand times: the timings in my previous message
> were obtained parsing a 3000x concatenation (final size: 1.1 MB).
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Haskell-Cafe