[Haskell-cafe] Parallel parsing & multicore
dons at galois.com
Wed Sep 9 12:16:09 EDT 2009
> Hi Anakim,
> Nice to see someone else working in this space.
> I have also been working on a set of parallel parsing techniques, which can use
> small Parsec parsers for local context sensitivity.
> See the second set of slides in http://comonad.com/reader/2009/
> iteratees-parsec-and-monoid/ for an overview of how I'm doing something similar
> to feed Parsec independent chunks. Note that this approach bypasses the need
> for a separate sequential scan, which otherwise floods your cache, and lets you
> get closer to the performance limit imposed by Amdahl's law.
> The code in the second set of slides can be adapted to your case: load
> everything into a lazy bytestring or fingertree of strict bytestrings, then for
> each strict bytestring chunk in parallel, scan it for the first newline, and
> then start an iteratee based parsec parser from that point. I use the iteratee
> based parsec parsers so that when I glue the partial parses together I can feed
> the unparsed data on the left side of the first newline in each chunk to the
> parser I'm joining on the left. I provide a monoid for the purpose of gluing
> together these partial parses, which encapsulates this behavior.
You can get quite a long way by using bytestring-mmap and strict
bytestrings. The first ensures your IO overhead will be low, and the
second ensures you don't migrate work needlessly.
More information about the Haskell-Cafe