[Haskell-cafe] Parallel parsing & multicore

Edward Kmett ekmett at gmail.com
Wed Sep 9 12:34:02 EDT 2009


On Wed, Sep 9, 2009 at 12:16 PM, Don Stewart <dons at galois.com> wrote:

> ekmett:
> > Hi Anakim,
> >
> > Nice to see someone else working in this space.
> >
> > I have also been working on a set of parallel parsing techniques,
> > which can use small Parsec parsers for local context sensitivity.
> >
> > See the second set of slides in
> > http://comonad.com/reader/2009/iteratees-parsec-and-monoid/ for an
> > overview of how I'm doing something similar to feed Parsec
> > independent chunks. Note that this approach bypasses the need for a
> > separate sequential scan, which otherwise floods your cache, and lets
> > you get closer to the performance limit imposed by Amdahl's law.
> >
> > The code in the second set of slides can be adapted to your case:
> > load everything into a lazy bytestring or fingertree of strict
> > bytestrings, then for each strict bytestring chunk in parallel, scan
> > it for the first newline, and then start an iteratee-based Parsec
> > parser from that point. I use the iteratee-based Parsec parsers so
> > that when I glue the partial parses together I can feed the unparsed
> > data on the left side of the first newline in each chunk to the
> > parser I'm joining on the left. I provide a monoid for the purpose of
> > gluing together these partial parses, which encapsulates this
> > behavior.
>
> You can get quite a long way by using bytestring-mmap and strict
> bytestrings. The first ensures your IO overhead will be low, and the
> second ensures you don't migrate work needlessly.


I'm actually using bytestring-mmap in a toy compiler that exercises these
techniques, though at present I use System.IO.Posix.MMap.Lazy rather than
the strict version.
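
Concretely, the input side looks roughly like the sketch below. This is
just the shape of it rather than the toy compiler's actual code, assuming
the lazy unsafeMMapFile from bytestring-mmap:

-- Rough sketch: map the file lazily and expose the underlying strict
-- chunks, which become the units of parallel work for the per-chunk
-- parsers sketched further down.
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import System.IO.Posix.MMap.Lazy (unsafeMMapFile)  -- from bytestring-mmap

chunksOfFile :: FilePath -> IO [B.ByteString]
chunksOfFile path = fmap L.toChunks (unsafeMMapFile path)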

As for migrating work, maybe I'm not understanding your point, but my whole
goal was to migrate the chunking and parsing out to other cores, so parMap
rwhnf'ing the monoidal reduction over the chunks seems to be exactly what I
want. I could perhaps get better results by assigning contiguous ranges of
the input to the same thread, and by choosing the core that merges chunks
more intelligently, perhaps by mixing pseq and par to keep runs on the same
core where possible.
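
To make that concrete, here is a stripped-down sketch of the shape I mean,
not the actual code from the slides: plain line splitting stands in for the
iteratee-based Parsec parsers, complete lines stand in for real parse
results, and rseq is the newer name for rwhnf in Control.Parallel.Strategies:

-- Simplified stand-in for the chunkwise parse and gluing monoid: line
-- splitting replaces the real parsers, but the monoidal reduction over
-- chunks has the same shape.
import qualified Data.ByteString.Char8 as B
import Control.Parallel.Strategies (parMap, rseq)  -- rseq = newer rwhnf

-- A partial parse of one chunk: either the chunk held no newline at all,
-- or it splits into a dangling prefix, some complete lines, and a
-- dangling suffix that still needs its right-hand neighbour.
data Partial
  = Fragment B.ByteString
  | Lines B.ByteString [B.ByteString] B.ByteString
  deriving Show

instance Semigroup Partial where
  Fragment a   <> Fragment b     = Fragment (B.append a b)
  Fragment a   <> Lines l ys r   = Lines (B.append a l) ys r
  Lines l xs r <> Fragment b     = Lines l xs (B.append r b)
  Lines l xs r <> Lines l' ys r' =
    -- The suffix of the left chunk and the prefix of the right chunk
    -- meet at the line that straddled the chunk boundary.
    Lines l (xs ++ [B.append r l'] ++ ys) r'

instance Monoid Partial where
  mempty = Fragment B.empty

-- Parse one strict chunk independently of its neighbours: scan for the
-- first newline and treat everything before it as unparsed data to hand
-- to the parser being joined on the left.
parseChunk :: B.ByteString -> Partial
parseChunk bs = case B.elemIndex '\n' bs of
  Nothing -> Fragment bs
  Just _  -> let pieces = B.split '\n' bs
             in Lines (head pieces) (init (tail pieces)) (last pieces)

-- Spark the per-chunk parses in parallel and fold the partial results
-- together with the monoid; mconcat is the monoidal reduction.
parseChunks :: [B.ByteString] -> Partial
parseChunks = mconcat . parMap rseq parseChunk

In the real version each chunk resumes an iteratee-based Parsec parser
rather than splitting lines, and the dangling prefix of each chunk is fed
to the parser being joined on the left, but the reduction is the same.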

Could you elaborate?

-Edward