[Haskell-cafe] how to organize a parallel parser/compiler

Wed May 27 21:27:01 UTC 2015

hello haskell-cafe,

suppose you have a compiler pipeline roughly similar to this one (from Write
You a Haskell <http://dev.stephendiehl.com/fun/007_path.html>):

modl :: FilePath -> L.Text -> CompilerM ()
modl fname
    = parseP fname
  >=> dataP
  >=> groupP
  >=> renameP
  >=> desugarP
  >=> inferP
  >=> evalP

and suppose you have a language where you can compile multiple files, and
compilation must behave _as if_ the files were concatenated together in the
order given on the command line (there might be side effects that affect
parsing of subsequent files). Still, in the average case there are no
dependencies and the files could be compiled in parallel (or some of the
dependencies can be handled at the AST level, when one can prove they don't
affect parsing but only aspects of the AST that can be patched).

What are good strategies for dealing with this, and for rerunning some of
the parseP functions (and, in the simplest solution, the complete pipeline
after them)?

The strategy I have in mind now doesn't mix well with the above pipeline,
so I'd like to see if there are alternative solutions.

Basically, what I have in mind is:
- each parse function gets a file, something to watch for the result of
the previous parse (let's say an MVar, or some variation of speculation
<https://hackage.haskell.org/package/speculation>, not sure yet) and an
input environment (the same for everybody). It produces an AST, plus what it
is sensitive to (e.g. what would have affected the parsing) and what it
generates that the next parse must be sensitive to.
- before producing the 'what it generates' part, it must be sure to have
completed a valid parse, so it waits on the input MVar to know that the
previous files have been parsed properly and that whatever side effects they
caused don't affect its own parse. This wait is done on a thread that
doesn't count towards the parallelism limit, as it is presumably cheap and
we don't really serialize the parsing.
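
To make the two points above concrete, here is a minimal sketch using only
base. Everything in it is a hypothetical stand-in and not part of the
pipeline above: the toy parseFile (where "define X" is a side effect and
"use X" parses differently depending on whether X is defined), the Parsed
record, and compileAll. Each file is parsed speculatively under the shared
initial environment, then waits on the previous file's MVar and reparses
only if something it is sensitive to was actually generated before it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (evaluate)
import Data.List (intersect, union)

-- Hypothetical toy model: the environment is the list of names defined so far.
type Env = [String]

data Parsed = Parsed
  { ast         :: [String]  -- stand-in for a real AST
  , sensitiveTo :: [String]  -- outside names that would change this parse
  , generates   :: [String]  -- names this file defines for later files
  } deriving (Eq, Show)

-- Toy parser: "define X" generates X; "use X" becomes "macro X" if X is
-- known and "var X" otherwise. Uses of outside names are recorded as
-- sensitivities.
parseFile :: Env -> String -> Parsed
parseFile env src = foldl step (Parsed [] [] []) (map words (lines src))
  where
    step p ["define", x] =
      p { ast = ast p ++ ["def " ++ x], generates = generates p ++ [x] }
    step p ["use", x]
      | x `elem` generates p = p { ast = ast p ++ ["macro " ++ x] }
      | x `elem` env = p { ast = ast p ++ ["macro " ++ x]
                         , sensitiveTo = x : sensitiveTo p }
      | otherwise    = p { ast = ast p ++ ["var " ++ x]
                         , sensitiveTo = x : sensitiveTo p }
    step p _ = p

-- Parse all files in parallel, as if they had been concatenated in order.
compileAll :: Env -> [String] -> IO [Parsed]
compileAll env0 srcs = do
  first <- newMVar env0          -- the "previous result" seen by the first file
  outs  <- go first srcs
  mapM readMVar outs
  where
    go _ [] = return []
    go prev (src : rest) = do
      envOut <- newEmptyMVar
      out    <- newEmptyMVar
      _ <- forkIO $ do
        -- Speculative parse under the shared initial environment,
        -- forced so the work happens before we block on the predecessor.
        let spec = parseFile env0 src
        _ <- evaluate (length (show spec))
        -- Wait until every earlier file has a validated parse.
        envIn <- readMVar prev
        let new = filter (`notElem` env0) envIn
            final
              | null (sensitiveTo spec `intersect` new) = spec
              | otherwise = parseFile envIn src   -- invalidated: reparse
        -- Publish our effects only once our own parse is known to be valid.
        putMVar envOut (envIn `union` generates final)
        putMVar out final
      (out :) <$> go envOut rest
```

For example, compileAll [] ["define m", "use m", "use k"] reparses only the
second file, because it is the only one sensitive to a name an earlier file
generated. The sketch leaves out the parallelism limit (every file simply
gets its own forkIO thread), exception handling (a failing parse would leave
later files blocked on their MVar), and the rest of the pipeline after
parsing.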

[a similar question related to the above pipeline would be how to fit in
any kind of global transformation pass]

Is there any library, or are there other ideas, on how to combine multiple
pipelines where we want to run them with maximum parallelism but the outcome
of (even a partial step of) one pipeline can invalidate others and require
rerunning them?

Thanks for any idea,

  Maurizio