[Haskell-cafe] Parallel ghc --make

Thomas Schilling nominolo at googlemail.com
Wed May 15 21:09:27 CEST 2013


To have a single-process ghc --make -j you first of all need internal
thread-safety:

GHC internally keeps a number of global caches that need to be made thread-safe:

  - table of interned strings (this is actually written in C and
accessed via FFI)
  - cache of loaded interface files; these are actually loaded lazily
using unsafeInterleaveIO magic (yuck)
  - cache of package descriptions (I think)
  - the NameCache: a cache of string -> magic number.  This is used to
implement fast comparisons between symbols.  The magic numbers are
generated non-deterministically (more unsafeInterleaveIO), so you need
to keep this cache around.
  - HomeModules: These are the modules that have been compiled in this
--make run.

The NameCache is used when loading interface files and also by the Parser.
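The idea can be sketched roughly like this (a simplified illustration, not
GHC's actual NameCache code): intern each symbol string to a unique Int, so
that symbol equality becomes a cheap Int comparison instead of a string
comparison.

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified name cache: strings interned to unique Ints.
data NameCache = NameCache
  { nextUnique :: !Int
  , interned   :: !(Map.Map String Int)
  }

emptyNameCache :: NameCache
emptyNameCache = NameCache 0 Map.empty

-- Intern a string: return its unique, allocating a fresh one if unseen.
-- atomicModifyIORef' makes the lookup-and-insert a single atomic step.
intern :: IORef NameCache -> String -> IO Int
intern ref s = atomicModifyIORef' ref $ \nc@(NameCache next m) ->
  case Map.lookup s m of
    Just u  -> (nc, u)
    Nothing -> (NameCache (next + 1) (Map.insert s next m), next)
```

Comparing two interned symbols is then just (==) on Int, which is why the
cache must be shared: the same string has to map to the same number in every
compilation.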

Making these things thread-safe basically involves updating these
caches via atomicModifyIORef instead of just modifyIORef.  I made
those changes a few years ago, but at least one of them was rolled
back.  I forget the details, but I think it was one use of
unsafePerformIO that caused the issues.  unsafePerformIO needs to
traverse the stack to look for thunks that are potentially being
evaluated by multiple threads, and with a deep stack that can be
expensive.  SimonM has since added stack chunks, which should reduce
this overhead, so it could be worthwhile to re-evaluate the patch.
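The difference matters because modifyIORef's read-modify-write is not
atomic, so two threads updating a cache concurrently can lose updates,
whereas atomicModifyIORef' applies the function in one atomic step.  A
small standalone illustration (nothing to do with GHC's actual caches):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM, replicateM_)
import Data.IORef

-- Spawn 'n' threads that each apply 'update' to a shared counter 'k'
-- times, then return the final count.
bumpWith :: (IORef Int -> IO ()) -> Int -> Int -> IO Int
bumpWith update n k = do
  ref   <- newIORef 0
  dones <- replicateM n newEmptyMVar
  forM_ dones $ \done -> forkIO $ do
    replicateM_ k (update ref)
    putMVar done ()
  mapM_ takeMVar dones
  readIORef ref

-- Atomic increment: safe under concurrent access.
atomicBump :: IORef Int -> IO ()
atomicBump ref = atomicModifyIORef' ref (\x -> (x + 1, ()))

-- Plain read-modify-write: concurrent increments can be lost.
racyBump :: IORef Int -> IO ()
racyBump ref = modifyIORef' ref (+ 1)
```

With atomicBump, bumpWith always returns exactly n * k; with racyBump and
the threaded runtime (+RTS -N), the result can come out lower because
interleaved read/write pairs overwrite each other.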


To have a multi-process ghc --make you don't need thread-safety.
However, without sharing the caches -- in particular the interface
file cache -- the time spent re-reading data from disk may outweigh
any gains from parallel execution.
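The multi-process shape is easy to sketch (purely illustrative; it uses
callProcess from the process package, and it assumes the modules are
independent, whereas a real --make -j would have to schedule along the
module dependency graph):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import System.Process (callProcess)  -- from the process package

-- Hypothetical driver: compile each module in its own compiler
-- process and wait for all of them to finish.  Every process pays
-- the full cost of reloading interface files from disk, which is
-- the overhead described above.
compileInParallel :: String -> [FilePath] -> IO ()
compileInParallel compiler mods = do
  dones <- mapM (const newEmptyMVar) mods
  sequence_
    [ forkIO (callProcess compiler ["-c", m] >> putMVar done ())
    | (m, done) <- zip mods dones ]
  mapM_ takeMVar dones
```

E.g. compileInParallel "ghc" ["A.hs", "B.hs"] would run two independent
ghc -c processes, neither sharing any cache with the other.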


Evan's approach of using a long-running worker process avoids
reloading most of the caches for each module, but it probably
couldn't take advantage of the HomeModules cache.  It would be
interesting to see whether that was the issue.  It would also be
interesting to find out whether disk access or serialisation overhead
is the bottleneck; if it's the former, some clever use of mmap could
help.


HTH,
 / Thomas

On 13 May 2013 17:35, Evan Laforge <qdunkan at gmail.com> wrote:
> I wrote a ghc-server that starts a persistent process for each cpu.
> Then a 'ghc' frontend wrapper sticks each job in a queue.  It seemed
> to be working, but timing tests didn't reveal any speed-up.  Then I
> got a faster computer and lost motivation.  I didn't investigate very
> deeply why it didn't speed up as I hoped.  It's possible the approach
> is still valid, but I made some mistake in the implementation.
>
> So that I can stop rewriting this little blurb, I put it on github:
>
> https://github.com/elaforge/ghc-server
>
> On Mon, May 13, 2013 at 8:40 PM, Niklas Hambüchen <mail at nh2.me> wrote:
>> I know this has been talked about before and also a bit in the recent
>> GSoC discussion.
>>
>> I would like to know what prevents ghc --make from working in parallel,
>> who has worked on it in the past, what their findings were, and a general
>> estimation of the difficulty of the problem.
>>
>> Afterwards, I would update
>> http://hackage.haskell.org/trac/ghc/ticket/910 with a short summary of
>> what the current situation is.
>>
>> Thanks to those who know more!
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
