[sajith at gmail.com: Google Summer of Code: a NUMA wishlist!]

Mon Mar 26 17:31:40 CEST 2012

On 26/03/2012 04:25, Sajith T S wrote:

> Date: Sun, 25 Mar 2012 22:49:52 -0400
> From: Sajith T S<sajith at gmail.com>
> To: The Haskell Cafe<haskell-cafe at haskell.org>
> Subject: Google Summer of Code: a NUMA wishlist!
>
> Dear Cafe,
>
> It's last minute-ish to bring this up (in my part of the world it's
> still March 25), but graduate students are famously a busy and lazy
> lot. :)  I study at Indiana University Bloomington, and I wish to
> propose^W rush in this proposal and solicit feedback, mentors, etc
> while I can.
>
> Since the student application deadline is April 6, I figure we can beat
> this into a real proposal's shape by then.  This probably also falls
> on the naive and ambitious side of things, and I might not even know
> what I'm talking about, but let's see!  That's the idea of a proposal,
> yes?
>
> Broadly, the idea is to improve support for NUMA systems.  Specifically:
>
>   -- Real physical processor affinity with forkOn [1].  Can we fire all
>      CPUs if we want to?  (Currently, the number passed to forkOn is
>      interpreted modulo the value returned by
>      getNumCapabilities [2]).

You can get real processor affinity with +RTS -qa in combination with 
forkOn.
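
A minimal sketch of that combination, assuming a program built with
-threaded and -rtsopts.  The file name PinDemo.hs and the helper
effectiveCapability are made up for illustration; forkOn,
getNumCapabilities and threadCapability are from Control.Concurrent.
It only shows which capability each thread lands on.  Whether -qa then
pins the underlying OS threads to cores depends on the operating system.

```haskell
-- Build:  ghc -threaded -rtsopts PinDemo.hs
-- Run:    ./PinDemo +RTS -N4 -qa -RTS
import Control.Concurrent (forkOn, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)

-- | Hypothetical helper: the capability forkOn is documented to choose
-- for a requested number, i.e. the request modulo the capability count.
effectiveCapability :: Int -> Int -> Int
effectiveCapability caps requested = requested `mod` caps

main :: IO ()
main = do
  caps <- getNumCapabilities
  -- Deliberately request a couple of numbers beyond the capability count
  -- to make the modulo behaviour visible.
  results <- forM [0 .. caps + 1] $ \requested -> do
    done <- newEmptyMVar
    _ <- forkOn requested $ do
      tid <- myThreadId
      -- threadCapability reports the capability the thread is running on
      -- and whether it is locked to that capability.
      (cap, locked) <- threadCapability tid
      putMVar done (requested, cap, locked)
    return done
  forM_ results $ \done -> do
    (requested, cap, locked) <- takeMVar done
    putStrLn $
      "requested " ++ show requested
        ++ " -> capability " ++ show cap
        ++ " (modulo rule gives " ++ show (effectiveCapability caps requested)
        ++ ", locked: " ++ show locked ++ ")"
```

Running it with a tool such as htop open should make it possible to
check whether the work stays on the expected cores.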

>   -- Also kind of associated with the above: when launching processes,
>      we might want to specify a list of CPUs rather than the number of
>      CPUs.  Say, a -N [0,1,3] flag rather than -N 3 flag.  This shall
>      enable us to gawk at real pretty htop [3] output.

I like that idea.

>   -- From a very recent discussion on parallel-haskell [4], we learn
>      that the RTS's NUMA support could be improved.  The hypothesis is
>      that allocating nurseries per Capability might be a better plan
>      than using a global pool.  We might borrow/steal ideas from
>      hwloc [5] for this.

I like this idea too (since I suggested it :-).

>   -- Finally, a logging/monitoring infrastructure to verify assumptions
>      and determine whether, and how well, work stays local.

I'm not sure if you're suggesting a *new* logging/monitoring framework 
here, but in any case it would make much more sense to extend ghc-events 
and ThreadScope rather than building something new.  There is ongoing 
work to have ThreadScope understand the output of the Linux "perf" tool, 
which would give insight into CPU scheduling activity amongst other 
things.  Talk to Duncan Coutts <duncan at well-typed.com> about how far 
along this is and the best way for a GSoC project to help (usually it 
works best when the GSoC project is not dependent on, or depended on by, 
other ongoing projects - reducing synchronisation overhead and latency 
due to blocking is always good!).
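
As a rough illustration of building on the existing eventlog rather
than a new framework, here is a small sketch that writes user events
recording which capability a thread is on.  Debug.Trace.traceEventIO
and the +RTS -l option are existing GHC features.  The file name
TraceDemo.hs, the helper names and the message format are assumptions
made up for this example, not part of ghc-events or ThreadScope.

```haskell
-- Build:  ghc -threaded -eventlog -rtsopts TraceDemo.hs
--         (-eventlog is needed on older GHCs to link the eventlog RTS)
-- Run:    ./TraceDemo +RTS -N2 -l -RTS     (writes TraceDemo.eventlog)
import Control.Concurrent (forkOn, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Debug.Trace (traceEventIO)

-- | Hypothetical message format for this sketch.
placementEvent :: String -> Int -> String
placementEvent label cap = "placement " ++ label ++ " cap=" ++ show cap

-- | Emit a user event naming the capability the calling thread is on.
-- Without eventlogging enabled at run time the event is simply dropped.
tracePlacement :: String -> IO ()
tracePlacement label = do
  (cap, _locked) <- threadCapability =<< myThreadId
  traceEventIO (placementEvent label cap)

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkOn 1 $ do
    tracePlacement "worker-start"
    -- Some stand-in work between the two events.
    let s = sum [1 .. 1000000 :: Int]
    s `seq` tracePlacement "worker-end"
    putMVar done ()
  takeMVar done
```

The resulting .eventlog file can be opened in ThreadScope or read with
the ghc-events tool, where the user events appear alongside the RTS's
own scheduling events.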

Cheers,
	Simon

> (I would like to acknowledge my fellow conspirators and leave them
> unnamed, lest they be embarrassed by my... naivete.)
>
> Thanks,
> Sajith.
>
> [1] http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html#v:forkOn
> [2] http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html#v:getNumCapabilities
> [3] http://htop.sourceforge.net/
> [4] http://groups.google.com/group/parallel-haskell/browse_thread/thread/7ec1ebc73dde8bbd
> [5] http://www.open-mpi.org/projects/hwloc/
>