[Haskell-cafe] Parallel Haskell Digest 8

Fri Mar 2 14:56:03 CET 2012

Parallel Haskell Digest 8 
=========================
2012-03-01

Hello Haskellers!

It's time for our next catch-up with the Parallel Haskell community.
Did you have a chance to see Simon Peyton Jones's talk [The Future is
Parallel, and the Future of Parallel is Declarative][v2]? It's a good
survey of some of the directions that parallel Haskell has been taking,
and if you're somewhat new to this stuff, a great feel for the breadth
of the community.  You'll get a better idea why you see people in this
digest muttering about arrays, others trying to make channels and
transactions work together, and yet others talking up the 0MQ protocol.
So check it out!

We at Well-Typed are writing the digest as part of our mission
to push Haskell parallelism and concurrency out into the real
world. We are excited about these technologies and we want to do
whatever it takes to make it more accessible to everybody.  More
news below on how we're doing in this Parallel GHC project.

HTML version of this digest at http://www.well-typed.com/blog/64

News
----------------------------------------------------------------------
*    [GHC 7.4.1 released][n1]

     Of particular interest to parallel Haskellers: better profiling
     flags, multicore profiling, vastly improved DPH, event logging,
     more convenient RTS flags (no more `-rtsopts` needed for common
     options like `-N`). See that new event logging stuff? Could be
     a good chance to play with the new spark profiling features from
     [ThreadScope][ts-tour].

*    [Summer School on Functional Programming for Parallel and Concurrent Applications][n2] (2 weeks)

     Imagine this: two summer weeks in the Southern France on the Côte
     d'Azur, learning about Parallel Haskell.  This would be the
     CEA-EDF-INRIA Summer School on Functional Programming for
     Parallel and Concurrent Applications.  It takes place on 11-22
     June 2012, Castle of Cadarache, Saint Paul Lez Durance, France.

     No functional programming experience needed. The aim of the summer
     school is to give a thorough and application-oriented introduction
     to functional programming using the programming language Haskell. A
     special focus is on parallel and concurrent programming,
     highlighting the ways in which features such as strong typing and
     purity make it dramatically easier to write reliable parallel or
     concurrent code. The school is split into three different courses
     that highlight different aspects of functional programming. All
     courses consist of lectures and hands-on sessions where everyone
     can try out the language on several exercises.

     Well-Typed's Andres Löh will be there, as well as Ralf Hinze,
     and Simon Marlow.  The registration deadline is on 30 May, but
     I wouldn't procrastinate if I were you; the course sounds really
     tempting!

Jobs
----

*    [PhD program at Portland State accepting applications for Fall 2012][n3] (18 Jan)

     There's quite a bit of functional programming research at PSU, and
     one of the professors (Andrew Black) is a co-author on the Cloud
     Haskell paper!

*    [6 Funded Research Studentships at St Andrews][n4]

     The School of Computer Science at the University of St Andrews has
     six prize PhD studentships available.  These studentships are
     funded through the Scottish Informatics and Computer Science
     Alliance and provide both fees and maintenance (£13,500) per annum.
      They can be taken up by students of any nationality (including
     non-EU students).  The deadline is *today* (1 March), so if you're
     interesting, it might not be too late to get in touch with
     [Kevin Hammond](mailto:kh at cs.st-andrews.ac.uk).

*    [Computer Science Lecturer Job at Kent][n5]

     The School of Computing at the University of Kent is seeking to
     appoint a high flying academic who will spend the first 3 years in
     post developing as an independent researcher of internationally
     excellent quality in an area relevant to the School of Computing.
     This does not look Parallel Haskell specific, or even FP specific,
     but Simon Thompson did tweet that “the functional programmers at
     Kent would be particularly keen to recruit someone who would work
     with them!“

Parallel GHC Project Update
----------------------------------------------------------------------
ThreadScope 0.2.1 has been released! This version contains the features
we had demonstrated at the Haskell Implementor's Workshop in September
2011.  Since our workshop prototype, we have greatly refined the spark
histogram feature, tuning the visualisations so that they are easier to
understand.  We've also written a small tutorial to go with the new
release. The [ThreadScope Tour][ts-tour] works through concrete examples
on using ThreadScope to debug the performance of parallel programs.
We'd love feedback you have about the tutorial, especially things you
feel like you need a little more help with.

Along with the new ThreadScope and tutorial, we also have a new version
of the ghc-events package which now provides an encoding of the meanings
of events in state machines. This makes it possible to validate
eventlogs, and doubles as an always up-to-date source of code as
documentation.

We've made some progress on our work in developing a swappable
transport layer for Cloud Haskell.  We now have a prototype
implementation “distributed-process” (intended to be the sucessor to
“remote”, the current Clound Haskell implementation).  For more details,
see the [distributed-process GitHub page][dp-github], particularly the
examples and the [design document][dp-design], which incorporates
feedback on our initial proposal.

Finally a bit of partner news to wrap things up:

* Tim (LANL) has been fine-tuning and profiling their physics simulation
  tool. One change in particular has been to switch to a “counter-based”
  RNG scheme which lends itself nicely to use in parallel or distributed
  settings. They are also currently having a look at Haskell-MPI and
  Cloud Haskell.

* Finlay (Dragonfly) is working on a tricky performance problem,
  trying to speed up a program with multiple relatively small
  computations and relatively many synchronization points in between.

* Toni (Telefonica) is benchmarking the maximal clique enumeration
  algorithm on a large number of randomly generated graphs.  He is
  getting good speedups for up to 6 cores, with some stagnation around 8
  cores.  Current research is either finding a graph traversal strategy
  that works reasonably well with all graphs, or have one which somehow
  adapts to the properties of the graph.

* Kazu (IIJ) has made several new releases of his libraries, and
  bug-fixes for a lot of third party libraries.  He has also been
  participating in the discussion that lead to the development of
  Michael Snoyman's conduit library, and switched Mighttpd over to
  good effect.

* Mark (VETT) is learning how to use Cloud Haskell. He is currently
  writing some example programs to test his understanding. Hopefully
  more from Mark when he's had more time to play with this

Word of the month
----------------------------------------------------------------------
Over the next few digests, we'll be switching our focus from parallelism
to concurrency.  We tend to stress the distinction because Haskell
offers ways to write parallel programs without making explicit use of
concurrency. Parallelism done right gets us faster programs.
Concurrency on the other hand buys us… concurrency.  It's not going
away. If every multicore computer in existence were to vanish, we would
want to solve concurrent problems. Whether the simultaneity is real or
simulated, we would still want to do more than one thing at the same time
– accept user input, display progress messages, serve multiple clients.

So let's dig in! We first got a taste of concurrency in second Parallel
Haskell digest, where we introduced the notion of [threads][ph2].  As an
abstraction, threads give us a way to express the separation of concerns
between different jobs.  But this isn't enough.  Sometimes we need to
undo the separation just enough to pass information from one thread to
another.

This brings us to our word of the month: `MVar`. The humble `MVar`
(pronounced “em-var”) is one of many solutions for this communication
problem, a fairly low-level one by Haskell standards, but one that is
still useful enough that you'll see it used very frequently. An `MVar`
is like a burri… wait, wrong tutorial.  Actually, it is helpful to think
of an `MVar` as a box in the sense that it holds values and can either
be full or empty.  The `MVar` type takes a type variable, so an `MVar
Int` might hold an integer , an `MVar String` a String, an `MVar
[String]` a list of strings and so on.

       -- Control.Concurrent.MVar
       data MVar a
       newEmptyMVar :: IO (MVar a)
       takeMVar :: MVar a -> IO a
       putMVar  :: MVar a -> a -> IO ()

To give an idea how this might be used, below is a small program that
fetches some URL in one thread while doing something else in the other.
We fork off a Haskell thread that does the fetching and write to `MVar`
to indicate what we've retrieved.  In the main thread, we do our other
work and then just wait until the page has been fetched.

     main = do
        m <- newEmptyMVar

        forkIO $ do
          r <- getURL "http://en.wikipedia.org/wiki/Shovel"
          putMVar m r

        doSomethingElse

        r <- takeMVar m
        putStr r

These `MVar`'s may look a little familiar if you've used `IORef`s in
Haskell.  Here is a mini API for comparison:

       -- Data.IORef
       data IORef a
       newIORef   :: IO (IORef a)
       readIORef  :: IORef a -> IO a
       writeIORef :: IORef a -> a -> IO ()

So what exactly do `MVar`'s buy us?  Why not just use `IORef`s to share
mutable variable across threads? The reason is that coordination between
threads can get messy: we want to make sure that any value we pass from
one thread to another is accounted for (and not accidentally overwritten
before being consumed), that we don't try to consume values that are
somehow out of date with respect to other threads (that updated values
are received instead of an old value being read twice).  Suppose we
wanted to fetch a URL while doing something else at the same time.  How
do we know when we have successfully retrieved it?

     -- don't write this at home!
     inadvisableMain = do
        m <- newIORef "" -- default value? :-(

        forkIO $ do
          r <- getURL "http://en.wikipedia.org/wiki/Shovel"
          writeIORef m r -- are we overwriting something? :-(

        doSomethingElse

        r <- readIORef m -- is there something to read? :-(
        putStr r

In the example above, we have no idea if the page at URL would have been
fetched by the time we try to display its contents.  What we are looking
for is a *synchronisation* mechanism.  We need a way to
indicate that our shared values are ready to be used. For example, we
might hit upon the idea of combining `IORef` with `Maybe`. Now we have
a some extra bureaucracy to handle.  If we read a value and get
`Nothing` we would then know that there isn't yet a value ready to be
read.  One way or another we would have to account for this case, for
example busy waiting until we get a `Just`. On the flip side, we want to
make sure that when somebody has written a value, the intended recipient
has read it before we write over it.  This sort of bureaucracy would be
best packaged up into helper functions, functions that look awfully like
`takeMVar` and `putMVar` might.  Notice the change in name even. Now
we're not just reading, but *taking*, emptying the box to signal that
it's been read; and we're not just writing, but *putting*, only writing
when the box is empty.  Throw in a little help from the runtime system
so that we're doing something smarter than busy waiting and we'll have
worked our way back to the `MVar`.

So the `MVar` combines references with locking to provide for
synchronisation between threads. If you're coming from other languages,
this should sound rather familiar. C programmers may have used mutexes
(`MVar ()`) or semaphores (`MVar Int`) to protect shared data. Java
programmers may have used `synchronized` methods and statements to
prevent thread interference and memory inconsistency problems.  The
`MVar` is just a slightly nicer Haskell packaging to the same sort of
idea. This means it suffers the same problems as its locky sisters.

Sure, having the locks be implicit and putting them where they count
(the data being shared) makes life a bit simpler, but at the end of
the day locks are still locks.  Take them in the wrong order, and you
can still get deadlocks, races and all those subtle hard-to-reproduce
bugs that keep programmers up at night.

So what is the hapless foot-shooting programmer to do? The good news is
that `MVar`'s are only one of several mechanisms for dealing with
concurrency.  Safer mechanisms exist, albeit at a cost.  `MVar`'s present
a compromise between performance and safety. If you are extra extra
careful, you can on the one hand squeeze some serious performance out of
`atomicallyModifyIORef` for concurrent data structures.  If on the other
hand, if you're willing to take a potential performance penalty in
exchange for never worrying about a deadlock again, stay tuned for our
next word of the month, “[transaction][stm]“.  For more about `MVar`'s in the
meantime, have a look at Edward Z. Yang's blog for an [MVar
overview][mvar-ezyang] as well as the updated [API
documentation][mvar-api], and finally the [concurrency chapter][mvar-rwh]
from Real World Haskell.

Videos
----------------------------------------------------------------------
*    [It's Raining Haskell][v1] (30 minutes)

     Charles from MSDN's Channel 9 had a chance to catch Simon
     Peyton Jones and John Hughes for a chat at the YOW conference in
     Sydney.  As John observed during the interview, “Haskell is
     picking up more and more real users now, and that's very pleasing,
     but one of the reasons is because it offers a way of getting to
     grips with parallelism without suffering all the difficulties that
     you do with imperative programming languages, so I think that's a
     big opportunity for Haskell. I'd hope it won't just be a source of
     ideas for other languages, but that it will actually see a lot of
     real use itself.”

*    [The Future is Parallel, and the Future of Parallel is Declarative][v2] (1 hour)

     Simon wasn't at YOW just to give interviews on park benches.
     During his presentation, Simon gives us a sense of what a
     parallel future might look like.  He skips past the usual
     FP-for-multicore propaganda, jumping straight to the prediction
     that “in the end, *something* declarative will win”.  The bad
     news is that “there is no parallelism without tears”.  If we want
     parallelism, we need to develop parallel algorithms, and
     understand the cost models behind the approaches we use. But one
     cost model does not fit all problems, so “no silver bullet; embrace
     diversity”. Simon then presents some of the different paradigms on
     offer, using Haskell as a single linguistic framework, but not
     specifically pushing the language.  He covers task parallelism (threads,
     STM, Cloud Haskell), semi-implicit algorithm (`par` and friends) and data
     parallelism (Repa, DPH).

     It's worth knowing that the YOW conference is “for developers by
     developers” and Simon has given the talk for a non-Haskell audience.
     So if you really just want an idea what all the parallel fuss is
     about, give his talk a try!

*    [Real-time ray tracing demo with Repa ][v3]

     As posted by Ben Lippmeier: Real-time ray tracing demo written in Haskell
     with a pre-release of the Gloss-Field and Repa 3.0.0 parallel array
     libraries. Demo is running in real time in parallel on a four core machine
     with two hardware threads per core.

*    [Barnes-Hut Gravity Simulation][v4]

     More demos? OK! How about this gravity simulation using Gloss? A
     naive n-body simulation has quadratic runtime complexity,
     O(n^2); but with the Barnes-Hut algorithm, we can approximate one
     with only an O(n log n) runtime. These simulations lend
     themselves well to parallelism; however, at present (2012-02),
     the sequential version of this program (using Data.Vector) still
     runs faster than its parallel equivalent (using DPH). The good
     news is after working on DPH vectorisation for 6 months, the
     asymptotic complexity of the DPH version now matches that of the
     sequential one. Hooray! Well OK, so there's still pesky constant
     factor involved. The sequential version is still 10 times faster,
     but if Ben can improve DPH enough to make the parallel version
     faster by August, he's buying everybody beer. See [his blog
     post][b4] for more details, as well as [Manuel Chakravarty's
     slides][b5] on vectorisation and the shared data problem.

Blogs and packages
----------------------------------------------------------------------
*    [Making OpenCL™ Simple with Haskell][b1] (Jan 2011)

     Benedict R. Gaster published a set of slides (53 slides) presenting
     the results of his collaboration with Garrett Morris (AMD intenern
     and PSU PhD student).  “It's a heterogeneous world” out there, with
     a modern platform including some CPUs, GPUs, and more.  OpenCL is
     a framework for writing parallel programs on this mixture of
     devices.  Benedict's slides introduce OpenCL, and talk about the
     low-level Haskell FFI to OpenCL C, and the use of monads and
     Template Haskell quasiquotation so that we can write something
     higher-level and more Haskelly. See the slides also for some
     comments about Haskell at AMD.

*    [Propane: Functional synthesis of images and animations in Haskell][b2]

     Following up on his quasicrystal code, Keegan McAllister has
     released Propane, a library for functional synthesis of images and
     animations.  

     > Propane uses Repa for data-parallel array computations. That means
     > it automatically uses multiple CPU cores for rendering, provided
     > the program is compiled and run with threads enabled. That said,
     > it's not yet been optimized for speed in other ways.

*    [Black-Box Reverse Engineering ZMQ][b3]

     [ØMQ (ZeroMQ)][zmq] and the ZMTP protocol provide a high performance
     messaging layer for concurrent/distributed applications.
     Unfortunately, for Xavier Lange, “the ZMQ spec is not fully
     cooked”.  This he discovered when trying to use libzmq and find tha
     the data sent by it doesn't quite match the specification.  So
     what's a ZMQ implementor to do?  In this blog post, Xavier inserts
     a packet sniffer between his Haskell code and a pure C libzmq test app
     and teases apart the “real” protocol as implemented by the library.
     He'll be studying the (LGPL) code next to see if he got the right
     idea from his black box debugging efforts.

*    [stm-channelize 0.1.1][stm-channelize] (20 Jan)

     This package arose from Joey Adam's [frustrations with
     asynchronous IO in Haskell][m5] for a networking layer
     he was writing.  The package turn I/O operations into STM
     transactions on channels.

*    [stm-conduit-0.2.1][m14] (9 Feb)

     Clark Gaebel invites us to check out his stm-conduit package, which
     “introduces conduits to the wonderful world of concurrency“:

     > My package solves the common problem of constant bottleneck
     > switching loaders have. This is when, for example, we stream XML
     > from the disk and then parse the XML in one conduit pipeline. While
     > it streams a file from the disk, the process is clearly IO bound,
     > and while it parses the XML, the process is CPU bound. By putting
     > each task on its own thread, the disk IO doesn't need to wait for
     > the CPU to parse a document before loading the next file. By using
     > stm-based conduits, we have full resource utilization.

Mailing lists
----------------------------------------------------------------------

### Parallelism

*    *Help wanted!* [Optimizations and parallel execution in the IO for
     a small spellchecker][m17] (15 Feb)

     Paul Sujkov wrote a simple [spellchecker][suj-spell] that checks
     for words that aren't in a dictionary file.  He's used it to
     benchmark the various Haskell hash table implementations on
     hackage, and now wants to see if he can take things a step further
     with parallelisation.  He's tried using monad-par and the parallel
     packages both, but seems to have performance trouble due to the
     IO part of the problem.  Any tips for Paul?

*    [SMP parallelism increasing GC time dramatically][m1] (9 Jan)

     Mikolaj Konarski followed up on a discussion started by Tom Thorne
     a while ago. Tom was nagged (in stereo!) to ThreadScope his
     performance problems, which he would have done had ThreadScope not
     crashed. Ben Gamari thinks he may have experienced something
     similar. What about you? Has ThreadScope ever crashed on you?

*    Data Parallel Haskell on MacOS X (and GHC 7.4) (15-21 Jan, 9 Feb)

     Mukesh Tiwari has been a series of problems getting dph to work on
     MacOS X Lion. He has gotten messages like “Couldn't figure out LLVM
     version” and “Error running clang!” and more recently, the fact
     that `-fdph-par` is no longer recognised by GHC.

     Some solutions (Brandon Allbery and Carter Schonwald) seem to be to
     learn that the LLVM that comes with your Mac is not a full
     distribution, and that you would need to install one (eg.  `brew
     install llvm` for Homebrew users).

     As for GHC 7.4, the folks at UNSW have not had a chance to update
     the DPH wiki.  In the meantime, try Ben Lippmeier's suggestion to
     use `-package dph-lifted-vseg` to select the backend, and also look
     at the dph-examples cabal file to see what flags to use when
     compiling.

     - [Installing dph-examples in Mac OS X Version 10.7.2][m6]
     - [Error in installing dph-examples on Mac OS X 10.7.3][m15] (9 Feb)
     - [Compiling dph package with ghc-7.4.0.20111219][m8] (21 Jan)

*    [Pruned Sparks in Original Strategies][m7] (17 Jan)

     Burak Ekici is trying to parallelise a sampling of programs to see
     how first and second generation Strategies compare.  In his
     understanding, the garbage collection mechanism in this older
     version had an issue which prevented sparks from being pruned.  Yet
     he finds sparks being pruned anyway with the original version.  Any
     reasons why this might be the case?

*    [Speculative Parallelism][m10] (30 Jan)

     Burak was also thinking about the idea that second generation
     strategies can exploit speculative parallelism where the original
     could not. How can he convince himself of this? Does it indicate
     speculative parallelism when all and the same number of created
     sparks are converted by both strategies?

*    [Evaluating parallel computations in order of finishing (approximately)][m12] (6 Feb)

     Victor Miller wants a way to evaluate a list of computations in
     parallel, returning them in (roughly) the order of completion.
     Does something of that sort exist?  It does! Ryan Newton added
     that this is a common non-deterministic merge operation used in
     stream-processing frameworks.  You might have an IO action like
     `parCompletionOrder :: [a] -> IO [a]`, but one which returns
     immediately allowing the list to be lazily consumed.  The function
     [Control.Concurrent.Chan.getChanContents][chan] should be helpful
     in implementing such a function.

*    [Merging two sorted list in parallel ( dph )][p2] (29 Jan)

     Mukesh Tiwari has two sorted lists of triples that he wants to
     merge in parallel.  The triples have the form `(i,j,val)`, and
     are sorted by their `j` and `i` elements.  If both lists have an
     element with the same `j`/`i`, we take the sum of their `val`.
     Mukesh has implemented half of a merging algorithm involving
     binary search, but he's stuck on updating his DPH arrays.
     Is there a function that does something like the below?

        updateWithFun :: (a -> a -> a) -> [:(Int,a):] -> [:a:] -> [:a:]

*    [repa nested sequential warning][p3] (27 Jan)

     Rick got some warnings in Repa about “performing nested parallel
     computation sequentially“ with the recommendation to use the
     `force` function.  See the thread for some specific questions
     you might have about Repa and forcing arrays.  The overall answer
     from Ben Lippmeier is to use Repa 3 if possible, which should
     resolve the problems Rick was facing and also provide some
     less confusing terminology (eg. renaming `force` to `computeP`)

*    [how to write this with Control.Concurrent.Strategies ?][p4] (24 Jan)

     Johannes Waldmann would like to be able to say to his students
     “mergesort already *is* an inherently parallel algorithm” and "you
     can execute it as such - you just need to annotate the source
     text". He has an implementation written in the `par/pseq` style
     but wants to know the “right” way to express it with Strategies.
     Patrick Maier posted an attempt which gets similar speedups to
     Johannes' original implementation. For comparison, Simon Marlow
     added a version using monad-par. Same problem, three techniques!
     This posting could be useful if you're just getting started
     learning about parallelism.

     Also, Chris Brown observed that merge sort is heavily sequential
     at the merge point and does not parallelise very well.  But
     Johannes is only really looking for mild parallelism, enough to
     keep 4 to 8 cores busy. For people wanting to get a lot of
     paralellism out of merge sort, Chris Brown suggests a technique
     using balanced binary trees, and Ryan Newton a parallel
     divide-and-conquer (during merge phase), as used in the `cilkmerge`
     function in the Cilk sorting code.

### Concurrency

*    [STM: "nested atomically" error][m3] (12 Jan)

     Johan Brinch is getting a “Control.Concurrent.STM.atomically was
     nested” error and can't understand where it's coming from. There
     is some unsafe IO in there, but only in the form of debug messages.
     These innocuous-looking debugs may be the culprit, according to
     Andrew Coppin and Brandon Allberry.  Their respective guesses are
     that (1) something to be evaluated that wouldn't be otherwise
     or (2) that something is being lazily evaluated as usual, but
     perhaps being forced within a transaction.  Brandon adds in
     particular that the GHC IO system uses STM internally, which might
     explain nested transactions in the presence of unsafePerformIO.
     These sorts of shenanigans are normally prevented by the type
     system, but all bets are off once you go unsafe.

*    [STM atomic blocks in IO functions][m4] (13 Jan)

     Rob Stewart would like to know if he would still be guaranteed
     thread safety if he had some separate `atomically` transactions in
     an IO action, as opposed to one big block.  Bryan O'Sullivan
     replied that “[i]f you want successive operations to see a
     consistent state, they must occur in the same atomically block.”

     This confused Ketil Malde. Isn't the idea for state to be
     consistent on entry/exit of each block, that breaking a program
     into multiple transactions is fine so long as each transaction is
     semantically complete unit?  Steffen Schuldenzucker offered a
     clarification, that the “consistent state” Bryan was talking about
     is being sure that no other thread has modified a value during the
     span of the block.

     Rob had a follow-up question: is it possible to perform IO within
     an STM block?  In a word, no.  Daniel Waterworth explains that it
     would not make sense to allow this, because an STM transaction
     may be retried, which is typically not what you want to happen to
     your IO actions.  If it's not possible to do with out,
     [TwilightSTM][twilight] or some sort of locking solution with a
     TMVar might be options.  But usually you can factor the IO out so
     that it is not necessary to do within transactions.

*    [How to make asynchronous I/O composable and safe?][m5] (14 Jan)

     Joey Adams is “not happy with asynchronous I/O in Haskell. It's
     hard to reason about, and doesn't compose well”.  He was
     grappling with several issues simultaneously (hopefully
     fodder for the next revision of Simon Marlow's [Concurrent
     Haskell tutorial][pcp-tutorial]).

     The setup is that Joey is writing a networking layer an
     application.  Writing a simple client/server to test this layer
     turned out to be horrible on the server side (“an abomination”).
     The main problem seems to be that he is forced to use two threads
     per client (sending and receiving), because GHC does not provide a
     way to simultaneously wait for input and wait for an STM
     transaction to succeed.  Issues that arise from this are

     * synchronizing the threads (e.g. when one dies, kill the other one)
     * multiple places where an exception can arise (which can lead to
       problems like exceptions interrupting atomic operations)
     * sharing a connection handle between multiple threads
       (you may recognise this problem if you've seen garbled/interleaved
       output from your concurrent code, eg. “AnOonteh esre nsteenntceen.c
       e.“)

     Some of the discussion that came up went over using channels to
     ensure the use of whole messages only (avoiding the issue with
     exceptions) and help “separate [the] application thread from the
     complexities of connection management“ (Bardur Arantsson, Daniel
     Waterworth); the observation that a pair of input/output channels
     is essential a stream processor arrow (Peter Simons); the
     possibility of using 0MQ to handle all the nasty details and
     get a simple message-based interface; some general thoughts on
     dealing with IO (Peter Simons thinks that one central IO loop
     plus callbacks is better than spreading IO all over programs);
     and the prospect of a wait-free concurrency model based on vats
     (David Barbour); and the avoidance of asynchronous exceptions
     except as a last resort (David, Joey).

     Joey eventually followed up by writing an [stm-channelize
     package][stm-channelize] that creates an STM layer over a
     network connection.

*    [Exceeding OS limits for simultaneous socket connections][m9] (30 Jan)

     Rob Stewart has a dozen machines sending messages to each other in
     a master/slave setup. The machines are running a Haskell program
     that sends thousands of messages (lazy bytestrings) to a master
     node in short succession.  He sometimes gets an “accept: resource
     exhausted (Too many open files)” message because the master node
     exceeds its limit for simultaneous socket connections.  Is there
     a good way to deal with this problem?  Surely, it's something the
     Haskell web frameworks have had to contend with?

     Marc Weber suggests (A) to limit the number of simultaneous
     connections accepted by controlling the number of threads that can
     run simultaneously (see his mail for a sketch how), and (B) to use
     strict IO on the handle and close it himself, which Donn Cave
     points out will only reduce the size of the problem but not make it
     go away entirely.  Best to stick with option (A).

*    [Memory usage with TVar?][m11] (1 Feb)

     Johan Brinch gets a surprising amount of memory usage from a tiny
     program that just cons'es a bunch of numbers onto a `TVar
     [(Int,Int)]`.  Where is it all coming from? Jake MacArthur points
     us to a StackOverflow posting on the [memory footprint of Haskell
     data types][haskell-mem], apparently nothing STM specific so much
     as the result of having lots of cons cells, boxed Ints, and tuples.

*    [Concurrency strategy for 2 threads and rare events][m13] (8 Feb)

     JP Moresmau is looking for a good strategy to deal with a setup
     where he has one worker thread, and another user input thread
     ocassionally telling things to the worker thread.  JP supposes he
     could just try a `tryTakeMVar`, but the use of `MVar`'s seems a
     bit heavyweight to him, as the second thread is just blocking for
     user input most of the time. What's a good setup?  Erik Hesselink
     suggests maybe a `throwTo`.  Yves Parès wondered what JP means by
     overkill; `MVar`s are both “teeny tiny” and convenient (they are!),
     perfect for the job.  Edward Amsden suggested an `IORef` modified by
     `atomicModifyIORef`, cheaper than an `MVar` and usable when you don't have
     to lock multiple threads.

*    [How to increase performance using concurrency for sequential producer-consumer problem][m16] (13 Feb)

     Roel van Dijk sketches out a program for us: producer produces
     values and feeds them to a callback function (blocking until there
     are values to produce), converter does CPU-only work (and is the
     bottleneck of the program), consumer consumes values (important: in
     the order they were produced). A non-concurrent driver for these
     may look like `producer (consumer . converter)`.  OK now what if
     Roel wants to make use of concurrency, ideally spawing off a worker
     thread for each core in his system?  The important thing here is
     that the consumer continues to get values in the order they come
     out of the producer.

     Some suggestions: conduits and the newly released stm-conduit
     package (Clark Gamari), monad-coroutine (Mario Blažević),
     bounded STM channels (John Lato).  John [points out][m16b] that
     while something like iteratee/conduit-stm would give you
     concurrency between the producer/converter/consumer, but would
     not directly help with the concurrency between converters.

*    [New distributed-process (Cloud Haskell) backend/transport design and prototype][p1]

     Duncan Coutts followed up on his design proposal for the Cloud
     Haskell transport design.  As mentioned in the Parallel GHC
     news above, we've been working on doing some prototyping, and
     now have code for people to look at and play with.  Check out
     the [GitHub page][dp-github] and [design document][dp-design]
     for distributed-process.

*    [Round-robin scheduling and forkIO][p5] (2 Feb)

     Andreas Voellmy wanted to check what happens when you `forkIO` a
     new thread, if it it executed in the same HEC as the original or in
     some new thread. He points to [Runtime support for multicore
     Haskell][rts-multicore], which says that threads within the run
     queue of a single HEC are scheduled in round-robin order; but does
     not talk about the threads as a whole.  This can lead, Andreas
     suggests, to unfair scheduling of threads if you have the situation
     where an HEC has a single always-runnable thread, and another HEC
     has lots of threads (see his follow-up for example code
     illustrating the problem). Simon Marlow both that threads are
     executed in the current HEC and that the described situation can
     happen (“It wouldn't be too hard to do something more sensible
     here”).  Simon suggests looking in rts/Schedule.c, particularly the
     function `schedulePushWork ()`.

*    [Synchronizations in memory allocation?][p6] (11 Feb)

     Andreas has a program with several Haskell threads each
     reading a different TCP sockets.  He's investigating a problem
     where the performance of his program starts to degrade after
     20 cores.  Andreas suspects a scaling problem in allocating memory
     and has written a micro-benchmark to explore the issue.  The
     program forks some worker threads that just allocated some memory
     without trying to synchronise or share any data.  He is surprised
     to find that the benchmark takes longer using
     `mallocPlainForeignPtrBytes` the more threads he forks.  The
     benchmark makes for a good stress test — it allocates over 10 Gb/s
     — and as result of Andreas' investigations, Simon Marlow has
     modified the pinned memory allocator to “steal complete blocks
     from the nursery rather than allocating them from the global block
     allocator.”  The changes will show up in GHC 7.4.2.

     20 cores, eh? [There's no pleasing some people.][no-pleasing]

StackOverflow and Reddit
----------------------------------------------------------------------
* [Installing dph-examples in Mac OS X 10.7.2][s1]
* [Programming language for functional parallelism: F# vs Haskell][s2]
* [Parallel IO Causes Random Text Output in Terminal][s3]
* [Repa Without Parallelization][s4]
* [what exactly is dynamic parallelism?][s5]
* [Writing “fib” to run in parallel: -N2 is slower?][s6]
* [How to use Parallel Strategies in Haskell][s7]
* [Parallel graphics processing in Haskell][s8]
* [Why is concurrent haskell non deterministic while parallel haskell primitives (par and pseq) deterministic?][s9]
* [Runtime performance degradation for C FFI Callback when pthreads are enabled][s10]
* [Memory footprint in threaded mode with forkIO when there is lot of contention for single MVar][s11]

Help and Feedback
----------------------------------------------------------------------
If you'd like to make an announcement in the next Haskell Parallel
Digest, then get in touch with me, Eric Kow, at
<parallel at well-typed.com>. Please feel free to leave any comments and
feedback!

[chan]: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent-Chan.html
[dp-design]: https://github.com/haskell-distributed/distributed-process/wiki/New-backend-and-transport-design
[dp-github]:  https://github.com/haskell-distributed/distributed-process
[dph-ghci]: http://warmfuzzything.posterous.com/loading-dph-codes-in-ghci
[haskell-mem]: http://stackoverflow.com/questions/3254758/memory-footprint-of-haskell-data-types
[mvar-api]:  http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.5.0.0/Control-Concurrent-MVar.html 
[mvar-ezyang]: http://blog.ezyang.com/2011/02/all-about-mvars/
[mvar-rwh]:  http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html
[no-pleasing]: https://plus.google.com/107890464054636586545/posts/VFaU1CNj1z4
[pcp-tutorial]: http://community.haskell.org/~simonmar/par-tutorial.pdf 
[ph2]: http://www.well-typed.com/blog/53
[pp-frp]: http://conal.net/papers/push-pull-frp/
[rts-multicore]: http://community.haskell.org/~simonmar/papers/multicore-ghc.pdf
[stm-channelize]: http://hackage.haskell.org/package/stm-channelize
[stm]: http://www.haskell.org/haskellwiki/STM
[suj-spell]: http://hpaste.org/63732
[ts-tour]: http://haskell.org/haskellwiki/ThreadScope_Tour
[ts-tour]: http://www.haskell.org/haskellwiki/ThreadScope_Tour
[twilight]: http://proglang.informatik.uni-freiburg.de/projects/twilight/
[zmq]: http://www.zeromq.org/

[n1]: http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/release-7-4-1.html
[n2]: http://www.haskell.org/pipermail/haskell/2012-February/023181.html
[n3]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098604.html
[n4]: http://blogs.cs.st-andrews.ac.uk/csblog/2011/11/17/funded-research-studentships/
[n5]: http://www11.i-grasp.com/fe/tpl_kent01.asp?newms=jj&id=36123&newlang=1

[v1]: http://channel9.msdn.com/Blogs/Charles/YOW-2011-Simon-Peyton-Jones-and-John-Hughes-Its-Raining-Haskell
[v2]: http://yow.eventer.com/events/1004/talks/1055
[v3]: https://www.youtube.com/watch?v=jBd9c1gAqWs
[v4]: http://youtu.be/tDgOBM29ny4

[b1]: http://developer.amd.com/zones/OpenCLZone/publications/assets/MakingOpenCLSimplewithHaskell.pdf
[b2]: http://mainisusuallyafunction.blogspot.com/2011/12/propane-functional-synthesis-of-images.html
[b3]: http://xrl.tureus.com/black-box-reverse-engineering-zmq
[b4]: http://disciple-devel.blogspot.com/2012/02/vectorisation-without-replication-in.html
[b5]: http://tumblr.justtesting.org/post/16916036670/the-n-body-problem-and-vectorisation-of-nested-data
[m14]: http://www.haskell.org/pipermail/haskell-cafe/2012-February/099298.html

[m1]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098390.html
[m2]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098428.html
[m3]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098453.html
[m4]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098482.html
[m5]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098495.html
[m6]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098524.html
[m7]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098570.html
[m8]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098705.html
[m9]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098969.html
[m10]: http://www.haskell.org/pipermail/haskell-cafe/2012-January/098986.html
[m11]: http://www.haskell.org/pipermail/haskell-cafe/2012-February/099035.html
[m12]: http://www.haskell.org/pipermail/haskell-cafe/2012-February/099185.html
[m13]: http://www.haskell.org/pipermail/haskell-cafe/2012-February/099264.html
[m15]: http://www.haskell.org/pipermail/haskell-cafe/2012-February/099302.html
[m16]: http://www.haskell.org/pipermail/haskell-cafe/2012-February/099378.html
[m16b]: http://www.haskell.org/pipermail/haskell-cafe/2012-February/099440.html
[m17]: http://www.haskell.org/pipermail/haskell-cafe/2012-February/099472.html

[p1]: https://groups.google.com/d/msg/parallel-haskell/TJ4PeVoaato/th5pP4sUyjEJ
[p2]: https://groups.google.com/d/msg/parallel-haskell/WkmOAgUAJIE/Wsr7fu7Sm7IJ
[p3]: https://groups.google.com/d/msg/parallel-haskell/Na5-B-9XgZE/bdDVryd7R1YJ
[p4]: https://groups.google.com/d/msg/parallel-haskell/0Ql1wBNMbI8/5h_sQCc1AA4J
[p5]: https://groups.google.com/d/msg/parallel-haskell/rXiu-4E6_AA/BBPOvnej3ywJ
[p6]: https://groups.google.com/d/msg/parallel-haskell/fsHrxz3ei70/NUpu21JQolgJ

[s1]: http://stackoverflow.com/questions/8864696/installing-dph-examples-in-mac-os-x-10-7-2
[s2]: http://stackoverflow.com/questions/5492930/programming-language-for-functional-parallelism-f-vs-haskell
[s3]: http://stackoverflow.com/questions/8647692/parallel-io-causes-random-text-output-in-terminal
[s4]: http://stackoverflow.com/questions/8652363/repa-without-parallelization
[s5]: http://stackoverflow.com/questions/9215284/what-exactly-is-dynamic-parallelism
[s6]: http://stackoverflow.com/questions/9024672/writing-fib-to-run-in-parallel-n2-is-slower
[s7]: http://stackoverflow.com/questions/8852220/how-to-use-parallel-strategies-in-haskell
[s8]: http://stackoverflow.com/questions/8630659/parallel-graphics-processing-in-haskell
[s9]: http://stackoverflow.com/questions/8582580/why-is-concurrent-haskell-non-deterministic-while-parallel-haskell-primitives-p
[s10]: http://stackoverflow.com/questions/8902568/runtime-performance-degradation-for-c-ffi-callback-when-pthreads-are-enabled
[s11]: http://stackoverflow.com/questions/8870648/memory-footprint-in-threaded-mode-with-forkio-when-there-is-lot-of-contention-fo
-- 
Eric Kow <http://erickow.com>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 203 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120302/a1db9810/attachment.pgp>