[Haskell-cafe] Parallel Haskell Digest 8
eric at well-typed.com
Fri Mar 2 14:56:03 CET 2012
Parallel Haskell Digest 8
It's time for our next catch-up with the Parallel Haskell community.
Did you have a chance to see Simon Peyton Jones's talk [The Future is
Parallel, and the Future of Parallel is Declarative][v2]? It's a good
survey of some of the directions that parallel Haskell has been taking,
and if you're somewhat new to this stuff, a great feel for the breadth
of the community. You'll get a better idea why you see people in this
digest muttering about arrays, others trying to make channels and
transactions work together, and yet others talking up the 0MQ protocol.
So check it out!
We at Well-Typed are writing the digest as part of our mission
to push Haskell parallelism and concurrency out into the real
world. We are excited about these technologies and we want to do
whatever it takes to make it more accessible to everybody. More
news below on how we're doing in this Parallel GHC project.
HTML version of this digest at http://www.well-typed.com/blog/64
* [GHC 7.4.1 released][n1]
Of particular interest to parallel Haskellers: better profiling
flags, multicore profiling, vastly improved DPH, event logging,
more convenient RTS flags (no more `-rtsopts` needed for common
options like `-N`). See that new event logging stuff? Could be
a good chance to play with the new spark profiling features from
* [Summer School on Functional Programming for Parallel and Concurrent Applications][n2] (2 weeks)
Imagine this: two summer weeks in the Southern France on the Côte
d'Azur, learning about Parallel Haskell. This would be the
CEA-EDF-INRIA Summer School on Functional Programming for
Parallel and Concurrent Applications. It takes place on 11-22
June 2012, Castle of Cadarache, Saint Paul Lez Durance, France.
No functional programming experience needed. The aim of the summer
school is to give a thorough and application-oriented introduction
to functional programming using the programming language Haskell. A
special focus is on parallel and concurrent programming,
highlighting the ways in which features such as strong typing and
purity make it dramatically easier to write reliable parallel or
concurrent code. The school is split into three different courses
that highlight different aspects of functional programming. All
courses consist of lectures and hands-on sessions where everyone
can try out the language on several exercises.
Well-Typed's Andres Löh will be there, as well as Ralf Hinze,
and Simon Marlow. The registration deadline is on 30 May, but
I wouldn't procrastinate if I were you; the course sounds really
* [PhD program at Portland State accepting applications for Fall 2012][n3] (18 Jan)
There's quite a bit of functional programming research at PSU, and
one of the professors (Andrew Black) is a co-author on the Cloud
* [6 Funded Research Studentships at St Andrews][n4]
The School of Computer Science at the University of St Andrews has
six prize PhD studentships available. These studentships are
funded through the Scottish Informatics and Computer Science
Alliance and provide both fees and maintenance (£13,500) per annum.
They can be taken up by students of any nationality (including
non-EU students). The deadline is *today* (1 March), so if you're
interesting, it might not be too late to get in touch with
[Kevin Hammond](mailto:kh at cs.st-andrews.ac.uk).
* [Computer Science Lecturer Job at Kent][n5]
The School of Computing at the University of Kent is seeking to
appoint a high flying academic who will spend the first 3 years in
post developing as an independent researcher of internationally
excellent quality in an area relevant to the School of Computing.
This does not look Parallel Haskell specific, or even FP specific,
but Simon Thompson did tweet that “the functional programmers at
Kent would be particularly keen to recruit someone who would work
Parallel GHC Project Update
ThreadScope 0.2.1 has been released! This version contains the features
we had demonstrated at the Haskell Implementor's Workshop in September
2011. Since our workshop prototype, we have greatly refined the spark
histogram feature, tuning the visualisations so that they are easier to
understand. We've also written a small tutorial to go with the new
release. The [ThreadScope Tour][ts-tour] works through concrete examples
on using ThreadScope to debug the performance of parallel programs.
We'd love feedback you have about the tutorial, especially things you
feel like you need a little more help with.
Along with the new ThreadScope and tutorial, we also have a new version
of the ghc-events package which now provides an encoding of the meanings
of events in state machines. This makes it possible to validate
eventlogs, and doubles as an always up-to-date source of code as
We've made some progress on our work in developing a swappable
transport layer for Cloud Haskell. We now have a prototype
implementation “distributed-process” (intended to be the sucessor to
“remote”, the current Clound Haskell implementation). For more details,
see the [distributed-process GitHub page][dp-github], particularly the
examples and the [design document][dp-design], which incorporates
feedback on our initial proposal.
Finally a bit of partner news to wrap things up:
* Tim (LANL) has been fine-tuning and profiling their physics simulation
tool. One change in particular has been to switch to a “counter-based”
RNG scheme which lends itself nicely to use in parallel or distributed
settings. They are also currently having a look at Haskell-MPI and
* Finlay (Dragonfly) is working on a tricky performance problem,
trying to speed up a program with multiple relatively small
computations and relatively many synchronization points in between.
* Toni (Telefonica) is benchmarking the maximal clique enumeration
algorithm on a large number of randomly generated graphs. He is
getting good speedups for up to 6 cores, with some stagnation around 8
cores. Current research is either finding a graph traversal strategy
that works reasonably well with all graphs, or have one which somehow
adapts to the properties of the graph.
* Kazu (IIJ) has made several new releases of his libraries, and
bug-fixes for a lot of third party libraries. He has also been
participating in the discussion that lead to the development of
Michael Snoyman's conduit library, and switched Mighttpd over to
* Mark (VETT) is learning how to use Cloud Haskell. He is currently
writing some example programs to test his understanding. Hopefully
more from Mark when he's had more time to play with this
Word of the month
Over the next few digests, we'll be switching our focus from parallelism
to concurrency. We tend to stress the distinction because Haskell
offers ways to write parallel programs without making explicit use of
concurrency. Parallelism done right gets us faster programs.
Concurrency on the other hand buys us… concurrency. It's not going
away. If every multicore computer in existence were to vanish, we would
want to solve concurrent problems. Whether the simultaneity is real or
simulated, we would still want to do more than one thing at the same time
– accept user input, display progress messages, serve multiple clients.
So let's dig in! We first got a taste of concurrency in second Parallel
Haskell digest, where we introduced the notion of [threads][ph2]. As an
abstraction, threads give us a way to express the separation of concerns
between different jobs. But this isn't enough. Sometimes we need to
undo the separation just enough to pass information from one thread to
This brings us to our word of the month: `MVar`. The humble `MVar`
(pronounced “em-var”) is one of many solutions for this communication
problem, a fairly low-level one by Haskell standards, but one that is
still useful enough that you'll see it used very frequently. An `MVar`
is like a burri… wait, wrong tutorial. Actually, it is helpful to think
of an `MVar` as a box in the sense that it holds values and can either
be full or empty. The `MVar` type takes a type variable, so an `MVar
Int` might hold an integer , an `MVar String` a String, an `MVar
[String]` a list of strings and so on.
data MVar a
newEmptyMVar :: IO (MVar a)
takeMVar :: MVar a -> IO a
putMVar :: MVar a -> a -> IO ()
To give an idea how this might be used, below is a small program that
fetches some URL in one thread while doing something else in the other.
We fork off a Haskell thread that does the fetching and write to `MVar`
to indicate what we've retrieved. In the main thread, we do our other
work and then just wait until the page has been fetched.
main = do
m <- newEmptyMVar
forkIO $ do
r <- getURL "http://en.wikipedia.org/wiki/Shovel"
putMVar m r
r <- takeMVar m
These `MVar`'s may look a little familiar if you've used `IORef`s in
Haskell. Here is a mini API for comparison:
data IORef a
newIORef :: IO (IORef a)
readIORef :: IORef a -> IO a
writeIORef :: IORef a -> a -> IO ()
So what exactly do `MVar`'s buy us? Why not just use `IORef`s to share
mutable variable across threads? The reason is that coordination between
threads can get messy: we want to make sure that any value we pass from
one thread to another is accounted for (and not accidentally overwritten
before being consumed), that we don't try to consume values that are
somehow out of date with respect to other threads (that updated values
are received instead of an old value being read twice). Suppose we
wanted to fetch a URL while doing something else at the same time. How
do we know when we have successfully retrieved it?
-- don't write this at home!
inadvisableMain = do
m <- newIORef "" -- default value? :-(
forkIO $ do
r <- getURL "http://en.wikipedia.org/wiki/Shovel"
writeIORef m r -- are we overwriting something? :-(
r <- readIORef m -- is there something to read? :-(
In the example above, we have no idea if the page at URL would have been
fetched by the time we try to display its contents. What we are looking
for is a *synchronisation* mechanism. We need a way to
indicate that our shared values are ready to be used. For example, we
might hit upon the idea of combining `IORef` with `Maybe`. Now we have
a some extra bureaucracy to handle. If we read a value and get
`Nothing` we would then know that there isn't yet a value ready to be
read. One way or another we would have to account for this case, for
example busy waiting until we get a `Just`. On the flip side, we want to
make sure that when somebody has written a value, the intended recipient
has read it before we write over it. This sort of bureaucracy would be
best packaged up into helper functions, functions that look awfully like
`takeMVar` and `putMVar` might. Notice the change in name even. Now
we're not just reading, but *taking*, emptying the box to signal that
it's been read; and we're not just writing, but *putting*, only writing
when the box is empty. Throw in a little help from the runtime system
so that we're doing something smarter than busy waiting and we'll have
worked our way back to the `MVar`.
So the `MVar` combines references with locking to provide for
synchronisation between threads. If you're coming from other languages,
this should sound rather familiar. C programmers may have used mutexes
(`MVar ()`) or semaphores (`MVar Int`) to protect shared data. Java
programmers may have used `synchronized` methods and statements to
prevent thread interference and memory inconsistency problems. The
`MVar` is just a slightly nicer Haskell packaging to the same sort of
idea. This means it suffers the same problems as its locky sisters.
Sure, having the locks be implicit and putting them where they count
(the data being shared) makes life a bit simpler, but at the end of
the day locks are still locks. Take them in the wrong order, and you
can still get deadlocks, races and all those subtle hard-to-reproduce
bugs that keep programmers up at night.
So what is the hapless foot-shooting programmer to do? The good news is
that `MVar`'s are only one of several mechanisms for dealing with
concurrency. Safer mechanisms exist, albeit at a cost. `MVar`'s present
a compromise between performance and safety. If you are extra extra
careful, you can on the one hand squeeze some serious performance out of
`atomicallyModifyIORef` for concurrent data structures. If on the other
hand, if you're willing to take a potential performance penalty in
exchange for never worrying about a deadlock again, stay tuned for our
next word of the month, “[transaction][stm]“. For more about `MVar`'s in the
meantime, have a look at Edward Z. Yang's blog for an [MVar
overview][mvar-ezyang] as well as the updated [API
documentation][mvar-api], and finally the [concurrency chapter][mvar-rwh]
from Real World Haskell.
* [It's Raining Haskell][v1] (30 minutes)
Charles from MSDN's Channel 9 had a chance to catch Simon
Peyton Jones and John Hughes for a chat at the YOW conference in
Sydney. As John observed during the interview, “Haskell is
picking up more and more real users now, and that's very pleasing,
but one of the reasons is because it offers a way of getting to
grips with parallelism without suffering all the difficulties that
you do with imperative programming languages, so I think that's a
big opportunity for Haskell. I'd hope it won't just be a source of
ideas for other languages, but that it will actually see a lot of
real use itself.”
* [The Future is Parallel, and the Future of Parallel is Declarative][v2] (1 hour)
Simon wasn't at YOW just to give interviews on park benches.
During his presentation, Simon gives us a sense of what a
parallel future might look like. He skips past the usual
FP-for-multicore propaganda, jumping straight to the prediction
that “in the end, *something* declarative will win”. The bad
news is that “there is no parallelism without tears”. If we want
parallelism, we need to develop parallel algorithms, and
understand the cost models behind the approaches we use. But one
cost model does not fit all problems, so “no silver bullet; embrace
diversity”. Simon then presents some of the different paradigms on
offer, using Haskell as a single linguistic framework, but not
specifically pushing the language. He covers task parallelism (threads,
STM, Cloud Haskell), semi-implicit algorithm (`par` and friends) and data
parallelism (Repa, DPH).
It's worth knowing that the YOW conference is “for developers by
developers” and Simon has given the talk for a non-Haskell audience.
So if you really just want an idea what all the parallel fuss is
about, give his talk a try!
* [Real-time ray tracing demo with Repa ][v3]
As posted by Ben Lippmeier: Real-time ray tracing demo written in Haskell
with a pre-release of the Gloss-Field and Repa 3.0.0 parallel array
libraries. Demo is running in real time in parallel on a four core machine
with two hardware threads per core.
* [Barnes-Hut Gravity Simulation][v4]
More demos? OK! How about this gravity simulation using Gloss? A
naive n-body simulation has quadratic runtime complexity,
O(n^2); but with the Barnes-Hut algorithm, we can approximate one
with only an O(n log n) runtime. These simulations lend
themselves well to parallelism; however, at present (2012-02),
the sequential version of this program (using Data.Vector) still
runs faster than its parallel equivalent (using DPH). The good
news is after working on DPH vectorisation for 6 months, the
asymptotic complexity of the DPH version now matches that of the
sequential one. Hooray! Well OK, so there's still pesky constant
factor involved. The sequential version is still 10 times faster,
but if Ben can improve DPH enough to make the parallel version
faster by August, he's buying everybody beer. See [his blog
post][b4] for more details, as well as [Manuel Chakravarty's
slides][b5] on vectorisation and the shared data problem.
Blogs and packages
* [Making OpenCL™ Simple with Haskell][b1] (Jan 2011)
Benedict R. Gaster published a set of slides (53 slides) presenting
the results of his collaboration with Garrett Morris (AMD intenern
and PSU PhD student). “It's a heterogeneous world” out there, with
a modern platform including some CPUs, GPUs, and more. OpenCL is
a framework for writing parallel programs on this mixture of
devices. Benedict's slides introduce OpenCL, and talk about the
low-level Haskell FFI to OpenCL C, and the use of monads and
Template Haskell quasiquotation so that we can write something
higher-level and more Haskelly. See the slides also for some
comments about Haskell at AMD.
* [Propane: Functional synthesis of images and animations in Haskell][b2]
Following up on his quasicrystal code, Keegan McAllister has
released Propane, a library for functional synthesis of images and
> Propane uses Repa for data-parallel array computations. That means
> it automatically uses multiple CPU cores for rendering, provided
> the program is compiled and run with threads enabled. That said,
> it's not yet been optimized for speed in other ways.
* [Black-Box Reverse Engineering ZMQ][b3]
[ØMQ (ZeroMQ)][zmq] and the ZMTP protocol provide a high performance
messaging layer for concurrent/distributed applications.
Unfortunately, for Xavier Lange, “the ZMQ spec is not fully
cooked”. This he discovered when trying to use libzmq and find tha
the data sent by it doesn't quite match the specification. So
what's a ZMQ implementor to do? In this blog post, Xavier inserts
a packet sniffer between his Haskell code and a pure C libzmq test app
and teases apart the “real” protocol as implemented by the library.
He'll be studying the (LGPL) code next to see if he got the right
idea from his black box debugging efforts.
* [stm-channelize 0.1.1][stm-channelize] (20 Jan)
This package arose from Joey Adam's [frustrations with
asynchronous IO in Haskell][m5] for a networking layer
he was writing. The package turn I/O operations into STM
transactions on channels.
* [stm-conduit-0.2.1][m14] (9 Feb)
Clark Gaebel invites us to check out his stm-conduit package, which
“introduces conduits to the wonderful world of concurrency“:
> My package solves the common problem of constant bottleneck
> switching loaders have. This is when, for example, we stream XML
> from the disk and then parse the XML in one conduit pipeline. While
> it streams a file from the disk, the process is clearly IO bound,
> and while it parses the XML, the process is CPU bound. By putting
> each task on its own thread, the disk IO doesn't need to wait for
> the CPU to parse a document before loading the next file. By using
> stm-based conduits, we have full resource utilization.
* *Help wanted!* [Optimizations and parallel execution in the IO for
a small spellchecker][m17] (15 Feb)
Paul Sujkov wrote a simple [spellchecker][suj-spell] that checks
for words that aren't in a dictionary file. He's used it to
benchmark the various Haskell hash table implementations on
hackage, and now wants to see if he can take things a step further
with parallelisation. He's tried using monad-par and the parallel
packages both, but seems to have performance trouble due to the
IO part of the problem. Any tips for Paul?
* [SMP parallelism increasing GC time dramatically][m1] (9 Jan)
Mikolaj Konarski followed up on a discussion started by Tom Thorne
a while ago. Tom was nagged (in stereo!) to ThreadScope his
performance problems, which he would have done had ThreadScope not
crashed. Ben Gamari thinks he may have experienced something
similar. What about you? Has ThreadScope ever crashed on you?
* Data Parallel Haskell on MacOS X (and GHC 7.4) (15-21 Jan, 9 Feb)
Mukesh Tiwari has been a series of problems getting dph to work on
MacOS X Lion. He has gotten messages like “Couldn't figure out LLVM
version” and “Error running clang!” and more recently, the fact
that `-fdph-par` is no longer recognised by GHC.
Some solutions (Brandon Allbery and Carter Schonwald) seem to be to
learn that the LLVM that comes with your Mac is not a full
distribution, and that you would need to install one (eg. `brew
install llvm` for Homebrew users).
As for GHC 7.4, the folks at UNSW have not had a chance to update
the DPH wiki. In the meantime, try Ben Lippmeier's suggestion to
use `-package dph-lifted-vseg` to select the backend, and also look
at the dph-examples cabal file to see what flags to use when
- [Installing dph-examples in Mac OS X Version 10.7.2][m6]
- [Error in installing dph-examples on Mac OS X 10.7.3][m15] (9 Feb)
- [Compiling dph package with ghc-18.104.22.16811219][m8] (21 Jan)
* [Pruned Sparks in Original Strategies][m7] (17 Jan)
Burak Ekici is trying to parallelise a sampling of programs to see
how first and second generation Strategies compare. In his
understanding, the garbage collection mechanism in this older
version had an issue which prevented sparks from being pruned. Yet
he finds sparks being pruned anyway with the original version. Any
reasons why this might be the case?
* [Speculative Parallelism][m10] (30 Jan)
Burak was also thinking about the idea that second generation
strategies can exploit speculative parallelism where the original
could not. How can he convince himself of this? Does it indicate
speculative parallelism when all and the same number of created
sparks are converted by both strategies?
* [Evaluating parallel computations in order of finishing (approximately)][m12] (6 Feb)
Victor Miller wants a way to evaluate a list of computations in
parallel, returning them in (roughly) the order of completion.
Does something of that sort exist? It does! Ryan Newton added
that this is a common non-deterministic merge operation used in
stream-processing frameworks. You might have an IO action like
`parCompletionOrder :: [a] -> IO [a]`, but one which returns
immediately allowing the list to be lazily consumed. The function
[Control.Concurrent.Chan.getChanContents][chan] should be helpful
in implementing such a function.
* [Merging two sorted list in parallel ( dph )][p2] (29 Jan)
Mukesh Tiwari has two sorted lists of triples that he wants to
merge in parallel. The triples have the form `(i,j,val)`, and
are sorted by their `j` and `i` elements. If both lists have an
element with the same `j`/`i`, we take the sum of their `val`.
Mukesh has implemented half of a merging algorithm involving
binary search, but he's stuck on updating his DPH arrays.
Is there a function that does something like the below?
updateWithFun :: (a -> a -> a) -> [:(Int,a):] -> [:a:] -> [:a:]
* [repa nested sequential warning][p3] (27 Jan)
Rick got some warnings in Repa about “performing nested parallel
computation sequentially“ with the recommendation to use the
`force` function. See the thread for some specific questions
you might have about Repa and forcing arrays. The overall answer
from Ben Lippmeier is to use Repa 3 if possible, which should
resolve the problems Rick was facing and also provide some
less confusing terminology (eg. renaming `force` to `computeP`)
* [how to write this with Control.Concurrent.Strategies ?][p4] (24 Jan)
Johannes Waldmann would like to be able to say to his students
“mergesort already *is* an inherently parallel algorithm” and "you
can execute it as such - you just need to annotate the source
text". He has an implementation written in the `par/pseq` style
but wants to know the “right” way to express it with Strategies.
Patrick Maier posted an attempt which gets similar speedups to
Johannes' original implementation. For comparison, Simon Marlow
added a version using monad-par. Same problem, three techniques!
This posting could be useful if you're just getting started
learning about parallelism.
Also, Chris Brown observed that merge sort is heavily sequential
at the merge point and does not parallelise very well. But
Johannes is only really looking for mild parallelism, enough to
keep 4 to 8 cores busy. For people wanting to get a lot of
paralellism out of merge sort, Chris Brown suggests a technique
using balanced binary trees, and Ryan Newton a parallel
divide-and-conquer (during merge phase), as used in the `cilkmerge`
function in the Cilk sorting code.
* [STM: "nested atomically" error][m3] (12 Jan)
Johan Brinch is getting a “Control.Concurrent.STM.atomically was
nested” error and can't understand where it's coming from. There
is some unsafe IO in there, but only in the form of debug messages.
These innocuous-looking debugs may be the culprit, according to
Andrew Coppin and Brandon Allberry. Their respective guesses are
that (1) something to be evaluated that wouldn't be otherwise
or (2) that something is being lazily evaluated as usual, but
perhaps being forced within a transaction. Brandon adds in
particular that the GHC IO system uses STM internally, which might
explain nested transactions in the presence of unsafePerformIO.
These sorts of shenanigans are normally prevented by the type
system, but all bets are off once you go unsafe.
* [STM atomic blocks in IO functions][m4] (13 Jan)
Rob Stewart would like to know if he would still be guaranteed
thread safety if he had some separate `atomically` transactions in
an IO action, as opposed to one big block. Bryan O'Sullivan
replied that “[i]f you want successive operations to see a
consistent state, they must occur in the same atomically block.”
This confused Ketil Malde. Isn't the idea for state to be
consistent on entry/exit of each block, that breaking a program
into multiple transactions is fine so long as each transaction is
semantically complete unit? Steffen Schuldenzucker offered a
clarification, that the “consistent state” Bryan was talking about
is being sure that no other thread has modified a value during the
span of the block.
Rob had a follow-up question: is it possible to perform IO within
an STM block? In a word, no. Daniel Waterworth explains that it
would not make sense to allow this, because an STM transaction
may be retried, which is typically not what you want to happen to
your IO actions. If it's not possible to do with out,
[TwilightSTM][twilight] or some sort of locking solution with a
TMVar might be options. But usually you can factor the IO out so
that it is not necessary to do within transactions.
* [How to make asynchronous I/O composable and safe?][m5] (14 Jan)
Joey Adams is “not happy with asynchronous I/O in Haskell. It's
hard to reason about, and doesn't compose well”. He was
grappling with several issues simultaneously (hopefully
fodder for the next revision of Simon Marlow's [Concurrent
The setup is that Joey is writing a networking layer an
application. Writing a simple client/server to test this layer
turned out to be horrible on the server side (“an abomination”).
The main problem seems to be that he is forced to use two threads
per client (sending and receiving), because GHC does not provide a
way to simultaneously wait for input and wait for an STM
transaction to succeed. Issues that arise from this are
* synchronizing the threads (e.g. when one dies, kill the other one)
* multiple places where an exception can arise (which can lead to
problems like exceptions interrupting atomic operations)
* sharing a connection handle between multiple threads
(you may recognise this problem if you've seen garbled/interleaved
output from your concurrent code, eg. “AnOonteh esre nsteenntceen.c
Some of the discussion that came up went over using channels to
ensure the use of whole messages only (avoiding the issue with
exceptions) and help “separate [the] application thread from the
complexities of connection management“ (Bardur Arantsson, Daniel
Waterworth); the observation that a pair of input/output channels
is essential a stream processor arrow (Peter Simons); the
possibility of using 0MQ to handle all the nasty details and
get a simple message-based interface; some general thoughts on
dealing with IO (Peter Simons thinks that one central IO loop
plus callbacks is better than spreading IO all over programs);
and the prospect of a wait-free concurrency model based on vats
(David Barbour); and the avoidance of asynchronous exceptions
except as a last resort (David, Joey).
Joey eventually followed up by writing an [stm-channelize
package][stm-channelize] that creates an STM layer over a
* [Exceeding OS limits for simultaneous socket connections][m9] (30 Jan)
Rob Stewart has a dozen machines sending messages to each other in
a master/slave setup. The machines are running a Haskell program
that sends thousands of messages (lazy bytestrings) to a master
node in short succession. He sometimes gets an “accept: resource
exhausted (Too many open files)” message because the master node
exceeds its limit for simultaneous socket connections. Is there
a good way to deal with this problem? Surely, it's something the
Haskell web frameworks have had to contend with?
Marc Weber suggests (A) to limit the number of simultaneous
connections accepted by controlling the number of threads that can
run simultaneously (see his mail for a sketch how), and (B) to use
strict IO on the handle and close it himself, which Donn Cave
points out will only reduce the size of the problem but not make it
go away entirely. Best to stick with option (A).
* [Memory usage with TVar?][m11] (1 Feb)
Johan Brinch gets a surprising amount of memory usage from a tiny
program that just cons'es a bunch of numbers onto a `TVar
[(Int,Int)]`. Where is it all coming from? Jake MacArthur points
us to a StackOverflow posting on the [memory footprint of Haskell
data types][haskell-mem], apparently nothing STM specific so much
as the result of having lots of cons cells, boxed Ints, and tuples.
* [Concurrency strategy for 2 threads and rare events][m13] (8 Feb)
JP Moresmau is looking for a good strategy to deal with a setup
where he has one worker thread, and another user input thread
ocassionally telling things to the worker thread. JP supposes he
could just try a `tryTakeMVar`, but the use of `MVar`'s seems a
bit heavyweight to him, as the second thread is just blocking for
user input most of the time. What's a good setup? Erik Hesselink
suggests maybe a `throwTo`. Yves Parès wondered what JP means by
overkill; `MVar`s are both “teeny tiny” and convenient (they are!),
perfect for the job. Edward Amsden suggested an `IORef` modified by
`atomicModifyIORef`, cheaper than an `MVar` and usable when you don't have
to lock multiple threads.
* [How to increase performance using concurrency for sequential producer-consumer problem][m16] (13 Feb)
Roel van Dijk sketches out a program for us: producer produces
values and feeds them to a callback function (blocking until there
are values to produce), converter does CPU-only work (and is the
bottleneck of the program), consumer consumes values (important: in
the order they were produced). A non-concurrent driver for these
may look like `producer (consumer . converter)`. OK now what if
Roel wants to make use of concurrency, ideally spawing off a worker
thread for each core in his system? The important thing here is
that the consumer continues to get values in the order they come
out of the producer.
Some suggestions: conduits and the newly released stm-conduit
package (Clark Gamari), monad-coroutine (Mario Blažević),
bounded STM channels (John Lato). John [points out][m16b] that
while something like iteratee/conduit-stm would give you
concurrency between the producer/converter/consumer, but would
not directly help with the concurrency between converters.
* [New distributed-process (Cloud Haskell) backend/transport design and prototype][p1]
Duncan Coutts followed up on his design proposal for the Cloud
Haskell transport design. As mentioned in the Parallel GHC
news above, we've been working on doing some prototyping, and
now have code for people to look at and play with. Check out
the [GitHub page][dp-github] and [design document][dp-design]
* [Round-robin scheduling and forkIO][p5] (2 Feb)
Andreas Voellmy wanted to check what happens when you `forkIO` a
new thread, if it it executed in the same HEC as the original or in
some new thread. He points to [Runtime support for multicore
Haskell][rts-multicore], which says that threads within the run
queue of a single HEC are scheduled in round-robin order; but does
not talk about the threads as a whole. This can lead, Andreas
suggests, to unfair scheduling of threads if you have the situation
where an HEC has a single always-runnable thread, and another HEC
has lots of threads (see his follow-up for example code
illustrating the problem). Simon Marlow both that threads are
executed in the current HEC and that the described situation can
happen (“It wouldn't be too hard to do something more sensible
here”). Simon suggests looking in rts/Schedule.c, particularly the
function `schedulePushWork ()`.
* [Synchronizations in memory allocation?][p6] (11 Feb)
Andreas has a program with several Haskell threads each
reading a different TCP sockets. He's investigating a problem
where the performance of his program starts to degrade after
20 cores. Andreas suspects a scaling problem in allocating memory
and has written a micro-benchmark to explore the issue. The
program forks some worker threads that just allocated some memory
without trying to synchronise or share any data. He is surprised
to find that the benchmark takes longer using
`mallocPlainForeignPtrBytes` the more threads he forks. The
benchmark makes for a good stress test — it allocates over 10 Gb/s
— and as result of Andreas' investigations, Simon Marlow has
modified the pinned memory allocator to “steal complete blocks
from the nursery rather than allocating them from the global block
allocator.” The changes will show up in GHC 7.4.2.
20 cores, eh? [There's no pleasing some people.][no-pleasing]
StackOverflow and Reddit
* [Installing dph-examples in Mac OS X 10.7.2][s1]
* [Programming language for functional parallelism: F# vs Haskell][s2]
* [Parallel IO Causes Random Text Output in Terminal][s3]
* [Repa Without Parallelization][s4]
* [what exactly is dynamic parallelism?][s5]
* [Writing “fib” to run in parallel: -N2 is slower?][s6]
* [How to use Parallel Strategies in Haskell][s7]
* [Parallel graphics processing in Haskell][s8]
* [Why is concurrent haskell non deterministic while parallel haskell primitives (par and pseq) deterministic?][s9]
* [Runtime performance degradation for C FFI Callback when pthreads are enabled][s10]
* [Memory footprint in threaded mode with forkIO when there is lot of contention for single MVar][s11]
Help and Feedback
If you'd like to make an announcement in the next Haskell Parallel
Digest, then get in touch with me, Eric Kow, at
<parallel at well-typed.com>. Please feel free to leave any comments and
Eric Kow <http://erickow.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 203 bytes
Desc: Message signed with OpenPGP using GPGMail
More information about the Haskell-Cafe