[Haskell-beginners] Parallel Processing in Libraries?
Sébastien Leblanc
seb at sebleblanc.net
Sun Jan 19 20:42:51 UTC 2020
> I'd like to have a library which utilized parallel programming (mostly for map-reduce tasks).
Parallel computation is a complex topic, and many of the available
solutions may not apply to a given problem. I find that the simplest,
and most often overlooked, way to parallelize a computation is to
break the problem down into chunks that can each be computed by a
single core.
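To make the chunking idea concrete, here is a minimal sketch in plain
Haskell. The names chunksOf, sumOfSquares and chunkedSumOfSquares are
made up for this example, and the "work" (a sum of squares) is only a
placeholder. Nothing runs in parallel here; the point is only that each
chunk is processed independently and the partial results are combined
at the end, which is the shape a map-reduce task needs.

```haskell
module Main where

import Data.List (foldl')

-- | Split a list into chunks of at most n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: chunk size must be positive"
  | null xs   = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- | Placeholder work for one chunk; it needs no data from other chunks.
sumOfSquares :: [Int] -> Int
sumOfSquares = foldl' (\acc x -> acc + x * x) 0

-- | Map the work over every chunk, then reduce the partial results.
chunkedSumOfSquares :: Int -> [Int] -> Int
chunkedSumOfSquares n = sum . map sumOfSquares . chunksOf n

main :: IO ()
main = print (chunkedSumOfSquares 1000 [1 .. 10000])
```

Because sumOfSquares never looks outside its own chunk, each call could
be handed to a different core, process or machine without changing the
final result.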
Multi-threading is another topic entirely. While it is often related
to parallelization, it is frequently used just to spread work over
more cores (to improve performance) rather than to implement an
algorithm that is otherwise serial. You then have to deal with all the
difficulties of multi-threading: synchronization, shared memory
access, thread-safe code, and errors that can affect the whole process
instead of a single worker.
Thus, once you have broken the problem down into individual
components, one way to implement parallelization is with a task queue
built on a message queue. The principle is pretty simple: workers
connect to a message broker such as RabbitMQ (or talk to the
controller through a brokerless messaging library such as ZeroMQ) and
listen for messages from a controller telling them to process some
data using some function. First come, first served. Once a worker is
done, it sends the reply back to the message queue or makes the result
available by some other means (for example in a folder with shared
access). The worker could then notify other workers to further process
this data.
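The controller/worker pattern described above can be sketched as
follows. This is only an illustration of the message flow, under a big
assumption: the "queues" are in-process Chan values from base and the
workers are lightweight threads started with forkIO, standing in for a
real broker and separate worker processes. The Task and Reply types and
the function names are hypothetical; with RabbitMQ or ZeroMQ you would
swap the Chan operations for calls to a client library.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, forever, replicateM)
import Data.List (sort)

-- | Hypothetical task message: an identifier plus the data to process.
data Task = Task { taskId :: Int, payload :: [Int] }

-- | Hypothetical reply message: the task identifier and its result.
data Reply = Reply { replyId :: Int, result :: Int }
  deriving (Eq, Ord, Show)

-- | A worker takes whichever task comes next (first come, first
-- served), processes it, and posts the reply on the reply queue.
worker :: Chan Task -> Chan Reply -> IO ()
worker tasks replies = forever $ do
  Task i xs <- readChan tasks
  let r = sum xs                      -- placeholder for the real work
  r `seq` writeChan replies (Reply i r)

-- | The controller starts the workers, publishes one task per chunk,
-- and waits for one reply per task. Replies arrive in any order, so
-- they are sorted by task identifier before being returned.
runController :: Int -> [[Int]] -> IO [Reply]
runController nWorkers chunks = do
  tasks   <- newChan
  replies <- newChan
  forM_ [1 .. nWorkers] $ \_ -> forkIO (worker tasks replies)
  forM_ (zip [0 ..] chunks) $ \(i, xs) -> writeChan tasks (Task i xs)
  rs <- replicateM (length chunks) (readChan replies)
  return (sort rs)

main :: IO ()
main = do
  rs <- runController 4 [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
  mapM_ print rs
```

The workers never share state with each other; they only see the task
they pulled from the queue. That is what lets the same design move from
threads to separate processes or machines once a real message queue
replaces the Chan.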
The elegance of this system lies in how flexible the actual hardware
architecture that processes the load can be. For example, using some
cloud provider, you can automatically spawn more VMs to process a higher
load of data, and discard the VMs once they get idle long enough. On a
single machine, you can fire up one or as many processes as you have
computing cores (if your workload is almost 100% CPU-bound) and let the
OS take care of scheduling the tasks.
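For the single-machine case, a program can ask how many cores are
available before deciding how many workers to start. The sketch below
uses getNumProcessors from GHC.Conc (part of base); the workerCount
policy of one worker per core is only the assumption made in the
paragraph above for CPU-bound work, and the program just reports the
number instead of actually launching processes.

```haskell
module Main where

import GHC.Conc (getNumProcessors)

-- | Hypothetical policy: one worker per core, and never fewer than one.
workerCount :: Int -> Int
workerCount cores = max 1 cores

main :: IO ()
main = do
  cores <- getNumProcessors
  putStrLn ("Would start " ++ show (workerCount cores) ++ " worker processes")
```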
I am not aware of a complete task processing library for Haskell; for an
example of a mature project that provides such features, check out
Celery for Python. If you cannot find a substitute, I suppose you could
always use Celery and run your Haskell code from inside the Python
interpreter.