[Haskell] select(2) or poll(2)-like function?

Ertugrul Soeylemez es at ertes.de
Mon Apr 18 12:56:39 CEST 2011


Mike Meyer <mwm at mired.org> wrote:

> On Mon, 18 Apr 2011 11:07:58 +0200
> Johan Tibell <johan.tibell at gmail.com> wrote:
> > On Mon, Apr 18, 2011 at 9:13 AM, Mike Meyer <mwm at mired.org> wrote:
> > > I always looked at it the other way 'round: threading is a hack to
> > > deal with system inadequacies like poor shared memory performance
> > > or an inability to get events from critical file types.
> > >
> > > Real processes and event-driven programming provide more robust,
> > > understandable and scalable solutions.
> > > <end rant>
> >
> > We need to keep two things separate: threads as a way to achieve
> > concurrency and as a way to achieve parallelism [1].
>
> Absolutely. Especially because you shouldn't have to deal with
> concurrency if all you want is parallelism. Your reference [1] covers
> why this is the case quite nicely (and is essentially the argument for
> "understandable" in my claim above).

You also don't need Emacs/Vim if all you want is to write a simple
plain text file.  There is nothing wrong with concurrency; you are
confusing the high-level model with the low-level implementation.
Concurrency is nothing but a design pattern, and GHC shows that a
high-level design pattern can be mapped to efficient low-level code.
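A minimal sketch of what "efficient low-level code" buys you, assuming GHC and only the base library: it forks ten thousand forkIO threads, each reporting a result through its own MVar.  The name forkAll and the thread count are arbitrary choices for illustration; note that nothing here manages OS threads by hand.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Fork n lightweight Haskell threads (hypothetical helper).  Each computes
-- a value and reports it through its own MVar; the RTS decides which OS
-- threads actually run them.
forkAll :: Int -> IO [Int]
forkAll n = do
  vars <- forM [1 .. n] $ \i -> do
    v <- newEmptyMVar
    _ <- forkIO (putMVar v $! i * 2)
    return v
  mapM takeMVar vars  -- wait for every thread, collecting results in order

main :: IO ()
main = do
  results <- forkAll 10000
  print (sum results)
```

This should print 100010000 whether or not the program is linked with -threaded; how the threads are spread over OS threads is left entirely to the RTS.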

Mid- and high-level languages follow the philosophy that the compiler
is smarter than you when it comes to generating machine code.  Haskell
takes this to its logical conclusion:  The compiler and RTS are smarter
than you when it comes to generating machine code and managing system
resources, including file descriptors and threads.

In Haskell you should not use explicit, manual OS threading/forking for
the same reason you shouldn't write machine code manually.


> > It's useful to use non-determinism (i.e. concurrency) to model a
> > server processing multiple requests. Since requests are independent
> > and shouldn't impact each other we'd like to model them as
> > such. This implies some level of concurrency (whether using threads
> > or processes).
>
> But because the requests are independent, you don't need concurrency
> in this case - parallelism is sufficient.

Perhaps Haskell is the wrong language for you.  How about programming in
C/C++?  I think you want more control over low-level resources than
Haskell gives you.  But I suggest having a closer look at concurrency.
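For independent requests, the concurrent model is simply one Haskell thread per request.  A minimal sketch, assuming a Chan stands in for incoming connections: a dispatcher forks a thread per request and each handler answers through its own MVar.  Request, handleRequest, serve and call are hypothetical names; a real server would accept connections from a socket instead.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forever, void)

-- A request carries its input and an MVar for the reply (hypothetical shape).
data Request = Request Int (MVar Int)

-- Hypothetical per-request work; a real server would do socket I/O here.
handleRequest :: Int -> IO Int
handleRequest n = return (n * n)

-- Dispatcher: one Haskell thread per incoming request, so a slow request
-- does not hold up the others.  Which OS thread runs it is up to the RTS.
serve :: Chan Request -> IO ()
serve queue = forever $ do
  Request n reply <- readChan queue
  void $ forkIO (handleRequest n >>= putMVar reply)

-- Client side: submit a request and block until its reply arrives.
call :: Chan Request -> Int -> IO Int
call queue n = do
  reply <- newEmptyMVar
  writeChan queue (Request n reply)
  takeMVar reply

main :: IO ()
main = do
  queue <- newChan
  _ <- forkIO (serve queue)
  answers <- forM [1 .. 5] (call queue)
  print answers
```

This should print [1,4,9,16,25].  The design choice is that isolation between requests comes from separate threads sharing nothing but their reply MVar, not from separate OS processes.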


> The unix process model works quite well. Compared to a threaded model,
> this is more robust (if a process breaks, you can kill and restart it
> without affecting other processes, whereas if a thread breaks,
> restarting the process and all the threads in it is the only safe
> option) and scalable (you're already doing ipc, so moving processes
> onto more systems is easy, and trivial if you design for it). The
> events handled by a single process are simple enough that your
> callback/event spaghetti can line up in nice, straight strands.

When writing concurrent code you don't care about how the RTS maps it to
processes and threads.  GHC chose threads, probably because they are
faster to create/kill and consume less memory.  But this is an
implementation detail the Haskell developer should not have to worry
about.
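Since the original question was about a select(2)/poll(2)-like function, here is a hedged sketch of the thread-based alternative: run each potential event source in its own Haskell thread and take whichever finishes first.  The name waitAny and the delays are hypothetical.  With real descriptors, each action could be a call to threadWaitRead from Control.Concurrent, which in GHC blocks only the calling Haskell thread while the RTS does the actual polling.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, takeMVar, tryPutMVar)

-- Run each action in its own Haskell thread and return the index and
-- result of whichever finishes first (a rough analogue of waiting in
-- select(2) on several descriptors).  The losing threads are killed.
-- Sketch only: exceptions raised by the actions are not propagated.
waitAny :: [IO a] -> IO (Int, a)
waitAny actions = do
  winner <- newEmptyMVar
  let runOne (i, act) = forkIO $ do
        x <- act
        _ <- tryPutMVar winner (i, x)  -- only the first put succeeds
        return ()
  tids <- mapM runOne (zip [0 ..] actions)
  result <- takeMVar winner
  mapM_ killThread tids
  return result

main :: IO ()
main = do
  -- Three stand-in "sources" that become ready after different delays
  -- (in microseconds); real code might call threadWaitRead on an Fd.
  (i, msg) <- waitAny
    [ threadDelay 300000 >> return "slow"
    , threadDelay 10000 >> return "fast"
    , threadDelay 200000 >> return "medium"
    ]
  putStrLn (show i ++ ": " ++ msg)
```

If every action throws, no result is ever delivered and takeMVar blocks; GHC will usually detect this and raise BlockedIndefinitelyOnMVar in the waiting thread.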


> > We don't need to do this. We can keep a concurrent programming model
> > and get the execution efficiency of an event driven model. This is
> > what GHC's I/O manager achieves. On top of that we also get
> > parallelism for free. Another way to look at it is that GHC provides
> > the scheduler (using a thread for the event loop and a separate
> > worker pool) that you end up writing manually in event driven
> > frameworks.
>
> So my question is - can I still get the robustness/scalability
> features I get from the unix process model using haskell? In
> particular, it seems like ghc starts threads I don't ask it to, and
> using both threads & forks for parallelism causes even more headaches
> than concurrency (at least on unix & unix-like systems), so just
> replicating the process model won't work well. Do any of the haskell
> parallel processing tools work across multiple systems?

Effectively no (unless you want to use the terribly outdated GpH
project), but that's a shortcoming of the current RTS, not of the design
patterns you use in Haskell.  By design, Haskell programs are well suited
for an auto-distributing RTS.  It's just that no such RTS exists for
recent versions of the common compilers.

In other words:  Robustness and scalability should not be your business
in Haskell.  You should concentrate on understanding and using the
concurrency concept well.  And just to encourage you:  I write
production concurrent servers in Haskell, which scale very well,
probably better than an equivalent C implementation would.  Reason:  A
Haskell thread is not tied to a particular operating system thread
(unless you use forkOS).  When it is advantageous, the RTS can decide to
let a different OS thread continue running a Haskell thread.  That way
the active OS threads are always utilized as efficiently as possible.
It would be a pain to achieve something like that with explicit
threading, and even more so with processes.

That's why the RTS only lets you choose the number of OS threads instead
of giving you low-level control over them.  It spawns as many threads as
you ask for and manages them with its own strategy.  The only way to
influence this strategy is by deciding whether a particular Haskell
thread is bound (forkOS) or not (forkIO).
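The bound/unbound distinction can be observed directly.  This sketch uses forkIO, forkOS, isCurrentThreadBound, rtsSupportsBoundThreads and getNumCapabilities from Control.Concurrent; the helper boundIn is hypothetical.  The capability count is what +RTS -N controls when the program is linked with -threaded.

```haskell
import Control.Concurrent
  (forkIO, forkOS, getNumCapabilities, isCurrentThreadBound, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run a check in a new thread created by the given fork function and
-- report whether that thread is bound to a dedicated OS thread.
boundIn :: (IO () -> IO a) -> IO Bool
boundIn fork = do
  v <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar v)
  takeMVar v

main :: IO ()
main = do
  caps <- getNumCapabilities  -- set with +RTS -N under -threaded
  putStrLn ("capabilities: " ++ show caps)
  b1 <- boundIn forkIO
  putStrLn ("forkIO thread bound? " ++ show b1)
  -- forkOS fails unless the program is linked with -threaded.
  if rtsSupportsBoundThreads
    then boundIn forkOS >>= \b2 -> putStrLn ("forkOS thread bound? " ++ show b2)
    else putStrLn "non-threaded RTS: forkOS unavailable"
```

A forkIO thread should report False; a forkOS thread reports True, but only in the threaded RTS, which is why the call is guarded.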

So again:  This is an implementation detail.  Don't worry about that.
Concurrency is a design pattern only, and you should use it to your
advantage.


Greets,
Ertugrul


-- 
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/