[Haskell] select(2) or poll(2)-like function?

Mon Apr 18 14:10:29 CEST 2011

On 18/04/2011 12:55, Mike Meyer wrote:
> On Mon, 18 Apr 2011 12:56:39 +0200
> Ertugrul Soeylemez <es at ertes.de> wrote:
>> Mike Meyer <mwm at mired.org> wrote:
>>> The unix process model works quite well. Compared to a threaded model,
>>> this is more robust (if a process breaks, you can kill and restart it
>>> without affecting other processes, whereas if a thread breaks,
>>> restarting the process and all the threads in it is the only safe
>>> option) and scalable (you're already doing ipc, so moving processes
>>> onto more systems is easy, and trivial if you design for it). The
>>> events handled by a single process are simple enough that your
>>> callback/event spaghetti can line up in nice, straight strands.
>> When writing concurrent code you don't care about how the RTS maps it to
>> processes and threads.  GHC chose threads, probably because they are
>> faster to create/kill and consume less memory.  But this is an
>> implementation detail the Haskell developer should not have to worry
>> about.
>
> So - what happens when a thread fails for some reason? I'm used to
> dealing with systems that run 7x24 for weeks or even months on
> end. Hardware hiccups, network failures, bogus input, hung clients,
> etc. are all just facts of life. I need the system to keep running
> properly in the face of all those, and I need them to disrupt the
> world as little as possible.
>
> Given that the RTS has taken control over this stuff, I sort of expect
> it to take care of noticing a dead process and restarting it as
> well. All of which is fine by me.

The RTS can't manage things at that level, because it doesn't know what 
robustness model you want.  So failures in the I/O library result in 
exceptions, and you get to decide what to do.  If a thread dies due to 
an exception, then you are responsible for what happens from then on - 
typically you would have a top-level exception handler that notifies 
some higher-level thread what happened.  It's true that Haskell doesn't 
give you as much help here as you would get in Erlang/OTP, but it's all 
readily programmed up.
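
As a rough sketch of what I mean by a top-level handler, assuming a base 
library recent enough to provide forkFinally in Control.Concurrent: the 
names Exit, forkSupervised and demo below are made up for illustration, 
and a real supervisor would decide for itself whether to restart, log, 
or give up.

```haskell
import Control.Concurrent (ThreadId, forkFinally)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Exception (ErrorCall (..), SomeException, throwIO)

-- | What a worker reports when it stops (hypothetical type, for illustration).
data Exit
  = Finished String               -- worker name, ended normally
  | Crashed String SomeException  -- worker name, ended with an exception

-- | Fork a worker whose outcome, normal or exceptional, is always
-- written to the supervisor's channel by the top-level handler.
forkSupervised :: Chan Exit -> String -> IO () -> IO ThreadId
forkSupervised reports name action =
  forkFinally action $ \result ->
    writeChan reports $ case result of
      Right () -> Finished name
      Left e   -> Crashed name e

-- | Start one deliberately failing worker and return the supervisor's
-- view of what happened to it.
demo :: IO String
demo = do
  reports <- newChan
  _ <- forkSupervised reports "worker-1" (throwIO (ErrorCall "bogus input"))
  exit <- readChan reports
  pure $ case exit of
    Finished n  -> n ++ " finished"
    Crashed n e -> n ++ " crashed: " ++ show e
```

The supervising thread here just reads one report and formats it; the 
point is only that the worker cannot disappear without the supervisor 
hearing about it.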

Haskell *does* give you some important guarantees though.  Threads never 
just die without receiving an exception first.  If a thread blocks on an 
unreachable resource then it gets an exception, so you get some help 
dealing with deadlocks.
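
A small sketch of the second guarantee, under the assumption that the 
garbage collector gets a chance to notice that the MVar is unreachable 
(detection is tied to GC, so it is help with deadlocks rather than a 
promise about timing).  The name deadlockDemo is invented for the 
example.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar (..), try)

-- | Block on an MVar that no other thread holds a reference to, and
-- turn the exception raised by the RTS into an ordinary value.
deadlockDemo :: IO String
deadlockDemo = do
  unreachable <- newEmptyMVar :: IO (MVar ())
  -- Nobody else can ever fill this MVar, so takeMVar cannot succeed.
  result <- try (takeMVar unreachable)
  pure $ case result of
    Left BlockedIndefinitelyOnMVar -> "deadlock detected"
    Right ()                       -> "value received"
```

Catching the exception like this is mostly useful for reporting and 
cleanup; it tells you that the thread was stuck, not why.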

>>>> We don't need to do this. We can keep a concurrent programming model
>>>> and get the execution efficiency of an event driven model. This is
>>>> what GHC's I/O manager achieves. On top of that we also get
>>>> parallelism for free. Another way to look at it is that GHC provides
>>>> the scheduler (using a thread for the event loop and a separate
>>>> worker pool) that you end up writing manually in event driven
>>>> frameworks.
>>>
>>> So my question is - can I still get the robustness/scalability
>>> features I get from the unix process model using haskell? In
>>> particular, it seems like ghc starts threads I don't ask it to, and
>>> using both threads & forks for parallelism causes even more headaches
>>> than concurrency (at least on unix & unix-like systems), so just
>>> replicating the process model won't work well. Do any of the haskell
>>> parallel processing tools work across multiple systems?
>>
>> Effectively no (unless you want to use the terribly outdated GPH
>> project), but that's a shortcoming of the current RTS, not of the design
>> patterns you use in Haskell.  By design Haskell programs are well suited
>> for an auto-distributing RTS.  It's just that no such RTS exists for
>> recent versions of the common compilers.
>
> So is anyone working on such a package for haskell? I know clojure's
> got some people working on making STM work in a distributed
> environment, but that's outside the goals of the core team.

Take a look at "Haskell for the Cloud", Jeff Epstein, Andrew Black and 
Simon Peyton Jones:

http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf

>> In other words:  Robustness and scalability should not be your business
>> in Haskell.  You should concentrate on understanding and using the
>> concurrency concept well.  And just to encourage you:  I write
>> productive concurrent servers in Haskell, which scale very well and
>> probably better than an equivalent C implementation would.  Reason:  A
>> Haskell thread is not mapped to an operating system thread (unless you
>> used forkOS).  When it is advantageous, the RTS can well decide to let
>> another OS thread continue a running Haskell thread.  That way the
>> active OS threads are always utilized as efficiently as possible.  It
>> would be a pain to get something like that with explicit threading and
>> even more, when using processes.
>
> Well, *someone* has to worry about robustness and scalability. Users
> notice when their two minute system builds start taking four minutes
> (and will be at my door wanting me to fix it) because something didn't
> scale fast enough, or have to be run more than once because a failing
> component build wasn't restarted properly. I'm willing to believe that
> haskell lets you write more scalable code than C, but C's tools for
> handling concurrency suck, so that should be true in any language
> where someone actually thought about dealing with concurrency beyond
> locks and protected methods. The problem is, the only language I've
> found where that's true that *also* has reasonable tools to deal with
> scaling beyond a single system is Eiffel (which apparently abstracts
> things even further than haskell - details like how concurrency is
> achieved or how many concurrent operations you can have are configured
> when you start an application, *not* when writing it). Unfortunately,
> Eiffel has other problems that make it undesirable.

I'm interested in understanding what problems you're referring to.  What 
kind of scaling are you interested in - number of clients, number of 
cores, or something else?  What is it about Haskell threads that you are 
worried might not scale?

Cheers,
	Simon