[Haskell] select(2) or poll(2)-like function?

Mon Apr 18 13:55:57 CEST 2011

On Mon, 18 Apr 2011 12:56:39 +0200
Ertugrul Soeylemez <es at ertes.de> wrote:
> Mike Meyer <mwm at mired.org> wrote:
> > On Mon, 18 Apr 2011 11:07:58 +0200
> > Johan Tibell <johan.tibell at gmail.com> wrote:
> > > On Mon, Apr 18, 2011 at 9:13 AM, Mike Meyer <mwm at mired.org> wrote:
> > > > I always looked at it the other way 'round: threading is a hack to
> > > > deal with system inadequacies like poor shared memory performance
> > > > or an inability to get events from critical file types.
> > > >
> > > > Real processes and event-driven programming provide more robust,
> > > > understandable and scalable solutions.
> > > > <end rant>
> > >
> > > We need to keep two things separate: threads as a way to achieve
> > > concurrency and as a way to achieve parallelism [1].
> >
> > Absolutely. Especially because you shouldn't have to deal with
> > concurrency if all you want is parallelism. Your reference [1] covers
> > why this is the case quite nicely (and is essentially the argument for
> > "understandable" in my claim above).
> 
> You also don't need Emacs/Vim, if all you want is to write a simple
> plain text file.  There is nothing wrong with concurrency, because you
> are confusing the high level model with the low level implementation.
> Concurrency is nothing but a design pattern, and GHC shows that a high
> level design pattern can be mapped to efficient low level code.

Possibly true. The question is - can it be mapped to a design that's
as robust and scalable as the ones I'm used to working on?

> In Haskell you should not use explicit, manual OS threading/forking for
> the same reason you shouldn't write machine code manually.

That's a good thing - providing it doesn't compromise robustness and
scalability.

> > > It's useful to use non-determinism (i.e. concurrency) to model a
> > > server processing multiple requests. Since requests are independent
> > > and shouldn't impact each other we'd like to model them as
> > > such. This implies some level of concurrency (whether using threads
> > > or processes).
> >
> > But because the requests are independent, you don't need concurrency
> > in this case - parallelism is sufficient.
>
> Perhaps Haskell is the wrong language for you.  How about programming in
> C/C++?  I think you want more control over low level resources than
> Haskell gives you.  But I suggest having a closer look at concurrency.

Personally, I don't want to have to worry about low-level resources,
or even concurrency. Having to do so feels too much like having to
explicitly allocate and free memory, or worry about register
allocations. But if I have to do those things to get robustness and
scalability until the languages start being able to deal with it, then
I need the RTS to get out of the way and let me do my job.

If I'm using a value that needs protection from concurrent access
without providing that protection, I want the system to give me an
error. At run-time is acceptable, but compile time is better. I want
the system to make sure the concurrent protection mechanisms work
properly - no deadlocks, no stuck processes, etc. - without my having to
do anything but indicate which values need such protection.
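
For the first half of that wish, a minimal sketch of how GHC's base
library handles it: a value kept in an MVar can only be reached through
operations that take the lock, so unprotected access is not expressible.
The function name countConcurrently and its parameters are made up for
illustration. This does not cover the second half: MVar code can still
deadlock if locks are taken in an inconsistent order.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forM_, replicateM_)

-- | Hypothetical example: several threads bump one shared counter.
-- The counter lives in an MVar, so every update goes through
-- modifyMVar_, which holds the lock for the duration of the update.
countConcurrently :: Int -> Int -> IO Int
countConcurrently workers bumps = do
  counter <- newMVar (0 :: Int)
  done    <- newEmptyMVar
  forM_ [1 .. workers] $ \_ -> forkIO $ do
    -- ($!) forces the new value so thunks do not pile up in the MVar.
    replicateM_ bumps (modifyMVar_ counter (\n -> pure $! n + 1))
    putMVar done ()
  -- Wait until every worker has signalled completion.
  replicateM_ workers (takeMVar done)
  readMVar counter
```

With these definitions, countConcurrently 4 1000 should come back with
4000 no matter how the threads interleave.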

> > The unix process model works quite well. Compared to a threaded model,
> > this is more robust (if a process breaks, you can kill and restart it
> > without affecting other processes, whereas if a thread breaks,
> > restarting the process and all the threads in it is the only safe
> > option) and scalable (you're already doing ipc, so moving processes
> > onto more systems is easy, and trivial if you design for it). The
> > events handled by a single process are simple enough that your
> > callback/event spaghetti can line up in nice, straight strands.
>
> When writing concurrent code you don't care about how the RTS maps it to
> processes and threads.  GHC chose threads, probably because they are
> faster to create/kill and consume less memory.  But this is an
> implementation detail the Haskell developer should not have to worry
> about.

So - what happens when a thread fails for some reason? I'm used to
dealing with systems that run 7x24 for weeks or even months on
end. Hardware hiccups, network failures, bogus input, hung clients,
etc. are all just facts of life. I need the system to keep running
properly in the face of all those, and I need them to disrupt the
world as little as possible.

Given that the RTS has taken control over this stuff, I sort of expect
it to take care of noticing a dead process and restarting it as
well. All of which is fine by me.
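
As far as I know, an uncaught exception in GHC ends only the Haskell
thread that raised it, and the RTS does not restart that thread by
itself. The restart policy has to be written down somewhere. Below is a
minimal sketch of such a policy using only Control.Exception from base.
The names superviseWith and flakyDemo are hypothetical, and a real
supervisor would also want logging and back-off.

```haskell
import Control.Exception (SomeException, throwIO, try)
import Control.Monad (when)
import Data.IORef (atomicModifyIORef', newIORef)

-- | Hypothetical helper: run an action and rerun it if it throws,
-- for at most the given number of attempts. If every attempt fails,
-- the last exception is returned instead of being rethrown.
-- Note: catching SomeException also catches asynchronous exceptions,
-- which a production supervisor would normally let through.
superviseWith :: Int -> IO a -> IO (Either SomeException a)
superviseWith attempts action = do
  result <- try action
  case result of
    Right value -> pure (Right value)
    Left err
      | attempts > 1 -> superviseWith (attempts - 1) action
      | otherwise    -> pure (Left err)

-- | Usage sketch: an action that fails on its first two runs and
-- succeeds on the third, returning the number of runs it took.
flakyDemo :: IO (Either SomeException Int)
flakyDemo = do
  calls <- newIORef (0 :: Int)
  superviseWith 5 $ do
    n <- atomicModifyIORef' calls (\c -> (c + 1, c + 1))
    when (n < 3) (throwIO (userError "transient failure"))
    pure n
```

To supervise a worker thread rather than an inline action, the same
helper can be wrapped in forkIO.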

> > > We don't need to do this. We can keep a concurrent programming model
> > > and get the execution efficiency of an event driven model. This is
> > > what GHC's I/O manager achieves. On top of that we also get
> > > parallelism for free. Another way to look at it is that GHC provides
> > > the scheduler (using a thread for the event loop and a separate
> > > worker pool) that you end up writing manually in event driven
> > > frameworks.
> >
> > So my question is - can I still get the robustness/scalability
> > features I get from the unix process model using haskell? In
> > particular, it seems like ghc starts threads I don't ask it to, and
> > using both threads & forks for parallelism causes even more headaches
> > than concurrency (at least on unix & unix-like systems), so just
> > replicating the process model won't work well. Do any of the haskell
> > parallel processing tools work across multiple systems?
> 
> Effectively no (unless you want to use the terribly outdated GPH
> project), but that's a shortcoming of the current RTS, not of the design
> patterns you use in Haskell.  By design Haskell programs are well suited
> for an auto-distributing RTS.  It's just that no such RTS exists for
> recent versions of the common compilers.

So is anyone working on such a package for haskell? I know clojure's
got some people working on making STM work in a distributed
environment, but that's outside the goals of the core team.

> In other words:  Robustness and scalability should not be your business
> in Haskell.  You should concentrate on understanding and using the
> concurrency concept well.  And just to encourage you:  I write
> productive concurrent servers in Haskell, which scale very well and
> probably better than an equivalent C implementation would.  Reason:  A
> Haskell thread is not mapped to an operating system thread (unless you
> used forkOS).  When it is advantageous, the RTS can well decide to let
> another OS thread continue a running Haskell thread.  That way the
> active OS threads are always utilized as efficiently as possible.  It
> would be a pain to get something like that with explicit threading and
> even more, when using processes.

Well, *someone* has to worry about robustness and scalability. Users
notice when their two-minute system builds start taking four minutes
because something didn't scale fast enough, or when a build has to be
run more than once because a failing component build wasn't restarted
properly (and they will be at my door wanting me to fix it). I'm
willing to believe that
haskell lets you write more scalable code than C, but C's tools for
handling concurrency suck, so that should be true in any language
where someone actually thought about dealing with concurrency beyond
locks and protected methods. The problem is, the only language I've
found where that's true that *also* has reasonable tools to deal with
scaling beyond a single system is Eiffel (which apparently abstracts
things even further than haskell - details like how concurrency is
achieved or how many concurrent operations you can have are configured
when you start an application, *not* when writing it). Unfortunately,
Eiffel has other problems that make it undesirable.

> That's why the RTS lets you choose the number of OS threads only instead
> of giving you low level control over the threads.  It spawns as many
> threads as you ask it to spawn and manages them with its own strategy.
> The only way to manipulate this strategy is by deciding whether a
> particular Haskell thread is bound (forkOS) or not (forkIO).

Does the programmer have to worry about such trivia as the number of
threads to use?
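
For reference, a small sketch of what is visible from inside a program,
using functions from Control.Concurrent in base. The names boundIn and
report are made up for illustration. The number of capabilities is
normally chosen at startup with the +RTS -N<n> option on a program
linked with -threaded; the code only reads it back and checks whether a
thread started with forkIO or forkOS is bound to an OS thread.

```haskell
import Control.Concurrent
  ( ThreadId, forkIO, forkOS, getNumCapabilities, isCurrentThreadBound
  , newEmptyMVar, putMVar, rtsSupportsBoundThreads, takeMVar )

-- | Hypothetical helper: start a thread with the given fork function
-- and report whether that thread is bound to an OS thread.
boundIn :: (IO () -> IO ThreadId) -> IO Bool
boundIn fork = do
  box <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar box)
  takeMVar box

-- | Returns the number of capabilities, whether a forkIO thread is
-- bound, and whether a forkOS thread is bound. The last component is
-- Nothing when the RTS was not linked with -threaded, because forkOS
-- is not supported there.
report :: IO (Int, Bool, Maybe Bool)
report = do
  caps      <- getNumCapabilities
  viaForkIO <- boundIn forkIO
  viaForkOS <-
    if rtsSupportsBoundThreads
      then Just <$> boundIn forkOS
      else pure Nothing
  pure (caps, viaForkIO, viaForkOS)
```

A forkIO thread is expected to report that it is not bound, and a
forkOS thread that it is.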

    <mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org