[Haskell] thread-local variables

Tue Aug 8 09:21:06 EDT 2006

On 07.08 13:16, Frederik Eaton wrote:
> > How would this work together with the FFI?
> 
> It wouldn't, at least I wouldn't care if it didn't.
>

Suddenly breaking libraries that happen to use the FFI behind your
back does not seem like a good conservative extension.

I think we should move the discussion to the wiki as Simon
suggested. I can create a wiki page if you don't want to.

> What about my example:
> 
> newMain host environment program_args
>     network_config locale terminal_settings
>     stdin stdout stderr = do
>     ...
> 
> Now, let's see. We might want two threads to have the same network
> configuration, but a different view of the filesystem; or the same
> view of the filesystem, but a different set of environment variables;
> or the same environment, but different command line arguments. All
> three cases are pretty common in practice. We might also want to have
> the same arguments but different IO handles - as in a multi-threaded
> server application.

This won't be pretty even with TLS. Our fancy app will probably mix
in STM and pass callback actions to the thread that processes
packets coming directly from the network interface. The TLS approach
quickly becomes problematic: we need to know which actions
depend on each other and how.

> And the part that implements the filesystem might want to access the
> network (if there is a network filesystem). And the part that starts
> processes with an environment might want to access the filesystem, for
> instance to read the code for the process and for shared libraries;
> and maybe it also wants to get the hostname from the network layer. 
> And the part that starts programs with arguments might want to access
> the environment (for instance, to get the current locale), as well as
> the filesystem (for instance, to read locale configuration files). And
> the part that accesses the IO handles might also want to access not
> just the program arguments but the environment, and the filesystem,
> and the network.

So we have the following dependencies:

FileSystem  -> Network
Environment -> FileSystem, Network
Arguments   -> Environment
IO Handles  -> Arguments,Environment,FS,Network

With TLS every one of them has type IO. Is the programmer now supposed
to know that the network has to be configured before program
arguments are used? A programmer who processes command line
arguments first and only then configures the network will probably
end up with hidden bugs.

It becomes very hard to know what different components depend on.
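
To illustrate the alternative, here is a minimal sketch in which each
layer takes what it depends on as an argument, so the initialization
order is checked by the compiler. All type and function names below are
hypothetical and exist only for this example.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical layer types, for illustration only; none of these
-- names come from a real library.
newtype Network    = Network { hostName :: String }
newtype FileSystem = FileSystem { fsNetwork :: Network }
data Environment   = Environment
  { envFileSystem :: FileSystem
  , envVars       :: [(String, String)]
  }

-- Each initializer demands exactly the layers it depends on.
initNetwork :: String -> Network
initNetwork = Network

initFileSystem :: Network -> FileSystem
initFileSystem = FileSystem

initEnvironment :: FileSystem -> [(String, String)] -> Environment
initEnvironment = Environment

-- Looking up the locale needs an Environment, which cannot be built
-- before a FileSystem and a Network exist. Falls back to "C".
currentLocale :: Environment -> String
currentLocale env = fromMaybe "C" (lookup "LANG" (envVars env))
```

Trying to call currentLocale before the network is set up simply does
not typecheck, whereas with every action in plain IO the same mistake
would only show up at run time.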

Even if we had to define all those instances, that would be
1+2+1+4 = 8 instance declarations, not 5^2 = 25 instances.
Or we could use small wrapper combinators (which I prefer).
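
A rough sketch of what such a wrapper combinator could look like,
assuming each layer's actions are modelled as functions from that
layer's configuration to IO. The names (NetConfig, FsConfig, liftNet
and so on) are made up for this example.

```haskell
-- Hypothetical configurations: the filesystem layer carries the
-- network configuration it depends on.
newtype NetConfig = NetConfig { netHost :: String }
data FsConfig = FsConfig { fsNet :: NetConfig, fsRoot :: FilePath }

-- An action in a layer is a function from its configuration to IO.
type NetAction a = NetConfig -> IO a
type FsAction  a = FsConfig  -> IO a

-- One small wrapper per dependency edge (here FileSystem -> Network).
liftNet :: NetAction a -> FsAction a
liftNet act = act . fsNet

-- A network-layer action.
getHostName :: NetAction String
getHostName = pure . netHost

-- A filesystem-layer action that reaches the network only via liftNet.
mountPoint :: FsAction String
mountPoint cfg = do
  host <- liftNet getHostName cfg
  pure (host ++ ":" ++ fsRoot cfg)
```

There is one wrapper per edge in the dependency list, so the amount
of glue grows with the number of real dependencies rather than with
the square of the number of layers.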

Btw, how would the TLS solution elegantly handle the case where I
want separate network configurations for e.g.
IO Handle -> Network(socket) and
IO Handle -> FileSystem(NFS) -> Network
?
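
For comparison, with explicit values the two paths simply receive
different arguments. A minimal sketch, with hypothetical names
throughout:

```haskell
-- Hypothetical types for illustration only.
newtype NetworkCfg    = NetworkCfg { cfgName :: String }
newtype NfsFileSystem = NfsFileSystem { nfsNetwork :: NetworkCfg }

data Handles = Handles
  { socketNetwork :: NetworkCfg     -- IO Handle -> Network (socket)
  , handleFs      :: NfsFileSystem  -- IO Handle -> FileSystem (NFS) -> Network
  }

-- The caller decides which network each path uses.
mkHandles :: NetworkCfg -> NetworkCfg -> Handles
mkHandles sockNet nfsNet = Handles sockNet (NfsFileSystem nfsNet)

-- Report the network used by the socket path and by the NFS path.
routes :: Handles -> (String, String)
routes h = (cfgName (socketNetwork h), cfgName (nfsNetwork (handleFs h)))
```

Nothing prevents passing the same NetworkCfg twice when the two paths
should share a configuration.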

> So here is an example where we have nested layers, and each layer
> accesses most of the layers below it.

And this will cause problems. A good API should not encourage
going to the lower levels directly. If the lowest level changes
then with your design one has to make O(layers) changes instead of
O(1) if the layers are not available directly.

If one of the layers adds a new dependency then making sure it is
initialized and used correctly seems very hard to check.

> If we started with a library that dealt with OS devices such as the
> network, and used a special monad for that; and then if we built upon
> that a layer for keeping track of environment variables, with another
> monad; and then a layer for invoking executables with arguments; and
> then a layer for IO; all with monads - then we would have a good
> modular, extensible design, which, due to the interactions between
> layers, would, in Haskell, require code length which is quadratic in
> the number of layers.

The trick here is that most components should not talk with each
other. Composition and encapsulation are the keys to victory.

> (Of course, it's true that in real operating systems, each of these
> layers has its own set of interfaces to the other layers - so the
> monadic approach is actually not more verbose. But the point is that
> it's a reasonable design, with layers, and where each layer uses each
> of the ones below it. I want to write code which is designed the same
> way, but without the overhead)

Yes, the size of the code is dependent on the size of the API.
Making things explicit is more infrastructure at the start,
but makes things easier later on when they have to be changed.

> If you move it somewhere else, but forget to move the thread-local
> variables it refers to, then you'll get a compiler error.

I meant forgetting to initialize it - not omitting the whole
definition.

> db2 <- getIOParam db2Param
> withIOParam dbParam db2 $ ...

And one needs to make sure that the "..." part does not need the
other database connection(s). That makes composing things hard.

> I'm still not sure I understand why thread pools are necessary, by the
> way. I thought forking was pretty fast under GHC.

Threads are quite cheap. But by using a pool we can guarantee things
about the number of threads and don't run into situations with 10000
extra threads just because forking is always fun. The other point
is to use a background thread which talks to a blocking C API and
executes callbacks upon receiving events from the C side.
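
To make the first point concrete, here is a minimal sketch of a
fixed-size pool built only on Control.Concurrent from base. The names
Pool, newPool, submit and runOnPool are hypothetical, and the sketch
ignores exceptions: a job that throws kills its worker, and runOnPool
would then never get a result.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever, replicateM_)

-- A pool is just a queue of jobs shared by the workers.
newtype Pool = Pool (Chan (IO ()))

-- Start exactly n worker threads; no further threads are forked.
newPool :: Int -> IO Pool
newPool n = do
  jobs <- newChan
  replicateM_ n (forkIO (forever (do job <- readChan jobs; job)))
  pure (Pool jobs)

-- Queue a job instead of forking a new thread for it.
submit :: Pool -> IO () -> IO ()
submit (Pool jobs) = writeChan jobs

-- Run an action on some worker and block until its result arrives.
runOnPool :: Pool -> IO a -> IO a
runOnPool pool act = do
  result <- newEmptyMVar
  submit pool (act >>= putMVar result)
  takeMVar result
```

However many jobs are submitted, the number of threads stays at n;
the extra work just waits in the channel.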

- Einar Karttunen