[Haskell] thread-local variables

Tue Aug 8 10:34:54 EDT 2006

On Tue, Aug 08, 2006 at 04:21:06PM +0300, Einar Karttunen wrote:
> On 07.08 13:16, Frederik Eaton wrote:
> > > How would this work together with the FFI?
> > 
> > It wouldn't, at least I wouldn't care if it didn't.
> 
> Suddenly breaking libraries that happen to use FFI behind your
> back does not seem like a good conservative extension.

FFI already doesn't mix well with GHC's IO handles. What if I write to
file descriptor 1 before all data in stdout has been flushed? Is that
a reason not to allow FFI?

> I think we should move the discussion to the wiki as Simon
> suggested. I can create a wikipage if you don't want to.

http://haskell.org/haskellwiki/Thread_local_storage

I think the wiki is a good place for proposals, but not for most
discussion.

> > What about my example:
> > 
> > newMain host environment program_args
> >     network_config locale terminal_settings
> >     stdin stdout stderr = do
> >     ...
> > 
> > Now, let's see. We might want two threads to have the same network
> > configuration, but a different view of the filesystem; or the same
> > view of the filesystem, but a different set of environment variables;
> > or the same environment, but different command line arguments. All
> > three cases are pretty common in practice. We might also want to have
> > the same arguments but different IO handles - as in a multi-threaded
> > server application.
> 
> This won't be pretty even with TLS. Our fancy app will probably mix
> in STM and pass callback actions to the thread processing
> packets coming directly from the network interface. Quickly
> the TLS approach seems problematic - we need to know what actions
> depend on each other and how.

I don't understand. Does TLS make such design harder or easier?

> > And the part that implements the filesystem might want to access the
> > network (if there is a network filesystem). And the part that starts
> > processes with an environment might want to access the filesystem, for
> > instance to read the code for the process and for shared libraries;
> > and maybe it also wants to get the hostname from the network layer. 
> > And the part that starts programs with arguments might want to access
> > the environment (for instance, to get the current locale), as well as
> > the filesystem (for instance, to read locale configuration files). And
> > the part that accesses the IO handles might also want to access not
> > just the program arguments but the environment, and the filesystem,
> > and the network.
> 
> So we have the following dependencies:
> 
> 
> FileSystem  -> Network
> Environment -> FileSystem, Network
> Arguments   -> Environment
and Filesystem
> IO Handles  -> Arguments,Environment,FS,Network
> 
> With TLS every one of them has type IO. Now the programmer is supposed
> to know that he has to configure the network before using program
> arguments? So a programmer first wanting to process command line
> arguments and only then configuring network will probably have
> hidden bugs.

The running example is an executable starting in an operating system.
So everything is already configured by the time it starts, as you
know.

My application will be no different - for instance, the
database-related parameter will be set; then a request thread will
start, and after parsing the request, a user-id parameter will be set,
and then the request-processing functions will be called. There is no
reason for the main server thread to call any of the
request-processing functions, because it doesn't have a request to
process.

> It becomes very hard to know what different components depend on.
> 
> Even if we had to define all those instances that would be
> 1+2+1+3 = 7 instance declarations. Not 5^2 = 25 instances.
> Or use small wrapper combinators (which I prefer).

O(x) doesn't mean "same as x".

> btw how would the TLS solution elegantly handle that I'd like
> separate network configurations for e.g.
> IO Handle -> Network(socket) and
> IO Handle -> FileSystem(NFS) -> Network
> ?

The filesystem could send its actions to be executed in a separate
thread, which has its own configuration?
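
A minimal sketch of that idea, using only Control.Concurrent from
base. Since thread-local parameters don't exist yet, the worker's
configuration is modelled as an explicit value handed to each job; the
names NetConfig, Worker, startWorker and runOn are made up for
illustration, and exceptions in jobs are not handled.

```haskell
import Control.Concurrent

-- | Hypothetical network configuration; stands in for whatever a
-- thread-local parameter would hold.
newtype NetConfig = NetConfig { gateway :: String }

-- | Handle to a worker thread that runs submitted jobs under its
-- own configuration.
newtype Worker = Worker (Chan (NetConfig -> IO ()))

-- | Start a worker thread bound to the given configuration.
startWorker :: NetConfig -> IO Worker
startWorker cfg = do
  queue <- newChan
  let loop = do
        job <- readChan queue
        job cfg          -- every job sees the worker's config
        loop
  _ <- forkIO loop
  return (Worker queue)

-- | Send a job to the worker and block until its result comes back.
-- Sketch only: if the job throws, the worker thread dies.
runOn :: Worker -> (NetConfig -> IO a) -> IO a
runOn (Worker queue) job = do
  result <- newEmptyMVar
  writeChan queue (\cfg -> job cfg >>= putMVar result)
  takeMVar result
```

The NFS layer would hold one Worker and the socket layer another, so
the same caller can reach the network under two configurations.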

> > So here is an example where we have nested layers, and each layer
> > accesses most of the layers below it.
> 
> And this will cause problems. A good API should not encourage
> going to the lower levels directly. If the lowest level changes
> then with your design one has to make O(layers) changes instead of
> O(1) if the layers are not available directly.

No, you just write a compatibility wrapper over the new
implementation.

> If one of the layers adds a new dependency then making sure it is
> initialized and used correctly seems very hard to check.

I disagree.

> > If we started with a library that dealt with OS devices such as the
> > network, and used a special monad for that; and then if we built upon
> > that a layer for keeping track of environment variables, with another
> > monad; and then a layer for invoking executables with arguments; and
> > then a layer for IO; all with monads - then we would have a good
> > modular, extensible design, which, due to the interactions between
> > layers, would, in Haskell, require code length which is quadratic in
> > the number of layers.
> 
> The trick here is that most components should not talk with each
> other. Composition and encapsulation are the keys to victory.

So you think that operating systems are poorly designed because the
layers talk to each other?

> > (Of course, it's true that in real operating systems, each of these
> > layers has its own set of interfaces to the other layers - so the
> > monadic approach is actually not more verbose. But the point is that
> > it's a reasonable design, with layers, and where each layer uses each
> > of the ones below it. I want to write code which is designed the same
> > way, but without the overhead)
> 
> Yes, the size of the code is dependent on the size of the API.
> Making things explicit is more infrastructure at the start,
> but makes things easier later on when they have to be changed.

I'm suggesting the API itself should be smaller (and hence more
manageable).

> > db2 <- getIOParam db2Param
> > withIOParam dbParam db2 $ ...
> 
> And one needs to make sure that the "..." part does not need the
> other database connection(s). Makes composing things hard.

I don't get it. The "..." part is just a query that you want to run in
the second connection, as in your example.
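
For concreteness, here is a rough user-level emulation of the two
calls above. It is only a sketch under assumptions: newIOParam and the
IOParam representation are invented here, overrides are keyed by
ThreadId in a shared IORef, and forked threads do not inherit bindings
(that would presumably need runtime support).

```haskell
import Control.Concurrent (ThreadId, myThreadId)
import Control.Exception (bracket_)
import Data.IORef

-- | A parameter with a default value and per-thread overrides.
-- Assumption: user-level emulation, not the proposed primitive.
data IOParam a = IOParam a (IORef [(ThreadId, a)])

-- | Hypothetical constructor: create a parameter with a default.
newIOParam :: a -> IO (IOParam a)
newIOParam def = IOParam def <$> newIORef []

-- | Read the innermost binding for the calling thread, or the
-- default if this thread has none.
getIOParam :: IOParam a -> IO a
getIOParam (IOParam def ref) = do
  tid <- myThreadId
  overrides <- readIORef ref
  return (maybe def id (lookup tid overrides))

-- | Run an action with the parameter rebound for the calling thread;
-- the binding is removed again even if the action throws.
withIOParam :: IOParam a -> a -> IO b -> IO b
withIOParam (IOParam _ ref) val act = do
  tid <- myThreadId
  let push = atomicModifyIORef' ref (\xs -> ((tid, val) : xs, ()))
      pop  = atomicModifyIORef' ref (\xs -> (deleteFirst tid xs, ()))
  bracket_ push pop act

-- | Drop the most recent binding belonging to the given thread.
deleteFirst :: ThreadId -> [(ThreadId, a)] -> [(ThreadId, a)]
deleteFirst _ [] = []
deleteFirst tid (x : xs)
  | fst x == tid = xs
  | otherwise    = x : deleteFirst tid xs
```

With this, any getIOParam dbParam inside the "..." part sees the
second connection, and the first one is visible again afterwards.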

> > I'm still not sure I understand why thread pools are necessary, by the
> > way. I thought forking was pretty fast under GHC.
> 
> Threads are quite cheap. But by using a pool we can guarantee things
> about the number of threads and don't run into situations with 10000
> extra threads just because forking is always fun.

You can use a semaphore for this.
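
A minimal sketch of the semaphore approach, using QSem from base. The
names forkBounded and forkAllBounded are made up for illustration. The
caller takes a slot before forking, so at most 'limit' threads exist
at once; an asynchronous exception arriving between taking the slot
and forking could leak a slot, which this sketch ignores.

```haskell
import Control.Concurrent
import Control.Exception (finally)

-- | Fork one thread for the action, but only once a slot is free.
-- The returned MVar is filled when the thread has finished.
forkBounded :: QSem -> IO () -> IO (MVar ())
forkBounded sem action = do
  done <- newEmptyMVar
  waitQSem sem  -- blocks the caller until a slot is available
  _ <- forkIO (action `finally` (signalQSem sem >> putMVar done ()))
  return done

-- | Run all actions with at most 'limit' threads alive at a time,
-- returning once every action has finished.
forkAllBounded :: Int -> [IO ()] -> IO ()
forkAllBounded limit actions = do
  sem <- newQSem limit
  dones <- mapM (forkBounded sem) actions
  mapM_ takeMVar dones
```

For example, forkAllBounded 100 requests never has more than 100
request threads running, without keeping a pool of idle ones around.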

> The other point is to use a background thread which talks to a
> blocking C API and executes callbacks upon receiving events from the
> C side.

Does that require the background thread to have a particular parent
thread?

Frederik

-- 
http://ofb.net/~frederik/