[Haskell] thread-local variables

Mon Aug 7 08:16:35 EDT 2006

On Sun, Aug 06, 2006 at 01:36:15PM +0300, Einar Karttunen wrote:
> On 06.08 02:41, Frederik Eaton wrote:
> > Also, note that my proposal differs in that thread local variables are
> > not writable, but can only be changed by calling (e.g. in my API)
> > 'withIOParam'. This is still just as general, because an IORef can be
> > stored in a thread-local variable, but it makes it easier to reason
> > about the more common use case where TLS is used to make IO a Reader;
> > and it makes it easier to share modifiable state across more than one
> > thread. I.e. if modifiable state is stored as 'IOParam (IORef a)' then
> > the default is for the stored 'IORef a' to be shared across all
> > threads; it can only be changed "locally" for a specified action and
> > any sub-threads using 'withIOParam'; and if some library I use decides
> > to fork a thread behind the scenes, it won't change my program's
> > behavior.
> 
> Perhaps a function like this would solve all our problems:
> 
> -- | Tie all TLS references in the IO action to the current
> -- environment rather than the environment it will actually
> -- be executed in.
> tieToCurrentTLS :: IO a -> IO (IO a)

"Our" problems? :) Well, it should be easy to implement. I think it's
a good idea.
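Here is a rough sketch of why it should be easy. The read-only 'IOParam' idea quoted above is modelled as a shared default plus a map from ThreadId to per-thread bindings, and tieToCurrent is a single-parameter stand-in for 'tieToCurrentTLS'. All names other than getIOParam and withIOParam, which come from this thread, are hypothetical. This is not the actual implementation of either proposal. In particular, children created with plain forkIO do not inherit bindings in this sketch.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar
import Control.Exception (bracket)
import Control.Monad (forever, join)
import qualified Data.Map.Strict as Map

-- | Hypothetical read-only thread-local parameter: a shared default plus
-- per-thread bindings that only 'withIOParam' can introduce.
data IOParam a = IOParam a (MVar (Map.Map ThreadId a))

newIOParam :: a -> IO (IOParam a)
newIOParam def = IOParam def <$> newMVar Map.empty

-- | The calling thread's binding, or the shared default if it has none.
getIOParam :: IOParam a -> IO a
getIOParam (IOParam def ref) = do
  tid <- myThreadId
  Map.findWithDefault def tid <$> readMVar ref

-- | Rebind the parameter for the current thread while 'act' runs. The
-- previous binding (or its absence) is restored afterwards, even on
-- exceptions, so a thread's entry only lives as long as the call.
withIOParam :: IOParam a -> a -> IO b -> IO b
withIOParam (IOParam _ ref) v act = do
  tid <- myThreadId
  bracket
    (modifyMVar ref (\m -> return (Map.insert tid v m, Map.lookup tid m)))
    (\old -> modifyMVar_ ref (return . Map.alter (const old) tid))
    (const act)

-- | Single-parameter version of the proposed 'tieToCurrentTLS': capture
-- the caller's value now and rebind it wherever the action finally runs.
tieToCurrent :: IOParam a -> IO b -> IO (IO b)
tieToCurrent p act = do
  v <- getIOParam p
  return (withIOParam p v act)

-- | Hypothetical one-worker "thread pool", forked before any binding
-- exists; returns a function for submitting jobs to it.
newPool :: IO (IO () -> IO ())
newPool = do
  jobs <- newChan
  _ <- forkIO (forever (join (readChan jobs)))
  return (writeChan jobs)

-- | Submit one untied and one tied job from inside a binding and report
-- which value each job saw.
demo :: IO (String, String)
demo = do
  user   <- newIOParam "default"
  submit <- newPool
  r1 <- newEmptyMVar
  r2 <- newEmptyMVar
  withIOParam user "alice" $ do
    submit (getIOParam user >>= putMVar r1)
    tied <- tieToCurrent user (getIOParam user >>= putMVar r2)
    submit tied
  (,) <$> takeMVar r1 <*> takeMVar r2
```

Calling demo should return ("default","alice"). The job submitted as-is runs on the pool's worker thread, which has no binding. The tied job rebinds the caller's value before it runs. This is the same situation as a library handing an action to a thread it forked earlier.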

> > I think it is a good idea to have stdin, cwd, etc. be thread-local.
> 
> How would this work together with the FFI?

It wouldn't; at least, I wouldn't care if it didn't.

> > I don't understand why the 'TL' monad is necessary, but I haven't read
> > the proposal very carefully.
> 
> The TL monad is necessary to make initialization order problems go
> away.

That seemed to be its intended purpose, but I don't see any
initialization order problems in my proposal.

> On 05.08 19:56, Frederik Eaton wrote:
> > That doesn't answer the question: What if my application has a need
> > for several different sets of parameters - what if it doesn't make
> > sense to combine them into a single monad? What if there are 'n'
> > layers? Is it incorrect to say that the monadic approach requires code
> > size O(n^2)?
> 
> A well-designed monadic approach does not require O(n^2) code size.
> But if you want to design code in a way that requires O(n^2) code
> size, you can do it.
> 
> Parallel layers require O(layers).
> Nested layers hiding the lower layer need O(layers).
> 
> This is not a problem in practice and makes refactoring very easy.

Is that true? I would be very careful when making generalizations
about all software design.

What about my example:

newMain host environment program_args
    network_config locale terminal_settings
    stdin stdout stderr = do
    ...

Now, let's see. We might want two threads to have the same network
configuration, but a different view of the filesystem; or the same
view of the filesystem, but a different set of environment variables;
or the same environment, but different command line arguments. All
three cases are pretty common in practice. We might also want to have
the same arguments but different IO handles - as in a multi-threaded
server application.

And the part that implements the filesystem might want to access the
network (if there is a network filesystem). And the part that starts
processes with an environment might want to access the filesystem, for
instance to read the code for the process and for shared libraries;
and maybe it also wants to get the hostname from the network layer. 
And the part that starts programs with arguments might want to access
the environment (for instance, to get the current locale), as well as
the filesystem (for instance, to read locale configuration files). And
the part that accesses the IO handles might also want to access not
just the program arguments but the environment, and the filesystem,
and the network.

So here is an example where we have nested layers, and each layer
accesses most of the layers below it.

kernel (networking, devices)
filesystem
linker
libc
application

Suppose we started with a library that dealt with OS devices such as
the network, using a special monad for that; then built on top of it a
layer for keeping track of environment variables, with another monad;
then a layer for invoking executables with arguments; and then a layer
for IO, all with monads. We would have a good, modular, extensible
design which, because of the interactions between layers, would in
Haskell require code whose length is quadratic in the number of
layers.
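One way to make the counting concrete is to assume each layer is written in the common "one class per capability, one transformer per layer" style. NetT, EnvT, ArgsT and the three classes below are hypothetical; only ReaderT, ask, runReaderT and lift come from the transformers package. In this style, layer k needs one direct instance plus one lifting instance for each of the k - 1 layers below it.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)

-- Hypothetical per-layer settings.
newtype NetConfig = NetConfig { hostName :: String }
newtype EnvVars   = EnvVars   { envList  :: [(String, String)] }
newtype Args      = Args      { argList  :: [String] }

-- One capability class per layer.
class Monad m => MonadNet  m where getNet      :: m NetConfig
class Monad m => MonadEnv  m where getEnvVars  :: m EnvVars
class Monad m => MonadArgs m where getProgArgs :: m Args

-- One transformer per layer, stacked as ArgsT over EnvT over NetT.
newtype NetT  m a = NetT  { unNetT  :: ReaderT NetConfig m a }
  deriving (Functor, Applicative, Monad)
newtype EnvT  m a = EnvT  { unEnvT  :: ReaderT EnvVars m a }
  deriving (Functor, Applicative, Monad)
newtype ArgsT m a = ArgsT { unArgsT :: ReaderT Args m a }
  deriving (Functor, Applicative, Monad)

-- Layer 1: one direct instance.
instance Monad m => MonadNet (NetT m) where getNet = NetT ask

-- Layer 2: one direct instance plus a pass-through for layer 1.
instance Monad m    => MonadEnv (EnvT m) where getEnvVars = EnvT ask
instance MonadNet m => MonadNet (EnvT m) where getNet     = EnvT (lift getNet)

-- Layer 3: one direct instance plus pass-throughs for layers 1 and 2.
instance Monad m    => MonadArgs (ArgsT m) where getProgArgs = ArgsT ask
instance MonadNet m => MonadNet  (ArgsT m) where getNet      = ArgsT (lift getNet)
instance MonadEnv m => MonadEnv  (ArgsT m) where getEnvVars  = ArgsT (lift getEnvVars)

type App = ArgsT (EnvT (NetT IO))

runApp :: NetConfig -> EnvVars -> Args -> App a -> IO a
runApp n e a app =
  runReaderT (unNetT (runReaderT (unEnvT (runReaderT (unArgsT app) a)) e)) n

-- Code at the top layer uses every layer below it.
describe :: (MonadNet m, MonadEnv m, MonadArgs m) => m String
describe = do
  h    <- hostName <$> getNet
  env  <- envList  <$> getEnvVars
  args <- argList  <$> getProgArgs
  return (h ++ " " ++ show (length env) ++ " " ++ unwords args)
```

With three layers this gives 3 direct and 3 lifting instances. With n layers, the lifting instances alone number n(n - 1)/2, which is the quadratic term. Whether a different monadic encoding avoids it is the question under discussion here.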

(Of course, it's true that in real operating systems, each of these
layers has its own set of interfaces to the other layers - so the
monadic approach is actually not more verbose. But the point is that
it's a reasonable design, with layers, and where each layer uses each
of the ones below it. I want to write code which is designed the same
way, but without the overhead.)

> > > And don't have any static guarantees that you have done all the proper
> > > initialization calls before you use them.
> > 
> > Well, there are a lot of things I don't have static guarantees for. 
> > For instance, sometimes I call the function 'head', and the compiler
> > isn't able to verify that the argument isn't an empty list. If I
> > initialize my TLS to 'undefined' then I'll get a similar error
> > message, at run time. For another example, I don't use monadic regions
> > when I do file IO. I can live with that.
> 
> The problem is with refactoring: taking a piece of code, reusing it
> somewhere else, and trying to figure out what it needs.

If you move it somewhere else, but forget to move the thread-local
variables it refers to, then you'll get a compiler error.

> > > ... Also if we have two pieces of the same per-thread state that we
> > > wish to use in one thread (e.g. db-connections) then the TLS
> > > approach becomes quite hard.
> > 
> > No harder than the monadic approach, in my opinion.
> 
> In the monadic approach adding a second db connection would involve:
> 1) add a line to the state record
> 2) add a db2query = withPart db2 . flip query
> 3) no changes elsewhere
> 
> If the DB API uses a TLS parameter of type "Proxy DBH", how would
> you implement this in a nice manner for the TLS case?

db2 <- getIOParam db2Param
withIOParam dbParam db2 $ ...
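Einar's three steps might look roughly like the following, assuming a Reader-style state record. DBH, query, withPart and AppState are hypothetical stand-ins, and withPart is not a standard library function. The "driver" merely echoes the handle name and the query.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)

-- Hypothetical database handle; a real driver would wrap a connection.
newtype DBH = DBH String

-- Hypothetical driver call that takes the handle explicitly.
query :: DBH -> String -> IO String
query (DBH name) sql = return (name ++ ": " ++ sql)

-- 1) The state record, with a line added for the second connection.
data AppState = AppState { db1 :: DBH, db2 :: DBH }

type App = ReaderT AppState IO

-- Hypothetical helper: select one part of the state and pass it to k.
withPart :: (AppState -> p) -> (p -> IO a) -> App a
withPart part k = asks part >>= lift . k

-- 2) One new definition per connection; 3) nothing else changes.
db1query, db2query :: String -> App String
db1query = withPart db1 . flip query
db2query = withPart db2 . flip query

-- Code that uses both connections.
both :: App [String]
both = sequence [db1query "SELECT 1", db2query "SELECT 2"]
```

Running `runReaderT both (AppState (DBH "main") (DBH "archive"))` should yield `["main: SELECT 1","archive: SELECT 2"]`. The TLS answer above does the equivalent by reading the second handle from db2Param and rebinding dbParam with withIOParam around the code that should use it.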

> > You've redefined 'fork'. If I want a library which works with other
> > libraries, that will not be an option. The original purpose of my
> > posting to this thread was to ask for two standard functions which
> > would let me define thread-local variables in a way which is
> > interoperable with other libraries, to the same extent as 'withArgs'
> > and 'withProgName' are.
> 
> Any library that forks may use a preallocated thread pool, and
> thus might not work with TLS.

I don't know of many libraries which use 'fork', so I can't say whether
most of them use a preallocated thread pool. My guess is that the most
common use is local, for instance running a command and monitoring it
in another thread. But the uses we care about are those where a 'fork'
appears in a library which has functions with 'IO t' arguments, and it
may well be that this happens more often with a thread pool.

I'm still not sure I understand why thread pools are necessary, by the
way. I thought forking was pretty fast under GHC.

> withArgs and withProgName are global and not very thread-friendly.

I wasn't aware of that. That's too bad.
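This can be checked quickly under GHC. In the sketch below, withArgs and getArgs are the real System.Environment functions, while argsSeenFromOtherThread and the MVar handshake are only scaffolding for the experiment. GHC's base library implements withArgs by replacing the process-wide argument vector and restoring it afterwards. As a result, a thread that never called withArgs can still observe another thread's temporary arguments.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Environment (getArgs, withArgs)

-- | Thread A enters 'withArgs' and pauses; meanwhile the calling thread,
-- which never used 'withArgs', reads the program arguments.
argsSeenFromOtherThread :: IO [String]
argsSeenFromOtherThread = do
  inside <- newEmptyMVar
  done   <- newEmptyMVar
  _ <- forkIO $ withArgs ["set-by-thread-A"] $ do
         putMVar inside ()   -- signal: thread A is now inside withArgs
         takeMVar done       -- stay inside until the caller has looked
  takeMVar inside
  args <- getArgs
  putMVar done ()
  return args
```

Under GHC this should return ["set-by-thread-A"] rather than the program's real arguments. That is exactly the thread-unfriendliness described in the quoted message.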

> On 06.08 04:23, Frederik Eaton wrote:
> > I also forgot to mention that if you hold on to a ThreadId, it
> > apparently causes the whole thread to be retained. Simon Marlow
> > explained this on 2005/10/18:
> 
> Actually this problem does not exist in the code.
> The problem is encountered if children are tied to their parents,
> that is, if they contain the ThreadId of the parent thread. In my
> code this problem should not occur.

I see; I apologize for missing that.

Frederik

-- 
http://ofb.net/~frederik/