[Haskell-cafe] Current situation regarding global IORefs

Mon Apr 24 04:59:22 EDT 2006

On 24/04/06, Adrian Hey <ahey at iee.org> wrote:
> Lennart Augustsson wrote:
> > I think global mutable variables should be regarded with utmost
> > suspicion.  There are very few situations where they are the
> > right solution.
>
> Well IMO even the use of the term "global mutable variable" causes
> muddled thinking on this and I wish people would stop it. There's no
> reason to regard top level "things with identity" (call them "TWI"s
> or "objects" or whatever) with any more suspicion than top level IO
> actions themselves.

What do you mean by "top level IO action"? If you mean something like
'getLine', then there is a huge difference. If you mean actions which
execute automatically in any program which imports the given module,
then I'd contend that Haskell doesn't really even have those (apart
from possibly 'main', if you count that at all).

In the latter case, you might have disconnected components which all
must run but which, since they do IO, potentially don't commute. With
the ability to mark actions as automatically executing without some
way to control the order of that execution, program behaviour is
unpredictable in general. At least, I wouldn't want the language to
allow that. However, putting an explicit ordering on their execution
is essentially equivalent to defining a new IO action which executes
each of them in a given sequence, obviating the need for such a
feature in the first place.
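
To make the last point concrete, here is a minimal sketch. The names
initLogging, initCache, initAll and demoOrder are invented for
illustration and belong to no real library. The "explicit ordering" is
just an ordinary IO action that calls the steps in sequence:

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Two hypothetical start-up steps. Each appends its name to a shared
-- trace so that the order of execution can be observed.
initLogging :: IORef [String] -> IO ()
initLogging trace = modifyIORef trace (++ ["logging"])

initCache :: IORef [String] -> IO ()
initCache trace = modifyIORef trace (++ ["cache"])

-- The ordering is written down once, as a plain IO action.
initAll :: IORef [String] -> IO ()
initAll trace = do
    initLogging trace
    initCache trace

-- Runs the steps and returns the order in which they happened.
demoOrder :: IO [String]
demoOrder = do
    trace <- newIORef []
    initAll trace
    readIORef trace
```

Whoever writes main decides when (and whether) to call initAll, so
nothing runs behind the program's back.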

While only allowing the definition of top-level IORefs (i.e. not
unrestricted IO) wouldn't cause quite as much harm, it's still
questionable whether it's ever actually necessary. One can get
computation-global IORefs easily using something along the lines of
ReaderT (IORef s) IO, for whatever state type s is needed. By newtyping
the monad, one could exert even more control over how the IORef is used.
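
A minimal sketch of that idea, assuming the transformers package that
ships with GHC. The Counter monad and the names tick, current,
runCounter and demoCounter are hypothetical, chosen only for this
example:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical monad: IO plus one hidden Int counter. The constructor
-- would not be exported, so the IORef cannot be reached from outside.
newtype Counter a = Counter (ReaderT (IORef Int) IO a)
    deriving (Functor, Applicative, Monad)

-- The only two operations that touch the IORef.
tick :: Counter ()
tick = Counter $ do
    ref <- ask
    liftIO (modifyIORef' ref (+ 1))

current :: Counter Int
current = Counter (ask >>= liftIO . readIORef)

-- Every run allocates a fresh counter, so the state is global to one
-- computation but not to the whole program.
runCounter :: Counter a -> IO a
runCounter (Counter m) = newIORef 0 >>= runReaderT m

demoCounter :: IO Int
demoCounter = runCounter (tick >> tick >> current)
```

Since runCounter creates the IORef itself, calling it a second time
starts again from zero.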

One problem with real top-level IORefs is that they leave no way for
the module user to reset things in general. As a library designer, you
might think that the users of your library will only ever want to
initialise things once, but this is a bit heavy-handed
-- what if the program has reloaded its configuration while running
and wants to restart the functionality your module provides? If you
haven't explicitly provided a way to reset the IORefs, there's no way
for the module user to reliably do so.
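
As a rough illustration of the alternative, here is a sketch in which
the library's state lives in a handle owned by the caller. The Service
type and the functions startService, currentConfig and demoRestart are
made up for this example:

```haskell
import Data.IORef (IORef, newIORef, readIORef)

-- Hypothetical library state, held in a handle that the caller owns
-- rather than in a top-level IORef.
newtype Service = Service (IORef String)

-- Starting the service allocates its state from the given configuration.
startService :: String -> IO Service
startService config = Service <$> newIORef config

currentConfig :: Service -> IO String
currentConfig (Service ref) = readIORef ref

-- "Resetting" needs no special support: the caller starts a new service.
demoRestart :: IO (String, String)
demoRestart = do
    s1 <- startService "config-v1"
    before <- currentConfig s1
    s2 <- startService "config-v2"
    after <- currentConfig s2
    return (before, after)
```

The library author never has to anticipate a restart, because the
caller can always discard the old handle and make a new one.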

There are plenty of other nice ways to enforce things such as
particular actions only running once and so on. For one, you could use
a new monad, which would give extremely fine control over what actions
were permitted, but even with just IO, there are plenty of things one
can do -- just not globally.

We can relatively easily create a context which provides an IO action
that will only run once in that context, and return the same value as
the first time without repeating the execution thereafter:

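import Data.IORef (newIORef, readIORef, writeIORef)

-- Run 'region' with a cached version of 'action': the first call runs
-- 'action' and stores its result; later calls return the stored result.
-- (Not safe against concurrent first calls from multiple threads.)
singletonRegion :: IO a -> (IO a -> IO b) -> IO b
singletonRegion action region = do
    r <- newIORef Nothing
    let action' = do
            t <- readIORef r
            case t of
                Nothing -> do v <- action
                              writeIORef r (Just v)
                              return v
                Just v  -> return v
    region action'

test1 :: IO ()
test1 = singletonRegion (putStrLn "Hello") $ \greet -> do
    greet -- only this one will actually have any effect.
    greet
    greet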

With a custom monad, this sort of setup would be built into the "run"
function which transforms the action back into IO, and so would be
even more transparent, while still not creating irrevocable
program-global state.

> One thing I never could fathom about the position of the nay-sayers
> in this debate is what exactly is it that they object to?
> Is it existence of top level "TWI"s and of IO operations that
> reference them *in principle*?
> Or are they content with their existence but just don't want to
> allow people to use Haskell to define them?
>
> If it's the former then we should be purging the IO libraries of
> all such horrors, though I can't see much remaining of them (well
> anything actually). But I guess an IO incapable language might
> still have some niche uses.
>
> If it's the latter then we are advocating a language which cannot
> be used to implement many demonstrably useful IO modules and libraries,
> *by design*. If so, the claim that Haskell is a general purpose
> programming language seems quite bogus and should be removed from
> the haskell.org home page IMO.

This argument sounds quite a lot like the ones used by those arguing
against the removal of GOTO from programming languages. Sometimes, by
explicitly not providing some feature, you're also making code in
general easier to comprehend (especially when that feature has
invisible global effects on programs, where you'd need to look
carefully at the source of all modules involved in order to determine
what was actually happening in the presence of its use). It can also
prevent certain classes of design problems from ever cropping up. The
more state which one has to reconstruct in order to test an arbitrary
bit of code in a program, the harder it is to debug and understand it,
which is one major reason that pure code is favoured over IO code in
the first place.

With top-level IORefs, you essentially relinquish control over who
gets access to a bit of statefulness which you've created. Everything
within the module might read or write it, and if it's exported, then
any module which imports yours might influence it as well.

Now, you might make the claim that IO computations can read and write
files on disk, and one could essentially (ab)use those, at least for
storable types, instead of top level IORefs if one were only crazy
enough to do so. So if this sort of state is already present in the
form of the local filesystem, why not have global IORefs too? An
example of the difference is that people don't have the tendency to
abuse file IO as an alternative to passing something as a parameter
(barring the sort of code which ends up on thedailywtf.com).

State is dangerous. It is the cause of a lot of misunderstandings
about program behaviour and as a result, the cause of a lot of bugs.
It should not be taken lightly.

Most Haskell programmers I know, myself included, have a tendency to
be extremely picky about what parts of the program have access to a
piece of state, and for how long that state can affect the
computation. You might see how proposing a feature which allows for
unlimited globally-accessible-from-IO state variables which last
forever goes against this culture very much. I would not want to have
to read, let alone debug, a program which made heavy use of more than
a very small number of top level IORefs. (Not to mention that, if
reasonable to do so, one of the first things I'd probably do is to
translate the code such that the IORefs are no longer top-level, just
for the sake of my own understanding.)

It's not that anyone thinks it would break the world to have top level
IORefs, it's just that it would make thinking about any program which
possibly used them a good deal harder, and with any new feature, one
has to ask "what happens if people actually start to use this?"

 - Cale