Optimizations for mutable structures?

Fri Dec 9 07:13:45 EST 2005

"Simon Marlow" <simonmar at microsoft.com> writes:

> >                            In the general case, for some arbitrary
> > actions between the write and the read (excluding another write of
> > course), there is no guarantee that the IORef remains unmodified.
> 
> This is an analysis that's performed all the time in C compilers, it's
> quite straightforward to do a good approximation.  One simple algorithm
> is: a store can be forwarded to a matching read as long as there are no
> intervening writes that may alias, or function calls.
> 
> C does this and C has threads, so what's the difference?

There is a big difference between C variables and IORefs.  For one
thing, a C variable can have scope local to a procedure.  If so,
then the suggested transformation is entirely valid even in the
presence of threads, because no other thread is permitted access to
the local variable.

    M[fp+offset(x)] <- v
    ...
    w <- M[fp+offset(x)]
===>
    M[fp+offset(x)] <- v
    ...
    w <- v

Global variables are more tricky, but I guess it is probably common
to permit the elimination of the 'read' there as well, even though
technically this is unsafe in the presence of threads.  My guess is
that this behaviour is the cause of a lot of thread unsafeness in
common C libraries however.

    M[address(x)] <- v
    ...
    w <- M[address(x)]
===>
    M[address(x)] <- v
    ...
    w <- v

Haskell has neither local nor global variables in the same sense as C.

IORefs more closely resemble C pointers.  A pointer has indeterminate
lifetime and scope.  It can be passed around from procedure to
procedure, and from thread to thread.  There can be aliases to the
same memory location in multiple other contexts.  If I'm writing
a hardware driver, the pointer address might be memory-mapped to a
device register, in which case writing to that address and immediately
reading back from it may not be an identity.  I've certainly dealt
with devices where the same hardware port when written to, expects
control data, but when read from delivers status data.  So here:

    M[M[x]] <- v
    ...
    w <- M[M[x]]

it would be totally incorrect for the compiler to eliminate the read.

Regards,
    Malcolm