Common subexpressions are not optimized to single one?

Tue Dec 2 20:55:42 EST 2003

G'day all.

Quoting Koji Nakahara <yu- at div.club.ne.jp>:

> I'm wondering about the optimization of ghc. I thought that since
> functions are pure in haskell, compiler can/do factorize common
> subexpressions when it optimizes a code.

The short answer is "no".  While the optimisation preserves semantics,
it will not, in general, preserve pragmatics.

A classic (and contrived) example is this:

        slow_pow2 :: Int -> Int
        slow_pow2 n
          = length (subsets [1..n])
          where
            subsets [] = [[]]
            subsets (x:xs) = subsets xs ++ [ x:xs' | xs' <- subsets xs ]

On my machine, slow_pow2 32 runs (very slowly; I didn't bother to let
it finish).  Optimising the common subexpression on the last line (i.e.
the recursive call) exhausts the heap.

In general, automatically optimising common subexpressions is only a
good idea if the compiler can prove that the expression optimised has
a bounded size, and even then, it's not always a good idea.  Some
libraries which use unsafePerformIO creatively actually rely on this
behaviour.  I've written code, for example, which uses does something
like this:

        {-# NOINLINE fooRef1 #-}
        fooRef1 :: IORef Foo
        fooRef1 = unsafePerformIO (newIORef makeAFoo)

        {-# NOINLINE fooRef1 #-}
        fooRef2 :: IORef Foo
        fooRef2 = unsafePerformIO (newIORef makeAFoo)

        fooOp1 :: IO ()
        fooOp1 = modifyIORef fooRef1 something >> return ()

        fooOp2 :: IO ()
        fooOp2 = modifyIORef fooRef2 somethingElse >> return ()

This is perfectly safe (as long as the compiler respects NOINLINE!), but
would break horribly if fooRef1 and fooRef2 were "optimised".

Cheers,
Andrew Bromage