too much let-floating

Fri Jun 1 18:20:17 EDT 2007

Hia folks,

More performance bugs or misunderstandings. Yes, binary serialisation
again.

Consider this example from an instance for the Binary serialisation
class. We get lots and lots of code that looks like this:

data Foo a = Foo Int | Bar ...

instance Binary Foo where
  put (Foo a) = putTag 0 >> put a
  put (Bar ...) = putTag 1 ...

Let's expand this one step so we can see what we want to happen:

put (Foo a) = word8 (fromIntegral 0 :: Word8) >> word64le a

so we want to combine the bounds checks in word8 and word64le so that we
do just one check here rather than two.

Lets expand it another step:

put (Foo a) = write 1 (pokeWord8 (fromIntegral 0 :: Word8))
    `thenPut` write 8 (pokeWord64le (fromIntegral a :: Word64))

Now we have a rule:
"write/write" forall n m a b.
    write n a `thenPut` write m b
  = write (n+m) (\p -> a p >> b (p `plusPtr` n))

so obviously we'd like to apply this rule here to common-up the bounds
check that write n performs.

Normally this rule works fine, but in this specific example it does not.
Can you spot why?

Here's a simple case that works:

foo :: Word8 -> Word8 -> Put ()
foo a b = word8 a >> word8 b

and the one that doesn't

bar :: Word8 -> Put ()
bar a = word8 0 >> put a

What's the difference? Why should one work and the other not?

Of course I gave it away in the email subject: let floating.

In the second example, the 'bar' function, word8 0 does not depend on
the arguments of the function so it gets floated out:

lvl_s198 = write 1 (pokeWord8 0)

bar n =
  thenPut
    lvl_s198
    (write 8 (pokeWord64le n))

and now the rule does not match this because 'write' isn't there
directly in the expression.

This is a bit disappointing of course, so how do we fix it. There are
two possibilities as far as I can see. Either don't let float it, or
have the rule matcher look through the indirection.

The rule matcher is already looks through lets, but that's a slightly
different issue. That's for a situation like:

bar n =
  thenPut
    (let lvl_s198 = pokeWord8 0
      in write 1 lvl_s198)
    (write 8 (pokeWord64le n))

where we've got an actual let expression whose body would match the
rule's pattern.

It would not always be a good idea to "look up" let floated vars when
doing rule matching since it could lead to duplication. Though in this
example the let-floated expression is used only once, so that is not a
problem.

So one possibility might be to look up variables for the purpose of rule
matching if they are only used once.

Another possibility might be to not let-float it in the first place.
Usually let-floating things like this is a good idea, so how can we spot
that we might want to keep it in applicative form? The fact that it's
mentioned in the pattern of a rule is the strongest hint. As I
understand it, being mentioned in the head of a rule is already taken
into account when doing let-floating. However in this case the head of
the let-floated expression is not the head of the rule, but a sub
expression:

    write n a `thenPut` write m b
  = write (n+m) (\p -> a p >> b (p `plusPtr` n))

'thenPut' is the head of this rule's pattern. But we'd rather not float
'write' out either. So perhaps a similar heuristic should be applied to
all the free variables mentioned in the pattern.

Another approach might be for the library author to declare that some
function is cheap, or is otherwise important to not float out, but
should be kept in applicative form to aid rule matching.

We have similar problems with list comprehensions and enumFromTo:

[ ... | i <- [0..n], j <- [0..m] ]

because in the second generator [0..m] does not depend on the first
generator, it gets floated out and shared. Of course with fusion we can
make the [0..m] extremely cheap, but not if it has been floated out. If
it gets floated out and shared in memory between each iteration of the
first generator, then we really do have to traverse the list data
structure each iteration, rather than just incrementing an unboxed
number. So this is another example where we'd like to say that a
function, enumFromTo in this case, is extremely cheap and should not be
floated out, even if it looses sharing.

Actually, my example is simpler because it does not involve any loss of
sharing.

Duncan