Loop optimisation with identical counters

Fri Nov 5 19:44:21 EDT 2010

On 05/11/2010, at 23:22, David Peixotto wrote:

> I spent some time looking at the code generated for llvm and the optimizations
> it can apply. There were quite a bit of details to examine and I wrote it up
> as blog post here:
> http://www.dmpots.com/blog/2010/11/05/optimizing-haskell-loops-with-llvm.html.

Nice! Thanks a lot for doing that!

> To summarize, I found that it is possible to get LLVM to do this
> transformation through a combination of tail-call elimination, inlining,
> induction variable optimization, and global value numbering. This works fine
> on x86_64 where we can pass parameters in registers, but fails to fully fire
> in i386 back end because LLVM gets caught up by aliasing problems because
> function parameters are passed on the stack. The possible aliasing between the
> stack pointer (Sp) and the function argument pointer (R1) prevented the full
> transformation, but it was still able to reduce the f loop to straight line code.

Hmm... IIRC we agreed that Sp is never aliased in GHC-generated code and David Terei (I'm cc'ing here, not sure if he reads the list) made sure to include appropriate annotations in Haskell code. In fact, in your post Sp is passed as i32* noalias nocapture %Sp_Arg. Isn't that enough for LLVM to know that Sp isn't aliased?

> 1. The ability of LLVM to optimize Haskell functions is limited by the calling
> convention. Particularly for i386, function arguments are passed on a stack
> that LLVM knows nothing about. The reads and writes to the stack look like
> arbitrary loads and stores. It has no notion of popping elements from the
> stack which makes it difficult to know when it is ok to eliminate stores to
> the stack. 

But shouldn't it just promote stack locations to registers?

> 2. The possible aliasing introduced by casting integer arguments
> (R1-R6) to pointers limits the effectiveness of its optimizations.

Yes, that's a big problem. David tried to solve some of it by including noalias annotations but it's not clear what to do with, say, newly allocated ByteArrays which can't be aliased by anything.

Anyway, it's great to know that there are things we can improve to make LLVM optimise better.

Roman