Loop optimisation with identical counters

Mon Nov 8 08:11:50 EST 2010

On 05/11/2010 23:22, David Peixotto wrote:

> 1. The ability of LLVM to optimize Haskell functions is limited by the calling
> convention. Particularly for i386, function arguments are passed on a stack
> that LLVM knows nothing about. The reads and writes to the stack look like
> arbitrary loads and stores. It has no notion of popping elements from the
> stack which makes it difficult to know when it is ok to eliminate stores to
> the stack.

Our longish-term plan is to make the x86 backend pass arguments in 
registers.  We've been limited up to now by having to compile via C, 
where if you want to use specific registers they have to be reserved 
globally, but once that limitation is lifted we can start to use more 
registers for argument passing in the x86 backend.  A few other things 
have to be in place for that to happen though, and I think we'll only 
really be able to do this in the "new backend", which is still at the 
prototype stage.

> 2. The possible aliasing introduced by casting integer arguments
> (R1-R6) to pointers limits the effectiveness of its optimizations.
>
> 3. A single Haskell source function is broken up into many small functions in
> the back end. Every evaluation of a case statement requires a new continuation
> point. These small functions kill the optimization context for LLVM. LLVM can
> recover some of the context by inlining calls to known functions, but the
> effectiveness of inlining is limited since it does not know that we are
> passing some parameters on the stack and not through the actual function call.

Again, this is something I think we'll be able to improve.  Some of the 
splitting into small functions is due to the limitations of the C backend.

> *  Different calling conventions
>
>      All the functions in the code generated for LLVM use the same calling
>      convention fixed by GHC. It would be interesting to see if we could
>      generate LLVM code where we pass all the arguments a function needs as
>      actual arguments. We can then let LLVM do its optimizations and then have
>      a later pass that spills extra arguments to the stack and makes our
>      functions use the correct GHC calling convention.

For functions local to a module it would certainly be possible to use 
different calling conventions, although there are restrictions on the 
use of the C stack (basically the C stack pointer cannot move while 
we're in Haskell code), so the C stack can't be used for argument 
passing.  I don't think you'd want to do that anyway.

Cheers,
	Simon

> *  Specialization of code after a runtime alias check
>
>      We could specialize the code into two cases, one where some pointers may
>      alias and one where they do not. We can then let LLVM fully optimized the
>      code with no aliases. We would insert a check at runtime to see if there
>      are aliases and then call the correct bit of code.
>
> *  Optimization order matters
>
>      Probably there are some wins to be had by choosing a good optimization
>      sequence for the code generated from GHC, rather than just using `-O1`,
>      `-O2`, etc. I believe It should be possible to find a good optimization
>      sequence that would work well for Haskell codes.
>
> -David
>
> On Nov 4, 2010, at 5:29 AM, Christian Hoener zu Siederdissen wrote:
>
>> Here it is, feel free to change:
>> http://hackage.haskell.org/trac/ghc/ticket/4470
>>
>> I have added the core for the sub-optimal function 'f'. Criterion benchmarks are there, too. It
>> doesn't make much of a difference for this case -- I'd guess because everything fits into registers
>> here, anyway.
>>
>> Gruss,
>> Christian
>>
>> On 11/04/2010 09:42 AM, Simon Peyton-Jones wrote:
>>> Interesting.  What would it look like in Core?  Anyone care to make a ticket?
>>>
>>> S
>>>
>>> |  -----Original Message-----
>>> |  From: glasgow-haskell-users-bounces at haskell.org [mailto:glasgow-haskell-users-
>>> |  bounces at haskell.org] On Behalf Of Roman Leshchinskiy
>>> |  Sent: 03 November 2010 10:55
>>> |  To: Christian Hoener zu Siederdissen
>>> |  Cc: glasgow-haskell-users at haskell.org
>>> |  Subject: Re: Loop optimisation with identical counters
>>> |
>>> |  LLVM doesn't eliminate the counters. FWIW, fixing this would improve performance of
>>> |  stream fusion code quite a bit. It's very easy to do in Core.
>>> |
>>> |  Roman
>>> |
>>> |  On 3 Nov 2010, at 10:45, Christian Hoener zu Siederdissen
>>> |<choener at tbi.univie.ac.at>  wrote:
>>> |
>>> |>  Thanks, I'll do some measurements on this with ghc7.
>>> |>
>>> |>  Gruss,
>>> |>  Christian
>>> |>
>>> |>  On 11/02/2010 01:23 PM, Simon Marlow wrote:
>>> |>>  On 02/11/2010 08:17, Christian Höner zu Siederdissen wrote:
>>> |>>>  Hi,
>>> |>>>
>>> |>>>  is the following problem a job for ghc or the code generation backend
>>> |>>>  (llvm)?
>>> |>>>
>>> |>>>  We are given this program:
>>> |>>>
>>> |>>>  {-# LANGUAGE BangPatterns #-}
>>> |>>>
>>> |>>>  module Main where
>>> |>>>
>>> |>>>  f :: Int ->   Int ->   Int ->   Int ->   Int
>>> |>>>  f !i !j !s !m
>>> |>>>    | i == 0    = s+m
>>> |>>>    | otherwise = f (i-1) (j-1) (s + i+1) (m + j*5)
>>> |>>>
>>> |>>>  g :: Int ->   Int
>>> |>>>  g !k = f k k 0 0
>>> |>>>
>>> |>>>
>>> |>>>  ff :: Int ->   Int ->   Int ->   Int
>>> |>>>  ff !i !s !m
>>> |>>>    | i == 0    = s+m
>>> |>>>    | otherwise = ff (i-1) (s + i+1) (m + i*5)
>>> |>>>
>>> |>>>  gg :: Int ->   Int
>>> |>>>  gg !k = ff k 0 0
>>> |>>>
>>> |>>>  main = do
>>> |>>>    print $ g 20
>>> |>>>    print $ gg 20
>>> |>>>
>>> |>>>
>>> |>>>  Here, 'f' and 'g' are a representation of the code I have. Both counters
>>> |>>>  'i' and 'j' in 'f' count from the same value with the same step size and
>>> |>>>  terminate at the same time but are not reduced to just one counter. Can
>>> |>>>  I reasonably expect this to be done by the code generator?
>>> |>>>  'ff' represents what I would like to see.
>>> |>>
>>> |>>  GHC doesn't have any optimisations that would do this currently,
>>> |>>  although it's possible that LLVM's loop optimisations might do this on
>>> |>>  the generated code for f.
>>> |>>
>>> |>>  Cheers,
>>> |>>     Simon
>>> |>>
>>> |>>
>>> |>>
>>> |>>>  Btw. look at the core, to see that indeed 'f' keep four arguments.
>>> |>>>  Functions like 'f' are a result of vector-fusion at work but can be
>>> |>>>  written by oneself as well. The point is that if 'f' gets reduced to
>>> |>>>  'ff' then I can have this:
>>> |>>>
>>> |>>>  fun k = zipWith (+) (map f1 $ mkIdxs k) (map f2 $ mkIdxs k)
>>> |>>>
>>> |>>>  which makes for nicer code sometimes; but before rewriting I wanted to
>>> |>>>  ask if that kills performance.
>>> |>>>
>>> |>>>
>>> |>>>  Thanks,
>>> |>>>  Christian
>>> |>>>
>>> |>>>
>>> |>>>
>>> |>>>  _______________________________________________
>>> |>>>  Glasgow-haskell-users mailing list
>>> |>>>  Glasgow-haskell-users at haskell.org
>>> |>>>  http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>> |>>
>>> |>
>>> |>  _______________________________________________
>>> |>  Glasgow-haskell-users mailing list
>>> |>  Glasgow-haskell-users at haskell.org
>>> |>  http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>> |  _______________________________________________
>>> |  Glasgow-haskell-users mailing list
>>> |  Glasgow-haskell-users at haskell.org
>>> |  http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>>
>>
>> _______________________________________________
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users at haskell.org
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users