Loop optimisation with identical counters
Simon Marlow
marlowsd at gmail.com
Mon Nov 8 08:11:50 EST 2010
On 05/11/2010 23:22, David Peixotto wrote:
> 1. The ability of LLVM to optimize Haskell functions is limited by the calling
> convention. Particularly for i386, function arguments are passed on a stack
> that LLVM knows nothing about. The reads and writes to the stack look like
> arbitrary loads and stores. It has no notion of popping elements from the
> stack which makes it difficult to know when it is ok to eliminate stores to
> the stack.
Our longish-term plan is to make the x86 backend pass arguments in
registers. We've been limited up to now by having to compile via C,
where if you want to use specific registers they have to be reserved
globally, but once that limitation is lifted we can start to use more
registers for argument passing in the x86 backend. A few other things
have to be in place for that to happen though, and I think we'll only
really be able to do this in the "new backend", which is still at the
prototype stage.
> 2. The possible aliasing introduced by casting integer arguments
> (R1-R6) to pointers limits the effectiveness of its optimizations.
>
> 3. A single Haskell source function is broken up into many small functions in
> the back end. Every evaluation of a case statement requires a new continuation
> point. These small functions kill the optimization context for LLVM. LLVM can
> recover some of the context by inlining calls to known functions, but the
> effectiveness of inlining is limited since it does not know that we are
> passing some parameters on the stack and not through the actual function call.
Again, this is something I think we'll be able to improve. Some of the
splitting into small functions is due to the limitations of the C backend.
> * Different calling conventions
>
> All the functions in the code generated for LLVM use the same calling
> convention fixed by GHC. It would be interesting to see if we could
> generate LLVM code where we pass all the arguments a function needs as
> actual arguments. We can then let LLVM do its optimizations and then have
> a later pass that spills extra arguments to the stack and makes our
> functions use the correct GHC calling convention.
For functions local to a module it would certainly be possible to use
different calling conventions, although there are restrictions on the
use of the C stack (basically the C stack pointer cannot move while
we're in Haskell code), so the C stack can't be used for argument
passing. I don't think you'd want to do that anyway.
Cheers,
Simon
> * Specialization of code after a runtime alias check
>
> We could specialize the code into two cases, one where some pointers may
> alias and one where they do not. We can then let LLVM fully optimized the
> code with no aliases. We would insert a check at runtime to see if there
> are aliases and then call the correct bit of code.
>
> * Optimization order matters
>
> Probably there are some wins to be had by choosing a good optimization
> sequence for the code generated from GHC, rather than just using `-O1`,
> `-O2`, etc. I believe It should be possible to find a good optimization
> sequence that would work well for Haskell codes.
>
> -David
>
> On Nov 4, 2010, at 5:29 AM, Christian Hoener zu Siederdissen wrote:
>
>> Here it is, feel free to change:
>> http://hackage.haskell.org/trac/ghc/ticket/4470
>>
>> I have added the core for the sub-optimal function 'f'. Criterion benchmarks are there, too. It
>> doesn't make much of a difference for this case -- I'd guess because everything fits into registers
>> here, anyway.
>>
>> Gruss,
>> Christian
>>
>> On 11/04/2010 09:42 AM, Simon Peyton-Jones wrote:
>>> Interesting. What would it look like in Core? Anyone care to make a ticket?
>>>
>>> S
>>>
>>> | -----Original Message-----
>>> | From: glasgow-haskell-users-bounces at haskell.org [mailto:glasgow-haskell-users-
>>> | bounces at haskell.org] On Behalf Of Roman Leshchinskiy
>>> | Sent: 03 November 2010 10:55
>>> | To: Christian Hoener zu Siederdissen
>>> | Cc: glasgow-haskell-users at haskell.org
>>> | Subject: Re: Loop optimisation with identical counters
>>> |
>>> | LLVM doesn't eliminate the counters. FWIW, fixing this would improve performance of
>>> | stream fusion code quite a bit. It's very easy to do in Core.
>>> |
>>> | Roman
>>> |
>>> | On 3 Nov 2010, at 10:45, Christian Hoener zu Siederdissen
>>> |<choener at tbi.univie.ac.at> wrote:
>>> |
>>> |> Thanks, I'll do some measurements on this with ghc7.
>>> |>
>>> |> Gruss,
>>> |> Christian
>>> |>
>>> |> On 11/02/2010 01:23 PM, Simon Marlow wrote:
>>> |>> On 02/11/2010 08:17, Christian Höner zu Siederdissen wrote:
>>> |>>> Hi,
>>> |>>>
>>> |>>> is the following problem a job for ghc or the code generation backend
>>> |>>> (llvm)?
>>> |>>>
>>> |>>> We are given this program:
>>> |>>>
>>> |>>> {-# LANGUAGE BangPatterns #-}
>>> |>>>
>>> |>>> module Main where
>>> |>>>
>>> |>>> f :: Int -> Int -> Int -> Int -> Int
>>> |>>> f !i !j !s !m
>>> |>>> | i == 0 = s+m
>>> |>>> | otherwise = f (i-1) (j-1) (s + i+1) (m + j*5)
>>> |>>>
>>> |>>> g :: Int -> Int
>>> |>>> g !k = f k k 0 0
>>> |>>>
>>> |>>>
>>> |>>> ff :: Int -> Int -> Int -> Int
>>> |>>> ff !i !s !m
>>> |>>> | i == 0 = s+m
>>> |>>> | otherwise = ff (i-1) (s + i+1) (m + i*5)
>>> |>>>
>>> |>>> gg :: Int -> Int
>>> |>>> gg !k = ff k 0 0
>>> |>>>
>>> |>>> main = do
>>> |>>> print $ g 20
>>> |>>> print $ gg 20
>>> |>>>
>>> |>>>
>>> |>>> Here, 'f' and 'g' are a representation of the code I have. Both counters
>>> |>>> 'i' and 'j' in 'f' count from the same value with the same step size and
>>> |>>> terminate at the same time but are not reduced to just one counter. Can
>>> |>>> I reasonably expect this to be done by the code generator?
>>> |>>> 'ff' represents what I would like to see.
>>> |>>
>>> |>> GHC doesn't have any optimisations that would do this currently,
>>> |>> although it's possible that LLVM's loop optimisations might do this on
>>> |>> the generated code for f.
>>> |>>
>>> |>> Cheers,
>>> |>> Simon
>>> |>>
>>> |>>
>>> |>>
>>> |>>> Btw. look at the core, to see that indeed 'f' keep four arguments.
>>> |>>> Functions like 'f' are a result of vector-fusion at work but can be
>>> |>>> written by oneself as well. The point is that if 'f' gets reduced to
>>> |>>> 'ff' then I can have this:
>>> |>>>
>>> |>>> fun k = zipWith (+) (map f1 $ mkIdxs k) (map f2 $ mkIdxs k)
>>> |>>>
>>> |>>> which makes for nicer code sometimes; but before rewriting I wanted to
>>> |>>> ask if that kills performance.
>>> |>>>
>>> |>>>
>>> |>>> Thanks,
>>> |>>> Christian
>>> |>>>
>>> |>>>
>>> |>>>
>>> |>>> _______________________________________________
>>> |>>> Glasgow-haskell-users mailing list
>>> |>>> Glasgow-haskell-users at haskell.org
>>> |>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>> |>>
>>> |>
>>> |> _______________________________________________
>>> |> Glasgow-haskell-users mailing list
>>> |> Glasgow-haskell-users at haskell.org
>>> |> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>> | _______________________________________________
>>> | Glasgow-haskell-users mailing list
>>> | Glasgow-haskell-users at haskell.org
>>> | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>>
>>
>> _______________________________________________
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users at haskell.org
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
More information about the Glasgow-haskell-users
mailing list