Questions concerning the LLVM backend - i.e. 'proc point splitting'

David Spitzenberg spitzenb at fim.uni-passau.de
Fri Nov 27 16:03:22 UTC 2015


Hello Simon, hello Ben,

thank you very much to both of you for your encouraging answers!

First of all, I've two questions concerning your answer, Ben.

a)

> The jump issue aside, I don't know how you would deal with
> tables-next-to-code. The prefix data support that is currently
> available in LLVM is attached to functions and I unfortunately don't
> see that changing any time soon.

It looks like I mixed things up regarding those info-tables. As far as
I understood [1], they are part of the STG machine: every function
present at the STG level needs an info-table. The code generator does
emit info-tables for every proc-point after splitting, but to me that
seemed to be merely a consequence of turning them into stand-alone
functions.

Could you explain why 'inner' proc-points (those which are not the
entry block of a function) need info-tables of their own?

b)

> That being said, it sounds as though eliminating proc-point splitting
> would make for quite a nice project in the native code generator.

The documentation states that proc-point splitting is only required by
the LLVM backend (see [2]). Does the Cmm pipeline perform proc-point
splitting even when the native code generator is used?

Second, some background on my situation. As a student assistant, I'm
part of a research group at my university. The group does research on
loop optimization (which is why I'm doing stencil calculations in
Haskell) and uses LLVM (which is why I'm using the LLVM backend).
Recently I talked to them and tried to explain why GHC emits code
containing mutual recursion instead of loops under certain
circumstances.
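
To make this concrete, here is a hand-written sketch of the shape of
code I mean. This is not actual llvmGen output; the argument passing is
heavily simplified, and in reality Sp, Hp, R1 and friends are threaded
through every call. What is a single loop at the source level ends up
as two top-level functions that tail-call each other:

define ghccc void @loop_head(i64 %acc, i64 %n) {
entry:
  %done = icmp eq i64 %n, 0
  br i1 %done, label %exit, label %body
body:
  ; enter the loop body as a tail call instead of taking a back edge
  tail call ghccc void @loop_body(i64 %acc, i64 %n)
  ret void
exit:
  ; in the real code this would enter the continuation found on the stack
  tail call ghccc void @loop_done(i64 %acc)
  ret void
}

define ghccc void @loop_body(i64 %acc, i64 %n) {
entry:
  %acc1 = add i64 %acc, %n
  %n1 = add i64 %n, -1
  ; jump back to the loop head, again as a tail call
  tail call ghccc void @loop_head(i64 %acc1, i64 %n1)
  ret void
}

declare ghccc void @loop_done(i64)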

They told me that LLVM's interprocedural optimizations fail to merge
the mutual recursion into a single loop for the following reason: when
a continuation defined in the current LLVM module is pushed onto the
stack and an externally defined function is then called, the analysis
cannot prove that control eventually returns to that local
continuation.

To make the control flow more apparent to LLVM, they came up with the
following idea. A remark before I present it: I know that it interferes
heavily with CPS, since it leaves part of the continuation handling to
the implicit LLVM call stack. Further, it does not solve the problem of
proc-point splitting at all; it is more a workaround than a solution.
But it would greatly increase LLVM's ability to optimize the code later
on, which is why I found it worth mentioning, especially because, after
reading your answer, Ben, I'm fairly convinced that properly solving
the proc-point splitting problem is nearly impossible with the
capabilities LLVM IR offers at the moment. Now back to the idea.

As stated above, LLVM's main problem seems to be inferring control flow
across call proc-points. Since proc-point splitting is hard to avoid,
we leave the Cmm pipeline as it is and instead modify the code llvmGen
produces for call proc-points. All other kinds of proc-points remain
unchanged.

Initial situation, in pseudo-Cmm:

foo() {
  ...
  call bar() returns to cont   // bar is defined externally

cont:
  ...
}

After proc-point splitting, this is converted into LLVM IR similar to
the following:

@foo() {
  ...
  store @cont, %Sp    ; push the continuation's address onto the stack
  tail call @bar()    ; @bar is defined externally
}

@cont() {
  ...
}

To fix the issue described above, we introduce a function which
restores the pointers Sp and Hp and the register R1 (along with
everything else that has to be restored) and which 'fakes' the
continuation. Note that '@restore' returns without calling into the
continuation; this way, the call into the continuation can be made a
direct one. The above code would be transformed into something similar
to the following:

@foo() {
  ...
  store @restore, %Sp                           ; push the 'fake' continuation
  call @bar()                                   ; ordinary (non-tail) call
  br label %restore

restore:
  %Sp = load (getelementptr @environment 0, 0)  ; restore what has to be
  ...                                           ; restored
  tail call @cont()
}

@cont() {
  ...
}

@restore(%Sp, %Hp, %R1, ...) {
  store %Sp, (getelementptr @environment 0, 0)  ; spill everything into the
  ...                                           ; global variable
  ret void
}

@environment  ; global structure of a type similar to {i64*, i64*, i64*, i64, i64, ..}

As an alternative, the above transformation could be performed by an
LLVM pass. llvmGen would then only need to annotate each call site with
its continuation, which would greatly simplify detecting the correct
continuation.
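
To illustrate, here is a rough sketch of what such an annotation could
look like. The metadata kind !ghc.cont is invented purely for
illustration; nothing like it exists today, and a custom pass would
have to be taught to interpret it and perform the rewriting sketched
above:

declare ghccc void @bar(i64* %Sp)

define ghccc void @cont(i64* %Sp) {
entry:
  ; body omitted
  ret void
}

define ghccc void @foo(i64* %Sp) {
entry:
  ; llvmGen would attach the continuation to the call site here
  tail call ghccc void @bar(i64* %Sp), !ghc.cont !0
  ret void
}

!0 = !{void (i64*)* @cont}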

What do you think about this idea, especially with regard to its
interference with CPS?

Regards,

David




[1]
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode#Importantconceptsinthemachine

[2]
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CodeGen#Secondstage:theCmmpipeline




On 11/21/2015 12:00 AM, Ben Gamari wrote:
> Simon Peyton Jones <simonpj at microsoft.com> writes:
> 
>> David
>>
>> All this stuff is terribly paged-out for me. But I'd love someone to
>> pay attention to some of this back-end stuff, so if you'd like to work
>> there, I'd like to help you.
>>
>> David Terei was also thinking of getting rid of proc point splitting
>> for LLVM (see attached email and the notes attached to it)
>>
>> Simon Marlow also was thinking about this; see his attached email.
>>
>> The main *reason* for establishing a proc-point is so that, when
>> generating C code (which we don't do any more) you can take its
>> address. E.g.
>>
>> foo() { ... Push &bar onto the stack (effectively a return address)
>>   Jump to thingumpy }
>>
>> bar() { ... }
>>
>> Really bar is part of foo, but we have to make it a separate top level
>> thing so that we can take the address of bar and push it on the stack.
>>
>> The trouble is that if there is a label inside bar that foo wants to
>> jump to (i.e. without pushing &bar etc) then we have to make that
>> label too into a proc-point, so that both foo and bar can see it.
>> Urgh.
>>
>> In LLVM this probably is not true; presumably you can take the address
>> of any label?
>>
> This is true. While labels themselves have function-local scope in LLVM,
> there is an expression, `blockaddress`, which allows you to take an
> address to a label in another function [1]. That being said, in reading
> through the documentation it's not at all clear to me that it would be
> safe to jump to such an address. In fact, given that the instruction
> that this expression is supposed to be used with, `indirectbr`, can only
> be used for local blocks, I suspect it is not. More information about
> this feature can be found here [2].
> 
> The jump issue aside, I don't know how you would deal with
> tables-next-to-code. The prefix data support that is currently available
> in LLVM is attached to functions and I unfortunately don't see that
> changing any time soon.
> 
> Ultimately it seems that trying to refer to labels defined in other
> functions is using LLVM against the way it was intended. One alternative
> would be to teach llvmGen to merge mutually recursive top-level functions
> into a single LLVM function during code generation. Otherwise I'm afraid
> you are stuck with either the status quo or attempting to improve on
> LLVM's own cross-procedure optimization ability.
> 
> That being said, it sounds as though eliminating proc-point splitting
> would make for quite a nice project in the native code generator.
> 
> Cheers,
> 
> - Ben
> 
> 
> [1] http://llvm.org/docs/LangRef.html#addresses-of-basic-blocks
> [2] http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html
> [3] http://llvm.org/docs/LangRef.html#prefix-data
> 

