LLVM calling convention for AVX2 and AVX512 registers

Tue Mar 14 20:32:36 UTC 2017

On 03/14/2017 04:02 PM, Ben Gamari wrote:
> Edward Kmett <ekmett at gmail.com> writes:
>
>> Hrmm. In C/C++ I can tell individual functions to turn on additional ISA
>> feature sets with compiler-specific __attribute__((target("avx2"))) tricks.
>> This avoids complaints from the compiler when I call builtins that aren't
>> available at my current compilation feature level. Perhaps pragmas for the
>> codegen along those lines are what we'd ultimately need? Alternatively, if we
>> simply distinguish between what the ghc codegen produces with one set of
>> options and what we're allowed to ask for explicitly with another then
>> user-land tricks like I employ would remain sound.
>>
> I'm actually not sure that simply distinguishing between the user- and
> codegen-allowed ISA extensions is quite sufficient. After all, AFAIK LLVM
> doesn't make such a distinction itself: if you write a vector
> primitive and compile for a target that doesn't have an appropriate
> instruction, the code generator will lower it with software emulation.

This would mean that Haskell libraries compiled with different flags
would not be ABI compatible.

Our original paper exposed a Multi type class that was meant to be the
programmer interface to the primops. A Multi a would be the widest
vector type supported on the current architecture, so code that used a
Multi Double would always be guaranteed to work at the widest vector
type available for Doubles.

The Multi approach explicitly eschewed lowering, but I would argue that
if performance is the goal, then automatic lowering is not what you
want. I would rather have the system pick the correct vector width for
me based on the current architecture.
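
To make "pick the width from the architecture" concrete, here is a rough C
analogue, not the Multi class from the paper. It assumes GCC or Clang (the
vector_size extension and the predefined __AVX512F__, __AVX__ and __SSE2__
macros); the names multi_double, MULTI_DOUBLE_LANES and multi_add are
hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the lane count is fixed by the flags this
   translation unit is compiled with, and nothing is lowered. */
#if defined(__AVX512F__)
#define MULTI_DOUBLE_LANES 8
#elif defined(__AVX__)
#define MULTI_DOUBLE_LANES 4
#elif defined(__SSE2__)
#define MULTI_DOUBLE_LANES 2
#else
#define MULTI_DOUBLE_LANES 1
#endif

#if MULTI_DOUBLE_LANES > 1
/* GCC/Clang vector extension: a vector of MULTI_DOUBLE_LANES doubles. */
typedef double multi_double
    __attribute__((vector_size(MULTI_DOUBLE_LANES * sizeof(double))));
#else
/* No vector unit assumed: fall back to a plain scalar. */
typedef double multi_double;
#endif

/* Element-wise dst = a + b, processing MULTI_DOUBLE_LANES doubles per step. */
static void multi_add(double *dst, const double *a, const double *b, size_t n)
{
    size_t i = 0;
    for (; i + MULTI_DOUBLE_LANES <= n; i += MULTI_DOUBLE_LANES) {
        multi_double x, y, z;
        memcpy(&x, a + i, sizeof x);   /* unaligned-safe load */
        memcpy(&y, b + i, sizeof y);
        z = x + y;                     /* one add at the chosen width */
        memcpy(dst + i, &z, sizeof z); /* unaligned-safe store */
    }
    for (; i < n; i++)                 /* leftover elements, one at a time */
        dst[i] = a[i] + b[i];
}
```

The source of multi_add never mentions a width, so the same code gets the
widest type its build flags allow. The flip side is that sizeof(multi_double)
changes with those flags, so two object files built differently disagree
about the type.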

This does nothing to solve the problem of ABI compatibility, which is
one reason I didn't push to get this upstreamed.

Is the Multi approach desirable? I think it would be nice to be able to
at least provide such a solution even if it isn't some sort of default.
Do we really want lowering of wider vector types?

Geoff

> However, adding a pragma to allow per-function target annotations seems
> quite reasonable and easily doable. Moreover, contrary to my previous
> assertion, it shouldn't require any splitting of compilation units. I
> ran a quick experiment, compiling this program,
>
>     __attribute__((target("sse2"))) int hello() {
>       return 1;
>     }
>
> with clang. It produced something like,
>
>     define i32 @hello() #0 {
>       ret i32 1
>     }
>
>     attributes #0 = { "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" ... }
>
> So it seems LLVM is perfectly capable of expressing this; in hindsight
> I'm not sure why I ever doubted this.
>
> There are a number of details that would need to be worked out regarding
> how such a pragma should behave. Does the general direction sound
> reasonable? I've opened #13427 [1] to track this idea.
>
> Cheers,
>
> - Ben
>
>
> [1] https://ghc.haskell.org/trac/ghc/ticket/13427
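
As a footnote to the quoted exchange, the user-land trick Edward describes
usually pairs the per-function target attribute with a runtime CPU check.
The following is a minimal sketch under assumptions: it relies on the GCC and
Clang builtins __builtin_cpu_init and __builtin_cpu_supports, which are
x86-only, and the function names sum_avx2, sum_baseline and sum_ints are
hypothetical.

```c
#include <stddef.h>

/* Baseline version, compiled with the translation unit's default features. */
static int sum_baseline(const int *xs, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += xs[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, but the compiler may use AVX2 inside this one function. */
__attribute__((target("avx2")))
static int sum_avx2(const int *xs, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += xs[i];
    return s;
}
#endif

/* Dispatcher: only call the AVX2 version if the running CPU supports it. */
int sum_ints(const int *xs, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(xs, n);
#endif
    return sum_baseline(xs, n);
}
```

Both versions take and return only scalars and pointers, so the function
boundary does not depend on which vector registers are enabled. That keeps
the trick sound within one compilation unit, whatever the surrounding flags.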