LLVM calling convention for AVX2 and AVX512 registers

Tue Mar 14 18:29:56 UTC 2017

Hrmm. In C/C++ I can tell individual functions to turn on additional ISA
feature sets with compiler-specific __attribute__((target("avx2"))) tricks.
This avoids complaints from the compiler when I call builtins that aren't
available at my current compilation feature level. Perhaps pragmas for the
codegen along those lines are what we'd ultimately need? Alternately, if we
simply distinguish between what the ghc codegen produces with one set of
options and what we're allowed to ask for explicitly with another then
user-land tricks like I employ would remain sound.
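
For concreteness, here is a minimal sketch of the kind of trick described
above, assuming GCC or Clang. The function names (`sum_avx2`, `sum_scalar`,
`sum_dispatch`) are made up for illustration and are not taken from any real
codebase. Only the one function carries the `target("avx2")` attribute, so the
rest of the file is still compiled for the baseline ISA, and the
`__builtin_cpu_supports` guard keeps the AVX2 path from running on a CPU that
lacks it.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical example: only this function is compiled with AVX2 enabled.
   Everything else in the file keeps the command-line feature level. */
__attribute__((target("avx2")))
static int32_t sum_avx2(const int32_t *xs, size_t n) {
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    /* Add eight 32-bit lanes at a time. */
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(xs + i));
        acc = _mm256_add_epi32(acc, v);
    }
    /* Fold the eight lanes into one scalar, then handle the tail. */
    int32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    int32_t total = 0;
    for (int k = 0; k < 8; k++) total += lanes[k];
    for (; i < n; i++) total += xs[i];
    return total;
}
#endif

/* Baseline fallback that needs no ISA extensions. */
static int32_t sum_scalar(const int32_t *xs, size_t n) {
    int32_t total = 0;
    for (size_t i = 0; i < n; i++) total += xs[i];
    return total;
}

/* The user-land guard: check the CPU at run time before taking the
   AVX2 path, and fall back to the scalar loop otherwise. */
int32_t sum_dispatch(const int32_t *xs, size_t n) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) return sum_avx2(xs, n);
#endif
    return sum_scalar(xs, n);
}
```

Callers only ever use `sum_dispatch(xs, n)`. The soundness of the guard
depends on the compiler not emitting AVX2 instructions outside `sum_avx2` on
its own, which is exactly the distinction at issue here.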

-Edward

On Mon, Mar 13, 2017 at 7:26 PM, Ben Gamari <ben at well-typed.com> wrote:

> Edward Kmett <ekmett at gmail.com> writes:
>
> > That, rather tangentially, reminds me: If we do start to teach the code
> > generator about how to produce these sorts of things from simpler parts,
> > e.g. via enabling something like LLVM's vectorization pass, or some
> > internal future ghc compiler pass that checks for, say, Superword-Level
> > Parallelism
> > <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.4663&rep=rep1&type=pdf>
> > in the style of Jaewook Shin, then we need to differentiate between flags
> > for what ghc/llvm is allowed to produce via optimization, etc. and what the
> > end user is allowed to explicitly emit. e.g. in my own code I can safely
> > call avx2 primitives after I set up guards to check that I'm on a CPU that
> > supports them, but I can only currently emit that code after I tell GHC
> > that I want it to allow the avx2 instructions. If I build a complicated
> > dispatch mechanism in Haskell for picking the right ISA and emitting code
> > for several of them, I'm going to need to tell ghc to let me build with all
> > sorts of instruction sets that the machine the final executable runs on may
> > not fully support. We should be careful not to conflate these two things.
> >
> Indeed this is tricky.
>
> The obvious stop-gap solution is to simply move your various platform
> dependent implementations into multiple modules. However, as you say
> this quickly breaks down once GHC itself starts to learn vectorisation.
> At that point you will need to draw the distinction you mention,
> separating the ISA available to the user and that available to the
> compiler.
>
> Another related question is whether you eventually want a way to specify
> an ISA per-function (via pragma, for instance). This would allow you to
> set a conservative `-march` for the module as a whole, but allow use
> of ISA extensions precisely when necessary. This is a bit tricky in the
> face of inlining; perhaps you want to require that only `NOINLINE` functions
> can be decorated with such a thing.
>
> I suspect in the case of LLVM this will require breaking modules up into
> multiple compilation units and linking together the resulting objects.
> This will certainly require a fair bit of engineering effort but nothing
> terribly difficult.
>
> Regarding dispatch, GCC has a function multi-versioning mechanism [1]
> which seems relevant to mention here. However, it's not entirely
> clear to me whether the complexity here is worthwhile for GHC.
>
> Anyways, there are plenty of possible options here; it would be helpful
> to have a feature request ticket for the "user/compiler ISA" idea you
> propose where we can collect ideas. Perhaps you could open one?
>
> Cheers,
>
> - Ben
>
>
> [1] https://lwn.net/Articles/691666/
>