FFI and LLVM

Thu Mar 31 00:48:22 CEST 2011

A relevant trac ticket and wiki page can be found here:

http://hackage.haskell.org/trac/ghc/ticket/3557
http://hackage.haskell.org/trac/ghc/wiki/VectorComputing

Vivian seems to now have partial patches for #3557. I haven't reviewed
the discussion in the ticket or the patches in particular but it's
good to see work here!

In terms of using the FFI in practically any language, your typical
"glue" to close to the metal languages is fairly simple: the C calling
convention. In terms of utilizing LLVM bitcode, you could merely
generate LLVM bitcode (which uses the vector type you linked to) by
any number of means - the haskell bindings come to mind :) - and then
use the LLVM toolchain to generate optimized assembly code for
whatever fast primitives you need. You now have an object-file you can
link into your resulting program which just has regular haskell code +
FFI code (NB. cabal doesn't allow you to link in arbitrary object
files I don't think.) LLVM by default uses the regular C calling
convention for its functions so you shouldn't have to jump through any
hoops.

But, in the general case I don't think the regular FFI/library route
is the way to go perhaps (or maybe I don't know exactly what you
want.) The main reason I can think of is object-file boundaries.
Presumably what we really want in the most general case is the ability
to write haskell code and know that GHC will guarantee it uses SIMD
machine instructions for it. We want to leave the register selection
and instruction selection up to GHC in the final stages of code
generation.

We can certainly wrap lots of small SIMD primitives/micro-ops in shims
of LLVM, but I wonder what the actual expense of doing that is. As far
as I understand, GHC will at least have to spill/reload registers on
the foreign call (since you're going to be FFI'ing out to a function,
it'll need the C calling convention.) The C compiler/LLVM would
compile the external object file you link in, and it contains your
said primitives. GHC-generated code is going to exist in other object
files, and I honestly don't think the linker will be able to see past
the boundary and move the assembly code into the call site, but
perhaps I am wrong.

When using lots of micro-ops wrapped around the FFI, this (the
register spilling + inability to inline) will probably add up, but
whether or not it's enough to matter is up to debate and it requires
evidence and benchmarking, so you need a basis to compare it on. I
think it would be worth trying this sort of investigation to look at
the overhead, because frankly, I could be wrong. Maybe FFI calls are
incredibly cheap.

I think the FFI + SIMD from the 'outside world' (C/LLVM) is a feasible
operation if what you want to implement is a specific, fairly
'straight forward' operation that GHC can't generate great code for,
where the net gains in speed are good (and thus outweigh the foreign
calls.) In terms of general primitives we can build fast code on, with
the guarantee it is backed by SIMD instructions in the code generator,
I think building the functionality more directly into GHC as Vivian
has done is the way to go.

Another, more LLVM-specific way of overcoming these limitations, now
that I think about it, is LLVM-based link time optimization.

GHC already accepts raw LLVM bitcode files looking at the code for the
GHC driver and will bring them into the final compilation/link scope.
GHC's LLVM backend already directly outputs the LLVM bitcode format -
so you could, feasibly, just create a hand-written LLVM bitcode file
that sort of shims the LLVM vector type. This file can be directly
'linked into' another bitcode file by the LLVM toolchain - the file in
question being the LLVM bitcode for your program, generated by GHC.
This overcomes the object-boundary limitation, and will let the LLVM
code generator/optimizer do more optimization. If the shims are small,
I imagine they could be inlined and LLVM will take care of all the
register selection - perhaps even eliminating any spills/loads.

So that process would be similar to what you want: write FFI code
calling out to an external function using the normal C calling
convention. Write special LLVM bitcode file exposing the primitives
you want to use from haskell. When GHC compiles your program, it
generates LLVM bitcode. Link your primitive bitcode together with the
generated code, and run the LLVM toolchain on it.

This approach is much more general of course, far beyond SIMD
primitives. We could apply LLVM-based LTO to the entire stack - and
optimize the whole program basically at the end of the compilation
stage (and it could be practically free, since LLVM will do this for
us.) It would definitely alleviate some of these issues, and perhaps
LLVM can generate really good code for the whole program.

I'm not a hardcore GHC hacker by any stretch, I think that covers your
question pretty well. I'd appreciate any corrections if I'm totally
wrong somewhere, because this work interests me as well and I've
thought about it a bit at this point.

On Wed, Mar 30, 2011 at 4:43 PM, Louis Wasserman
<wasserman.louis at gmail.com> wrote:
> I'm curious: as I understand it, one of the purposes of the FFI is to allow
> bindings to "close to the metal" languages.  I'm curious what would be
> involved in letting the FFI provide bindings to LLVM code in Haskell.
> While I'm at it, I'll mention the most recent reason that this thought
> occurred to me, which is the complaint in here that Haskell can't currently
> get access to SIMDs, while LLVM does have that access, and I'm trying to
> think about how I might go about gluing those together.
>
> Louis Wasserman
> wasserman.louis at gmail.com
> http://profiles.google.com/wasserman.louis
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
>

-- 
Regards,
Austin