Vector primops sizes

Geoffrey Mainland mainland at
Thu Feb 14 00:29:20 CET 2013

I haven't seen Michael's patches (where are they btw?), but there is
some extra work to be done to ensure that 256-bit values are passed in
registers. Otherwise adding support for wider vector types is fairly
straightforward.

The current plan is for 256-bit wide vector primops to always be
available. The programmer can test for the __AVX__ CPP symbol, which
indicates that these primops will be compiled to efficient code. I am
not inclined to add wider vector primops, as there is no current
platform where they can be compiled efficiently.
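
As a minimal sketch of the CPP test described above: the module below
picks a width at compile time based on __AVX__. The name
preferredVectorBits is purely illustrative and not part of GHC or any
library, and whether __AVX__ is actually defined depends on the GHC
version and on the flags used (for example -mavx), so treat this as an
assumption-laden example rather than the final design.

```haskell
{-# LANGUAGE CPP #-}
module Main where

-- Hypothetical helper: the widest vector width (in bits) that we expect
-- to be compiled to efficient code on this target.
preferredVectorBits :: Int
#if defined(__AVX__)
-- __AVX__ is defined: 256-bit primops should map onto ymm registers.
preferredVectorBits = 256
#else
-- No AVX: the 256-bit primops still exist, but 128 bits is the safe choice.
preferredVectorBits = 128
#endif

main :: IO ()
main = putStrLn ("preferred vector width: " ++ show preferredVectorBits ++ " bits")
```

The same #if can guard whole definitions, so a module can provide a
Double x 4 code path and a Double x 2 fallback side by side.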

Most programmers should use the Multi type family instead of working
with primops (or their boxed wrappers) directly. For example, by using
Multi Double instead of DoubleX2, the programmer will get 256-bit wide
vectors on platforms that support AVX, and 128-bit wide vectors
otherwise. See for details.
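
To illustrate the idea (not the actual implementation), here is a rough
sketch of how a Multi-style type family could select the vector width.
Everything in it is an assumption made for the example: the DoubleX2 and
DoubleX4 types are plain boxed stand-ins rather than wrappers around the
real primops, and the MultiVec class with its broadcast, plus and toLanes
methods is hypothetical. Only the names Multi, DoubleX2 and __AVX__ come
from the discussion above.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE TypeFamilies #-}
module Main where

-- Stand-ins for the boxed vector types (assumption: the real ones wrap primops).
data DoubleX2 = DoubleX2 !Double !Double deriving (Eq, Show)
data DoubleX4 = DoubleX4 !Double !Double !Double !Double deriving (Eq, Show)

-- Hypothetical Multi family: maps an element type to the widest vector
-- type that should compile efficiently on the current target.
type family Multi a

#if defined(__AVX__)
type instance Multi Double = DoubleX4
#else
type instance Multi Double = DoubleX2
#endif

-- Hypothetical class of operations shared by all vector widths.
class MultiVec v where
  type Elem v
  broadcast :: Elem v -> v        -- fill every lane with one value
  plus      :: v -> v -> v        -- lane-wise addition
  toLanes   :: v -> [Elem v]      -- unpack the lanes into a list

instance MultiVec DoubleX2 where
  type Elem DoubleX2 = Double
  broadcast x = DoubleX2 x x
  plus (DoubleX2 a b) (DoubleX2 c d) = DoubleX2 (a + c) (b + d)
  toLanes (DoubleX2 a b) = [a, b]

instance MultiVec DoubleX4 where
  type Elem DoubleX4 = Double
  broadcast x = DoubleX4 x x x x
  plus (DoubleX4 a b c d) (DoubleX4 e f g h) =
    DoubleX4 (a + e) (b + f) (c + g) (d + h)
  toLanes (DoubleX4 a b c d) = [a, b, c, d]

-- Client code mentions only Multi Double, never a concrete width.
doubled :: Double -> Multi Double
doubled x = plus v v
  where v = broadcast x

main :: IO ()
main = print (toLanes (doubled 1.5))
```

The point of the sketch is that doubled is written once against
Multi Double; the number of lanes it operates on (two or four) is decided
by the target, not by the caller.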


On 02/13/2013 07:44 AM, Simon Peyton-Jones wrote:
> I believe Geoff is working on adding AVX.  I expect he’d be interested
> in your patches.
> Simon
> From: ghc-devs-bounces at
> [mailto:ghc-devs-bounces at] On Behalf Of Carter Schonwald
> Sent: 13 February 2013 05:59
> To: Michael Baikov
> Cc: ghc-devs at
> Subject: Re: Vector primops sizes
> Yes please! Having these (for valid target arches/CPU targets) would
> be really, really valuable for me.
> On Feb 13, 2013 12:07 AM, "Michael Baikov" <manpacket at
> <mailto:manpacket at>> wrote:
>> The recently merged vector primops support only 16-byte operands: Int32
>> x 4, Double x 2, and so on. Current AVX instructions support 256-bit
>> operands, and with simple cut'n'paste work it's possible to support at
>> least Double x 4 operands. I made those changes, and GHC generates
>> (using LLVM) proper AVX code using ymm registers. It might also make
>> sense to support primops for vector types larger than any currently
>> supported primitive types. I have those changes in my branch as well,
>> and LLVM generates pretty good code for them. Those changes might be
>> useful for providing access to the LLVM shufflevector instruction or
>> for writing high-performance processing of large vectors with less
>> potential overhead.
>> Do we want to support larger vectors directly, or should GHC be made
>> smart enough to fuse operations with vector primops performed in
>> parallel into larger vectors/registers for LLVM? Do we want to provide
>> access to the LLVM shufflevector instruction?
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at <mailto:ghc-devs at>
