Vector primops sizes

Wed Feb 13 06:06:51 CET 2013

The recently merged vector primops support only 16-byte operands:
Int32 x 4, Double x 2 and so on. Current AVX instructions support
256-bit operands, and with simple cut'n'paste work it is possible to
support at least Double x 4 operands. I made those changes, and GHC
generates (using LLVM) proper AVX code that uses ymm registers.

It might also make sense to support primops for vector types larger
than any of the currently supported primitive types. I have those
changes in my branch as well, and LLVM generates pretty good code for
them too. Such primops could be useful for exposing LLVM's
shufflevector instruction, or for writing high-performance code that
processes large vectors with less potential overhead.
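
For readers who have not used shufflevector: it takes two vectors of
the same type and a constant index mask, and builds a result by
picking elements from the concatenation of the two inputs. The sketch
below is only a pure list model of that behaviour. The name
shuffleVector is a hypothetical helper for illustration, not a GHC
primop or anything from my branch, and it ignores undef mask entries.

```haskell
module Main where

-- Pure list model of LLVM's shufflevector semantics. Illustration only:
-- shuffleVector is a hypothetical helper, not a GHC primop, and the lists
-- stand in for fixed-width vector registers.
--
-- Mask index i selects element i of (v1 ++ v2), so for inputs of length n
-- indices 0..n-1 read from v1 and n..2n-1 read from v2.
shuffleVector :: [a] -> [a] -> [Int] -> [a]
shuffleVector v1 v2 mask = map (both !!) mask
  where
    both = v1 ++ v2

main :: IO ()
main = do
  let lo = [1, 2, 3, 4] :: [Int]
      hi = [5, 6, 7, 8] :: [Int]
  print (shuffleVector lo hi [0, 4, 1, 5])  -- interleave low halves: [1,5,2,6]
  print (shuffleVector lo hi [3, 2, 1, 0])  -- reverse lo: [4,3,2,1]
```

The result length follows the mask, not the inputs, which is why a
shuffle can also be used to widen or narrow a vector.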

Do we want to support larger vectors directly, or should GHC be made
smart enough to fuse vector primop operations performed in parallel
into larger vectors/registers for LLVM? Do we want to provide access
to LLVM's shufflevector instruction?
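
To make the fusion question concrete, here is a rough model of what
"fusing" two independent Double x 2 additions into one Double x 4
addition would mean. The types D2 and D4 and all the functions are
hypothetical stand-ins built from ordinary boxed Doubles, not the real
unboxed vector types or primops; the sketch only checks that the two
forms compute the same lanes.

```haskell
module Main where

-- Hypothetical stand-ins for 128-bit and 256-bit vectors of Double.
-- These are plain boxed data types, not GHC's unboxed vector primitives.
data D2 = D2 Double Double deriving (Eq, Show)
data D4 = D4 Double Double Double Double deriving (Eq, Show)

-- Lane-wise addition at each width.
plusD2 :: D2 -> D2 -> D2
plusD2 (D2 a b) (D2 c d) = D2 (a + c) (b + d)

plusD4 :: D4 -> D4 -> D4
plusD4 (D4 a b c d) (D4 e f g h) = D4 (a + e) (b + f) (c + g) (d + h)

-- Concatenate two narrow vectors into one wide vector, and split it back.
joinD2 :: D2 -> D2 -> D4
joinD2 (D2 a b) (D2 c d) = D4 a b c d

splitD4 :: D4 -> (D2, D2)
splitD4 (D4 a b c d) = (D2 a b, D2 c d)

-- Two independent narrow additions, as the programmer wrote them.
unfused :: D2 -> D2 -> D2 -> D2 -> (D2, D2)
unfused x1 y1 x2 y2 = (plusD2 x1 y1, plusD2 x2 y2)

-- The same work expressed as a single wide addition.
fused :: D2 -> D2 -> D2 -> D2 -> (D2, D2)
fused x1 y1 x2 y2 = splitD4 (plusD4 (joinD2 x1 x2) (joinD2 y1 y2))

main :: IO ()
main = do
  let x1 = D2 1 2
      y1 = D2 10 20
      x2 = D2 3 4
      y2 = D2 30 40
  print (fused x1 y1 x2 y2)                        -- (D2 11.0 22.0,D2 33.0 44.0)
  print (unfused x1 y1 x2 y2 == fused x1 y1 x2 y2) -- True
```

Whether the join and split steps are free in practice depends on how
the values end up in registers, which is part of what the question
above is asking.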