Vector primops sizes
Michael Baikov
manpacket at gmail.com
Wed Feb 13 07:19:58 CET 2013
> By which I mean having this family of proposed primops. Its not obvious to
> me at least how GHC could intelligently infer / use these implicitly for
> the end user / library writer.
I have a couple of ideas for how to implement this, but having an explicit
set of primops will make using the vector instructions less magical.
Exposing only the set of primops that is valid for a given arch/CPU target
would make things much more complicated. LLVM already takes care of
implementing a vector operation from smaller instructions: an operation on
the DoubleX16 primitive type gets compiled into something like
plusDoubleX16# :: DoubleX16# -> DoubleX16# -> DoubleX16#
movq %r13, 616(%rsp)
movq %rbp, 608(%rsp)
movq %r12, 600(%rsp)
movq %rbx, 592(%rsp)
movq %r15, 544(%rsp)
movq 592(%rsp), %rax
movq %rax, 344(%rsp)
movq 608(%rsp), %rax
vmovups (%rax), %ymm0
vmovups 32(%rax), %ymm1
vmovups 64(%rax), %ymm2
vmovups 96(%rax), %ymm3
vmovaps %ymm3, 224(%rsp)
vmovaps %ymm2, 192(%rsp)
vmovaps %ymm1, 160(%rsp)
vmovaps %ymm0, 128(%rsp)
movq 608(%rsp), %rax
vmovups 128(%rax), %ymm0
vmovups 160(%rax), %ymm1
vmovups 192(%rax), %ymm2
vmovups 224(%rax), %ymm3
vmovaps %ymm3, 96(%rsp)
vmovaps %ymm2, 64(%rsp)
vmovaps %ymm1, 32(%rsp)
vmovaps %ymm0, (%rsp)
movq 344(%rsp), %rbx
movq %rbx, 592(%rsp)
movq 544(%rsp), %r15
movq 600(%rsp), %r12
movq 608(%rsp), %rax
movq 616(%rsp), %r13
movq %rax, %rbp
vzeroupper
(Still, it should be possible to compile this with fewer moves.)
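To illustrate the splitting: in the listing above each 16-double operand is handled as four 256-bit %ymm chunks (offsets 0, 32, 64 and 96 from the operand's base). A rough model of that decomposition in plain Haskell might look like the sketch below. The boxed DoubleX4 and DoubleX16 types and the fromList16/toList16 helpers are hypothetical stand-ins invented for the example. They are not GHC's unboxed primops, and this is not how LLVM actually implements the lowering.

```haskell
-- Hypothetical boxed model of the vector types, for illustration only;
-- the real proposal uses unboxed primitive types such as DoubleX16#.

-- One 256-bit chunk: four doubles, the width of a single %ymm register.
data DoubleX4 = DoubleX4 !Double !Double !Double !Double
  deriving (Eq, Show)

-- A 16-wide vector modelled as four 256-bit chunks.
data DoubleX16 = DoubleX16 !DoubleX4 !DoubleX4 !DoubleX4 !DoubleX4
  deriving (Eq, Show)

-- Lane-wise addition of one chunk.
plusDoubleX4 :: DoubleX4 -> DoubleX4 -> DoubleX4
plusDoubleX4 (DoubleX4 a0 a1 a2 a3) (DoubleX4 b0 b1 b2 b3) =
  DoubleX4 (a0 + b0) (a1 + b1) (a2 + b2) (a3 + b3)

-- The wide addition expressed as four independent chunk additions.
plusDoubleX16 :: DoubleX16 -> DoubleX16 -> DoubleX16
plusDoubleX16 (DoubleX16 a0 a1 a2 a3) (DoubleX16 b0 b1 b2 b3) =
  DoubleX16 (plusDoubleX4 a0 b0) (plusDoubleX4 a1 b1)
            (plusDoubleX4 a2 b2) (plusDoubleX4 a3 b3)

-- Build a 16-wide vector from a list; Nothing unless it has exactly 16 elements.
fromList16 :: [Double] -> Maybe DoubleX16
fromList16 [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p] =
  Just (DoubleX16 (DoubleX4 a b c d) (DoubleX4 e f g h)
                  (DoubleX4 i j k l) (DoubleX4 m n o p))
fromList16 _ = Nothing

-- Flatten a 16-wide vector back into a list, lowest lane first.
toList16 :: DoubleX16 -> [Double]
toList16 (DoubleX16 c0 c1 c2 c3) = concatMap chunk [c0, c1, c2, c3]
  where
    chunk (DoubleX4 w x y z) = [w, x, y, z]
```

The point of the model is only that a library writer can program against the wide type while the backend picks the chunk width. On a target with narrower registers the same plusDoubleX16 would presumably be split into more, smaller additions.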