Some great results on fused code with the LLVM backend
twhitehead at gmail.com
Thu Feb 25 15:00:11 EST 2010
On February 21, 2010 20:57:25 Don Stewart wrote:
> I tried out some of the vector and uvector fusion benchmarks with the
> new LLVM backend
> and got some great results for the tight loops generated through fusion.
> Up to 2x faster than gcc -O3 in some cases.
I had a quick scan through David's thesis the other day and noted that he
attributes at least some of the tight-loop performance advantage to not
having pinned the STG registers except at function entry and exit.
According to what I understand from the bottom of page 42 and top of page 43,
this was done through a custom calling convention whereby the first N arguments
get passed in the N registers assigned to the STG virtual registers, and every
function is extended to take the STG registers as its first N parameters.
The net result is that, on entry to any function (there are only entries to
worry about as everything is a tail call), the STG virtual registers are in
the correct hardware registers, so the RTS is happy.
What is interesting, though, is that LLVM is free to spill them between
function calls. This can free up more registers for tight loops, and from my
understanding of the bottom of page 53 and top of page 54, this was likely
crucial to getting the great tight-loop performance in some cases.
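To check that I am reading this right, here is a rough C sketch of the idea as
I understand it. It is only an illustration, not code from the thesis or from
GHC: the names (sum_to, sum_done) and the choice of three registers (Sp, Hp,
R1) are my own assumptions, and portable C can neither force arguments into
particular hardware registers nor guarantee tail calls, both of which the real
calling convention would have to provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the STG virtual registers: Sp, Hp and R1.
 * Every function takes them as its leading parameters. A calling
 * convention that assigns the first three arguments to fixed hardware
 * registers then puts them where the RTS expects them at each entry. */

/* Exit continuation: hands the accumulated value in R1 back to the caller. */
static int64_t sum_done(int64_t *sp, int64_t *hp, int64_t r1, int64_t n)
{
    (void)sp; (void)hp; (void)n;   /* unused in this toy example */
    return r1;
}

/* Adds n + (n-1) + ... + 1 to R1. Every call is in tail position, so
 * the only points where the registers must be "in place" are function
 * entries. Inside the body the compiler may keep sp and hp wherever it
 * likes, or spill them, which frees registers for the loop itself. */
int64_t sum_to(int64_t *sp, int64_t *hp, int64_t r1, int64_t n)
{
    assert(n >= 0);
    if (n == 0)
        return sum_done(sp, hp, r1, n);
    return sum_to(sp, hp, r1 + n, n - 1);
}
```

With the pinned approach, sp and hp would instead be global register variables
that the compiler must never reuse, even in a loop like this that never
touches them.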
I don't know if this even makes sense to ask, but could the same thing be done
for the native code generator (i.e., implement global RTS registers as a
calling convention instead of what I presume is a don't-touch approach)?
PS: If you happen to read this list, that was a nice body of work, David.