[Haskell-cafe] Fastest saturating add?
Henning Thielemann
lemming at henning-thielemann.de
Sun Jan 14 21:38:01 UTC 2024
On Sun, 14 Jan 2024, amindfv--- via Haskell-Cafe wrote:
> I have a function that's called many (many) times, which involves a
> saturating addition (addition without overflow).
>
> In other words, I'd like ((254 :: Word8) + 10) to equal 255 (which is
> maxBound::Word8), not 8 (which is what you get with overflow).
>
> My current code, without particular regard to performance, is simply:
>
> satAdd :: Word8 -> Word8 -> Word8
> satAdd x y =
> let m = x + y
> in if m < x then 255 else m
>
> This seems like the type of code somebody's already worked on the
> performance for, though I can't find much Haskell-specific in searches
> online.
LLVM is pretty good at optimizing such code snippets. You may first watch
what code it generates. I guess you do not have to apply manual
optimizations. But you might be curious how LLVM translates your code.
The CPU sets a flag when it encounters a carry (aka unsigned overflow).
Then it has an instruction for doing a conditional move. That is, even
without special saturated arithmetic instructions, unsigned saturated
addition can be done in two or three basic instructions.
LLVM also has a saturating add intrinsic:
https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics
x86 provides saturating addition in its SSE and AVX subsystems. They even
allow you to perform up to 64 byte additions in one instruction. I have
looked up that x86 has PADDUSB instruction.
Ok, lets watch LLVM:
$ cat saturated-add.ll
target triple = "x86_64-unknown-linux-gnu"
define i8 @satAdd(i8 %x, i8 %y) {
%m = add i8 %x, %y
%carry = icmp ult i8 %m, %x
%satSum = select i1 %carry, i8 255, i8 %m
ret i8 %satSum
}
$ llc-17 <saturated-add.ll
...
satAdd: # @satAdd
.cfi_startproc
# %bb.0:
addb %sil, %dil
movzbl %dil, %ecx
movl $255, %eax
cmovael %ecx, %eax
retq
...
Now with optimizations:
$ opt-17 -O2 <saturated-add.ll | llvm-dis-17
...
define i8 @satAdd(i8 %x, i8 %y) local_unnamed_addr #0 {
%satSum = tail call i8 @llvm.uadd.sat.i8(i8 %x, i8 %y)
ret i8 %satSum
}
...
declare i8 @llvm.uadd.sat.i8(i8, i8) #1
Thus, it has detected your manual saturated add implementation.
To what instruction will it be translated?
$ opt-17 -O2 <saturated-add.ll | llc-17
...
satAdd: # @satAdd
# %bb.0:
addb %sil, %dil
movzbl %dil, %ecx
movl $255, %eax
cmovael %ecx, %eax
retq
Sadly, it remains the same. No special instructions.
More information about the Haskell-Cafe
mailing list