[Haskell-cafe] Why is boxed mutable array so slow?
Branimir Maksimovic
bmaxa at hotmail.com
Sun Dec 2 06:57:02 CET 2012
Wow, now performance is on par with Java ;)So slow division was main problem, that and GC .
Thanks!
> From: daniel.is.fischer at googlemail.com
> To: haskell-cafe at haskell.org
> CC: bmaxa at hotmail.com
> Subject: Re: [Haskell-cafe] Why is boxed mutable array so slow?
> Date: Sat, 1 Dec 2012 21:12:29 +0100
>
> On Samstag, 1. Dezember 2012, 16:09:05, Branimir Maksimovic wrote:
> > All in all even unboxed array is about 10 times slower than Java version.
> > I don't understand why is even unboxed array so slow.
>
> It's not the unboxed arrays that are slow.
>
> Your code has a couple of weak spots, and GHC's native code generator has a
> weakness that bites here.
>
> On my box, I don't quite have a 10× difference to my translation to Java, it's
> a bit less than 7× (0.82s vs 0.12s - I don't want to bring my box to its knees
> by running something that takes 3GB+ of RAM, so I run the unboxed array part
> only) with the LLVM backend and 8× (0.93s) with the native code generator.
> That's in the same ballpark, though.
>
> So what's the deal?
>
> Main.main_$s$wa1 [Occ=LoopBreaker]
> :: GHC.Prim.Int#
> -> GHC.Prim.Int#
> -> GHC.Prim.State# GHC.Prim.RealWorld
> -> GHC.Types.Int
> -> GHC.Types.Int
> -> GHC.Types.Int
> -> ...
>
> Your loops carry boxed Ints around, that's always a bad sign. In this case it
> doesn't hurt too much, however, since these values are neither read nor
> substituted during the loop (they're first and last index of the array and
> number of elements). Additionally, they carry an IOUArray constructor around.
> That is unnecessary. Eliminating a couple of dead parameters
>
>
> init' a = do
> (_,n) <- getBounds a
> let init k
> | k > n = return ()
> | otherwise = do
> let x = fromIntegral $ k + k `div` 3
> unsafeWrite a k x
> init (k+1)
> init 0
>
> partial_sum a = do
> (_,n) <- getBounds a
> let ps i s
> | i > n = return ()
> | otherwise = do
> k <- unsafeRead a i
> let l = s + k
> unsafeWrite a i l
> ps (i+1) l
> k <- unsafeRead a 0
> ps 1 k
>
> brings the time for the native code generator down to 0.82s, and for the LLVM
> backend the time remains the same.
>
> Next problem, you're using `div` for the division.
>
> `div` does some checking and potentially fixup (not here, since everything is
> non-negative) after the machine division because `div` is specified to satisfy
>
> a = (a `div` b) * b + (a `mod` b)
>
> with 0 <= a `mod` b < abs b.
>
> That is in itself slower than the pure machine division you get with quot.
>
> So let's see what we get with `quot`.
>
> 0.65s with the native code generator, and 0.13 with the LLVM backend.
>
> Whoops, what's that?
>
> The problem is, as can be seen by manually replacing k `quot` 3 with
>
> (k *2863311531) `shiftR` 33
>
> (requires 64-bit Ints; equivalent in Java: k*28..1L >> 33), when the native
> backend, the LLVM backend and Java (as well as C) all take more or less the
> same time [well, the NCG is a bit slower than the other two, 0.11s, 0.11s,
> 0.14s], that division is a **very** slow operation.
>
> Java and LLVM know how to replace the division by the constant 3 with a
> mulitplication, a couple of shifts and an addition (since we never have
> negative numbers here, just one multiplication and shift suffice, but neither
> Java nor LLVM can do that on their own because it's not guaranteed by the
> type). The native code generator doesn't - not yet.
>
> So the programme spends the majority of the time dividing. The array reads and
> writes are on par with Java's (and, for that matter, C's).
>
> If you make the divisor a parameter instead of a compile time constant, the
> NCG is not affected at all, the LLVM backend gives you equal performance (it
> can't optimise a division by a divisor it doesn't know). Java is at an
> advantage there, after a while the JIT sees that it might be a good idea to
> optimise the division and so its time only trebles.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20121202/c87d56f4/attachment.htm>
More information about the Haskell-Cafe
mailing list