[Haskell-cafe] Profiling slow multiplication of elements read from vector
Richard Janis Beckert
janis.beckert at gmail.com
Wed Dec 12 15:56:14 CET 2012
Hey! Thanks a lot for your reply!
> Disclaimer: I am compiling GHC 7.6.1-rc1 while testing this, so my
> measurements might be unreliable. Best try it out yourself.
>
> Also, this blind-stab-optimization is /not/ best practice, I just
> enjoyed fiddling around.
What /is/ best practice in regards to performance optimization of code
such as mine? What other options than blind stabbing do I have?
As for applying your suggestions: I have improved the speed of my
overall program by about 15% (see the new profiling I have obtained),
but it still seems as if the program spends too much time reading from
the vector and multiplying 5 integers --- but probably the cost centers
are allocated wrongly?
I tried various combinations of the changes you suggest, and had some
very interesting results.
It seems like there are some performance differences using GHC
7.4.2 (which I use due to cabal-dev) and GHC 7.6.*.
Using my initial code, I get a runtime of around 15.7s.
On 12/11/12 at 10:43pm, Joachim Breitner wrote:
> neighProd vs l !i = lukUp vs l (i-2) * lukUp vs l (i-1) *
> lukUp vs l (i) * lukUp vs l (i+1) * lukUp vs l (i+2)
>
Using that code rather than a map, actually leads to slower results! I
am getting an average of 18.0s for this.
Interestingly, passing the index `!i' as an unboxed Int leads to a
time-increase in my map version --- 16.8s.
However:
> Using V.unsafeIndex there gives another noticable speedup.
This helps a lot, giving me 14.8s and 15.1 for your suggestion; but
combining it with
> lukUp :: V.Vector Int -> Int -> Int -> Int
> lukUp vs l i = let !i' = i `mod` l
> in V.unsafeIndex vs i'
>
> did, according to core, prevent some boxing, but did not speed up the
> program. Allocating Ints that are unused before the next GC are very
> cheap.
gives me a time of about 12.8s with your suggestions (explicit calls to
`lukUp', unsafeIndex'ing), which is the fastest result I got so far!
> You can save more time by not passing l around, the size of the vector
> is already present in vs and can cheaply be accessed by "V.length vs".
I decided to pass l around explicitly in order to safe on expensive
length-calculations. As it turns out, checking the bounds gives me a
decrease in speed in all cases.
Furthermore, using `mod' is actually a feature of
my underlying model, as it enforces periodic boundary conditions (think
of the vector representing a circle). Therefore the boundary checking
just happens accidentally.
> Oh, and you are creating a new byte array in every invocation of
> randVec. That is of course expensive. Using mutable arrays seems to be a
> more natural choice. Or is that really what you want to do? Judging from
> your code I would expect that you want to create a new array consisting
> only of values returned by neighProd; with the current code, when
> updating the second value you are including the _updated_ first entry in
> your product.
I will try to use a mutable vector to see if I can speed the thing up
further, since I do indeed want to update my vector one element at a
time, taking into account the elements 4 nearest and next-to-nearest
neighbours (as you can see, every iteration of the code acts on the
updated vector).
I just decided to go with an unmutable structure (a list at first,
followed by a Data.Map structure), in order to work without side
effects.
I'd be happy to get more input about how to increase the performance of
my code!
Cheers
Janis
-------------- next part --------------
Wed Dec 12 15:42 2012 Time and Allocation Profiling Report (Final)
vec +RTS -p -RTS
total time = 12.46 secs (12463 ticks @ 1000 us, 1 processor)
total alloc = 8,080,059,488 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
neighProd Main 27.7 5.9
iter Main 25.1 9.9
randVec.vs' Main 20.5 84.2
lukUp Main 15.2 0.0
randVec Main 11.5 0.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 66 0 0.0 0.0 100.0 100.0
main Main 133 0 0.0 0.0 100.0 100.0
main.vs Main 141 1 0.0 0.0 0.0 0.0
iter Main 134 10000001 25.1 9.9 100.0 100.0
lukUp Main 138 10000000 0.0 0.0 0.0 0.0
neighProd Main 136 10000000 0.0 0.0 0.0 0.0
randVec Main 135 10000000 11.5 0.0 74.9 90.1
randVec.vs' Main 143 10000000 20.5 84.2 20.5 84.2
neighProd Main 139 0 27.7 5.9 42.8 5.9
lukUp Main 140 40000000 15.2 0.0 15.2 0.0
CAF Main 131 0 0.0 0.0 0.0 0.0
main Main 132 1 0.0 0.0 0.0 0.0
main.vs Main 142 0 0.0 0.0 0.0 0.0
main.l Main 137 1 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 112 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 110 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 96 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 93 0 0.0 0.0 0.0 0.0
CAF GHC.Integer.Logarithms.Internals 74 0 0.0 0.0 0.0 0.0
More information about the Haskell-Cafe
mailing list