# [Haskell-cafe] How to make Repa do stencils unboxed?

William Yager will.yager at gmail.com
Sat Jul 2 03:34:41 UTC 2016

```Hello all,

TL;DR: Repa appears to be calculating my stencil operation using boxed
arithmetic, which is very slow. How do I make it do unboxed arithmetic?

I am currently working on a project where I need to simulate a 2D
environment that exhibits wave behavior.

I'm accomplishing this by directly discretizing the wave equation and
turning it into a cellular automaton.

The math here is relatively simple. Let's specify the example to the
physical system of a rubber sheet, which exhibits wave behavior. To
simulate a rubber sheet, you divide it up into grids. Then, you keep track
of the velocity and height of each grid.

At each iteration of the simulation, you update the velocity of each grid
based on the force exerted upon it by its neighboring grids. This is simply
a linear multiple of the distance between the grid and its neighbors (i.e.
how stretched the sheet is). For example, if cell x is at height X, and its
neighbors a,b,c,d are at height A,B,C,D, the tension on X is, for some
constant L,

T = L*((A-X)+(B-X)+...) = L*(A+B+C+D-4X).

You can calculate this efficiently via a matrix convolution, in particular
by a convolution with the matrix

[[ 0  1   0
1  -4  1
0  1   0 ]]

You can be more physically accurate by including the diagonals (weighted
down a bit). A popular stencil for such purposes is

[[ 1   2     1
2   -12  2
1   2     1 ]]

Technically the off-diagonal elements should be something like sqrt(2), but
this is sufficient for most purposes.

After calculating the tension matrix, you divide it by the mass of each
grid. This is the acceleration on each grid. Multiply it by the length of
your time step, and you get the change in velocity of each grid. Add this
to the velocity matrix. Then, multiply the velocity matrix by the time
step, and that's the change in height. Add this to the height matrix.

Boom, you've simulated a single time step of a rubber sheet. Keep doing
this over and over to get the time-evolution of the rubber sheet.

My code to do this is as follows:

The imports are messy; sorry about that.

You'll notice I'm using the Dimensional library, which provides statically
typed physical dimensions over any numeric value using a newtype wrapper
with typelits. Hopefully this isn't slowing things down somehow.

According to profiling via "ghc -Odph -rtsopts -threaded -fno-liberate-case
-funfolding-use-threshold1000 -funfolding-keeness-factor1000  -optlo-O3
-fforce-recomp -prof -fprof-auto -fexternal-interpreter RepaSim.hs" (whew),
fully 80% of time and 84% of allocation comes from the stencil operation.

The simulation starts with a rubber sheet with a smooth bump in the middle,
simulates 100ms of time steps, saves a .png file of the sheet, and then
repeats n times (n is typed in via standard input).

How can I make Repa do this stencil unboxed, and are there other things I
should be doing to get better performance?

Thanks,
Will
-------------- next part --------------
An HTML attachment was scrubbed...