[Haskell-cafe] Arrays and image processing

Wed Aug 4 00:49:09 EDT 2010

Hi!

I need to do some image processing for an algorithm. It involves two (well, 
three) different kinds of arrays. The result array I'm happy to leave as a 
Data.Array for now, since it tends to be bulk updated, and involves Either 
with tuples of tuples, and it's easy enough to switch to mutable without a 
significant impact. So far, using a mutable result array hasn't resulted in a 
significant speed difference, since it's not directly involved in the inner 
loops.

But the static/immutable input arrays, the source images, are more of a 
present concern, since they are so directly involved with the inner loops, and 
are expected to be rather large. 

Right now, the algorithm is stable enough that it's time to move beyond black 
and white, which were well enough served by UArray and some ugly optimized 
inner functions which look rather like C loops, sigh.

But now I need a fast multidimensional array which can handle a tuple of 
primitive types.

My options, as far an I can see, are:

1) Add an instance for UArray (Int,Int) (Word16, Word16, Word16)
and/or UArray (Int,Int) (Word8, Word8, Word8)

2) Add a multidimensional wrapper for Data.Vector

3) Use Data.Array.Repa. While in the currently released form, as opposed to 
the paper, it sadly lacks support for pairs, it's not too hard to coerce it 
into accepting :*:. Though, that seems to involve, for color in this case, 
three separate lookups. As a note, Repa is really quite lovely, but in my 
case/opinion, not quite there yet. I need folds/maps from one type to another 
unfortunately. On the one hand, I might be able to rewrite the statistical 
functions in a numerically stable integral and repa friendly form with a bit 
of thought. Though, that sort of feels like I'm programming in APL instead of 
Haskell, it shouldn't be so headache inducing. On the other hand, I don't 
really need parallelism, as my development laptop only has one cpu. Though the 
algorithm is eventually going to be translated to C and OpenCL for use on an 
embedded device. Still, this now involves even more delving into DPH.

4) Use DPH directly. While I don't really need data parallelism--for the 
development at least--this seems like the most accommodating option for my 
needs. But I worry about the stability and correctness.

I really like the automatic loop fusion that Repa provides. It really goes a 
long way toward making the code simple and nice.

I'm tempted to just start recoding the algorithm in C. It has to be done 
eventually, and it's kind of annoying that something which is trivial in C is 
such a headache here...

Thanks for any help/pointers.

Cheers,

kirstin

-- 
The end move in politics is always to pick up a gun.
		-- Buckminster Fuller