[Haskell-cafe] Multidimensional generalization of the vector package

Thu May 27 19:01:34 EDT 2010

Hey everyone,

I have been thinking about how to generalize the vector package to multiple dimensions, and I'd like to share my ideas with you all in the hope of having people smarter than me share their thoughts on how it could be improved --- *especially* regarding how to make it efficient!

The standard way to represent a memory layout is to use a stride vector --- i.e., a vector which lets you get the (flat) index in memory of a given element by dotting the stride vector with the multidimensional index vector.  This has the advantage that one can efficiently manipulate it to represent views with arbitrary slices, subranges, subranges plus striping (i.e., take every k-th element), and transpositions.  Basically the only thing it can't do is represent a weird view where we, say, take every index along a particular dimension that is a prime number.

Having said this, it might be useful to abstract the memory layout in order to preserve generality.  Thus, I was thinking that the layout of the multidimensional vector could be represented as a type class with an associated type family:

class MemoryLayout layout where
	type LayoutIndex layout :: *

	indexOf :: layout -> (LayoutIndex layout) -> Int
	firstIndex :: layout -> (LayoutIndex layout) -> Int
	lastIndex :: layout -> (LayoutIndex layout) -> Int
	withinBounds :: layout -> (LayoutIndex layout) -> Bool

... etc.

It is important that we be able to iterate through the array efficiently.  One way that this could be done is through the use of cursors.

class MemoryLayout layout where
	...
	type LayoutCursor layout :: *
	...
	getCursorAt :: layout -> (LayoutIndex layout) -> (LayoutCursor cursor)
	getFirstCursor :: layout -> (LayoutCursor cursor)
	getLastCursor :: layout -> (LayoutCursor cursor)
	indexOfCursor :: layout -> (LayoutCursor cursor) -> Int
	advanceCursor :: layout -> (LayoutCursor cursor) -> (LayoutCursor cursor)

In general these unfortunately cannot be integers because integers don't contain enough information to tell us about where we need to go next inside a non-contiguous array.  The alternative is to use a data structure, but then we may potentially have lots of potentially wasteful allocations, deallocations, and copying going on unless we can somehow cause the cursor to be updated destructively.

Fortunately, much of the time our array will be contiguous, so we can have a special case fast-path for this case:

class MemoryLayout layout where
	...
	layoutIsContiguous :: layout -> Bool
	layoutIsContiguousWhenTraversedInRowMajorOrder :: layout -> Bool
	layoutIsContiguousWhenTraversedInColumnMajorOrder :: layout -> Bool

Then we know that we can just iterate through the array in the order of the indices.  It turns out that it is not hard to compute whether a layout is contiguous efficiently.  For example, in the code that implements cuts and slicing we can actually check to see if the new view is still contiguous and simply cache this in the layout.  For restricted cases of layouts (when there aren't transpositions) it isn't too hard to just scan through the layout and see if it is contiguous.

Another way to iterate through the array is to provide this capability by constructing functions that traverse through the array for us and call some user-supplied function on each element.  This actually can be a lot more efficient (since by using recursive algorithms it can store the information that it needs on the stack rather than using heap allocations) but it is much more restrictive, and it requires special cases for traversing multiple arrays in parallel.

Probably the most general way to sum/fold/etc. over particular dimensions (i.e., given a 4D vector, sum over the first and third dimensions leaving the second and fourth dimensions) is simply to generalize the notion of cursors, so that rather than iterating over elements we are iterating over views of the array.

class (MemoryLayout layout1, MemoryLayout layout2) => MemoryLayoutSubviewable layout1 layout2 where
	type MemoryLayoutViewSetSlicer layout1 layout2 :: *
	type MemoryLayoutViewSet layout1 layout2 :: *

	getViewSet :: (MemoryLayoutViewSetSlicer layout1 layout2) -> layout1 -> (MemoryLayoutViewSet layout1 layout2)
	advanceViewSet :: (MemoryLayoutViewSet layout1 layout2) -> (MemoryLayoutViewSet layout1 layout2)
	currentView :: MemoryLayoutViewSet layout1 layout2 -> layout2

This way we can hand getViewSet some "slicer" that selects some dimensions to be reduced but not others and get back a set of views that we can iterate over, providing a generalization of the sum and fold functionality provided by repa.

Although these framework allows us to stay general, in practice working with shapes that are fixed-length vectors (I personally would prefer the Vec package over the hand-rolled one in repa) tends to be easiest, because it allows one to write recursive algorithms employing type families that can define views for arrays with arbitrary dimensions.  Also, in practice one can construct type aliases any helper functions such as

i1 a = a :. ()
i2 a b = a :. b :. ()
i3 a b c = a:. b:. c:. ()
.
.
.

type I0 = ()
type I1 = Int : I0
type I2 = Int :. I1
.
.
.

etc. so that we construct values via (i2 x y) rather than (x :. y :. ()).

Any thoughts?

Cheers,
Greg