Conor McBride ctm at cs.nott.ac.uk
Fri May 13 05:38:39 EDT 2005

Hi all

Oleg, it seems to me that you're blurring two distinctions here.

(i)  fixed-length versus unknown length
(ii) cons-list versus array

There's no good reason why either the dependent-style vector

> David Menendez wrote:
> ] data Vec n a where
> ]     Nil  ::                 Vec Zero a
> ]     Cons :: a -> Vec n a -> Vec (Succ n) a

or the class-of-constructors representation (strangely familiar to me)

> ] data Vec_0   a = Nil
> ] data Vec_s v a = Cons a (v a)
>
> S.M.Kahrs wrote:
> ] data Cons b a = a :. b a
> ] data NIL a = NIL

should eat any more memory or be harder to access than regular cons-lists.
The length information is entirely static. This representation precisely
does not suffer from the problem you quote with respect to nested types.
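To make the "entirely static" point concrete, here is a minimal sketch of the GADT vector in modern GHC syntax (the `vhead`/`vtail` names are mine): the length index lives only in the type, so nothing is stored or checked at run time, and the empty case is ruled out by the type rather than by a bounds check.

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls #-}

-- Type-level naturals: no values, purely static indices.
data Zero
data Succ n

data Vec n a where
  Nil  ::                 Vec Zero a
  Cons :: a -> Vec n a -> Vec (Succ n) a

-- Total functions: the type Vec (Succ n) a excludes Nil, so no
-- runtime length is consulted and no partiality arises.
vhead :: Vec (Succ n) a -> a
vhead (Cons x _) = x

vtail :: Vec (Succ n) a -> Vec n a
vtail (Cons _ xs) = xs
```

At run time this is laid out exactly like an ordinary cons-list.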

Nested types, such as the AVL trees and the square matrix types, suffer
from the fact that their constructors must target all parameters uniformly,
even if the recursive references are more specialised. You can only make
the parameter bigger by going under a constructor, so if you want to build
structures with size invariants, you end up with a prefix that records the
size, and that prefix eats space and time. Here's a unary version of square
matrices:

data Nil a = Nil
data Cons b a = a :. b a

newtype Square a = Square (Count Nil a)
data Count f a = Stop (f (f a)) | Step (Count (Cons f) a)

As you can see, it's the Stop and Step which take up the extra space,
recording the size of the matrix, and also concealing it from the type.
This structure represents square matrices of any size, ie

exists n. Vec n (Vec n a)

with the Steps and Stop representing the n, and then the vector of
vectors stored as sized cons-lists. If you want to write a multiplier
for matrices, you need to expose these sizes, and that's where type
classes help a bit and indexed datatype families help a lot.
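The overhead is easy to see in a concrete 2x2 example (the `twoByTwo` and `sizeOf` names are mine; the types repeat the definitions above): the two Steps and one Stop exist only to spell out the size in unary.

```haskell
data Nil a = Nil
data Cons b a = a :. b a
infixr 5 :.

newtype Square a = Square (Count Nil a)
data Count f a = Stop (f (f a)) | Step (Count (Cons f) a)

-- A 2x2 matrix: rows are sized cons-lists, and the Step/Step/Stop
-- prefix is a unary copy of the size 2.
twoByTwo :: Square Int
twoByTwo = Square (Step (Step (Stop
  ((1 :. 2 :. Nil) :. (3 :. 4 :. Nil) :. Nil))))

-- Recovering the concealed size by walking the prefix. The local
-- signature is required because go is polymorphically recursive:
-- each Step changes the functor index from f to Cons f.
sizeOf :: Square a -> Int
sizeOf (Square c) = go c
  where
    go :: Count f a -> Int
    go (Stop _) = 0
    go (Step k) = 1 + go k
```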

For the same reason, it's easy to define the well-scoped lambda terms (Bird &
Paterson 1999), because scope only gets bigger,

data Tm x = Var x | App (Tm x) (Tm x) | Lam (Tm (Maybe x))

but trickier to define the simply-typed lambda terms, where it helps to
be able to say 'lambda makes (s -> t)' as you can with datatype
families (Altenkirch & Reus, 1999) and now with GADTs. It's great to see
this stuff making it into Haskell. There are plenty more examples in the
literature that should shift over with minimal fuss.
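As a taste of 'lambda makes (s -> t)' with GADTs, here is a sketch of simply-typed terms whose index is the object-level type. To keep it short it uses higher-order abstract syntax and a `Val` constructor for embedded values, rather than the typed de Bruijn contexts of Altenkirch & Reus; the names are mine.

```haskell
{-# LANGUAGE GADTs #-}

-- The index t records the object-level type of the term, so the
-- type of Lam says directly that lambda makes an (s -> t).
data Term t where
  Val :: t -> Term t
  Add :: Term Int -> Term Int -> Term Int
  Lam :: (Term s -> Term t) -> Term (s -> t)
  App :: Term (s -> t) -> Term s -> Term t

-- A tagless evaluator: well-typed by construction, with no runtime
-- type checks and no possibility of stuck terms.
eval :: Term t -> t
eval (Val x)   = x
eval (Add a b) = eval a + eval b
eval (Lam f)   = \x -> eval (f (Val x))
eval (App f a) = eval f (eval a)
```

For example, `eval (App (Lam (\x -> Add x (Val 1))) (Val 41))` evaluates to 42.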

Anyhow, when you say

> to take the 1023d element of a 1024-element
> vector, we have to traverse 1023 Cons cells.

that's clearly to do with the cons-list representation, not the fixed length.
Of course, I agree with you that the array representation of sequences is
sometimes more appropriate than the cons-list representation, but the reverse
is also true. It's good to have fixed-length versions of both.

> The alternative is to encode the invariant -- the vector size, in our
> problem -- in *type* constructors, in phantom types.  For example,
>
>> newtype PVector n a = PVector (Array Int a)

Vectors store their lengths implicitly, the way lists do; don't arrays store
their bounds explicitly? But you're right, there's no reason to believe that
the bound corresponds to the n...
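Indeed, with Data.Array the bounds travel with the array at run time, and nothing ties them to the phantom n. A sketch of the mismatch (the `bogus` name and type-level naturals are mine, not Oleg's code):

```haskell
{-# LANGUAGE EmptyDataDecls #-}

import Data.Array

data Zero
data Succ n

newtype PVector n a = PVector (Array Int a)

-- This claims length one at the type level but holds ten elements.
-- It typechecks, which is exactly why the constructor must be hidden
-- behind a module boundary.
bogus :: PVector (Succ Zero) Int
bogus = PVector (listArray (0, 9) [0 .. 9])
```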

> So, we have to hide PVector in a module and
> expose only functions that build the values of PVector one way or the
> other. We also have to make sure that these functions are correct --
> because they constitute the trusted kernel of our invariant.

Indeed we do, so we shall certainly have to work with sizes in a principled
way inside the abstraction barrier, eg tabulation of Ashley's functions from
finite sets (another traditional dependent vector representation), or
whatever. Especially if we want to justify (at least to ourselves) the use
of non-bound-checked accesses, which is surely part of the point here.
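One sketch of that principled interior, in the functions-from-finite-sets style (the names here are mine): `Fin n` has exactly n inhabitants, so looking up is just application and is bounds-safe by construction; a trusted kernel could then tabulate such a function into an unchecked array, knowing every index is in range.

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls #-}

data Zero
data Succ n

-- Fin n: the natural numbers strictly below n, so a total index
-- set for a length-n vector.
data Fin n where
  FZ :: Fin (Succ n)
  FS :: Fin n -> Fin (Succ n)

-- A vector as a function from a finite set; lookup is application.
type FVec n a = Fin n -> a

fcons :: a -> FVec n a -> FVec (Succ n) a
fcons x _  FZ     = x
fcons _ xs (FS i) = xs i

-- Fin Zero is uninhabited, so this function can never be applied.
fnil :: FVec Zero a
fnil _ = error "unreachable: Fin Zero has no values"
```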

My point is just that the issue between inductive Vec and array PVector
is just the cons-list versus array issue. It's not about how one works with
sizes. In either approach, one inevitably comes up against the fact that
sizes are numbers, ie a kind of data, and one would like them to behave
accordingly. How hard the language makes it to treat these numbers like
data is the measure of how well equipped it is to support this precise style
of programming. The existence of crafty encodings cannot, in the long run,
rescue an inappropriate design.

All the best

Conor

--
http://www.cs.nott.ac.uk/~ctm