User defined Ix instances potentially unsafe

Tue, 8 May 2001 02:15:26 +1000

On 07-May-2001, Simon Peyton-Jones <simonpj@microsoft.com> wrote:
> In thinking about this I've realised that there's an assumption in 
> the Ix interface that
> 
> (*)	map index (range (l,u)) = [0..rangeSize (l,u)-1]
...
> The constraint (*) also specifies that 'range' returns subscripts
> in increasing order of index.  That seems reasonable, but perhaps
> less important.  One could alternatively say 
> 
> (*a)	index (l,u) l = 0
> (*b)	index (l,u) u = rangeSize (l,u) - 1
> 
> In the spirit of making minimal changes/additions to H98, perhaps (*a)
> and (*b) would be better.

Given that (*a)+(*b) already pin the end-points, I *think* you might as
well go all the way and use (*); I don't *think* the extra flexibility
of (*a)+(*b) would be useful.  Given a type T which satisfies (*a)+(*b)
one can defined a new range function which sorts the elements by index

	range' b@(l,u) = sortBy compareIndex range b where
		compareIndex x y = compare (index b x) (index b y)

and then after renaming range as e.g. unsortedRange and
range' as range, the type will now satisfy (*).

So for Haskell 200X, (*) would certainly be preferable, IMHO.
For Haskell 98, I'm not sure; perhaps it is best to be conservative.

> PS: none of this answers Matt's original question which I can rephrase
> thus:
> if a user provides implementations of index, range etc that do not obey
> the specified invariants, can that crash the system?  I would rather the
> answer were 'no'.

I think the answer should be "no, unless the user specified the
appropriate compiler option to disable array bounds checking".

> But that means the implementation would have to check the output of
> 'index' against the size of the array allocated from the supplied bounds.
> Which means that in the common case that 'index' also makes such a
> check there'd be two  checks.

If you really want to squeeze the last drops of performance out, then
there's always that compiler option to disable array bounds checking...

> Unless the compiler was clever enough
> to eliminate one of them, which isn't easy since one is at the 'ix'
> type and the other at 'Int'.

One possible way to eliminate them would be to add an extra method called
unchecked_index or __unchecked_index to the Ix class.  By default this
would do the same as index

	class Ix t where
		...
		__unchecked_index = index

but you could define the instances for the standard library types
such as Int, Char, etc. so that they just skip the check and return
an out-of-range Int; for array operations such as (!) where you're
going to do a range check on the value returned from index,
you can safely use __unchecked_index instead.

The reason for using a name such as __unchecked_index rather than just
unchecked_index would be for strict compatibility with Haskell 98: to
avoid clashing with user-defined identifiers, you need to use a name
that is reserved for use by the implementation, Unfortunately, unless I
missed something, Haskell 98 does seem to reserved any identifiers at all
for use by the implementation, other than the standard reserved words,
so I think even using a name like `__unchecked_index' here would not be
100% strictly Haskell 98 compatible.  I think it would be good enough in
practice, though.

For Haskell 200X, where strict backwards compatibility is not required,
unchecked_index should be introduced as a documented new method for the
Ix class.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.