User defined Ix instances potentially unsafe
Fergus Henderson
fjh@cs.mu.oz.au
Tue, 8 May 2001 02:15:26 +1000
On 07-May-2001, Simon Peyton-Jones <simonpj@microsoft.com> wrote:
> In thinking about this I've realised that there's an assumption in
> the Ix interface that
>
> (*) map index (range (l,u)) = [0..rangeSize (l,u)-1]
...
> The constraint (*) also specifies that 'range' returns subscripts
> in increasing order of index. That seems reasonable, but perhaps
> less important. One could alternatively say
>
> (*a) index (l,u) l = 0
> (*b) index (l,u) u = rangeSize (l,u) - 1
>
> In the spirit of making minimal changes/additions to H98, perhaps (*a)
> and (*b) would be better.
Given that (*a)+(*b) already pin the end-points, I *think* you might as
well go all the way and use (*); I don't *think* the extra flexibility
of (*a)+(*b) would be useful. Given a type T which satisfies (*a)+(*b)
one can defined a new range function which sorts the elements by index
range' b@(l,u) = sortBy compareIndex range b where
compareIndex x y = compare (index b x) (index b y)
and then after renaming range as e.g. unsortedRange and
range' as range, the type will now satisfy (*).
So for Haskell 200X, (*) would certainly be preferable, IMHO.
For Haskell 98, I'm not sure; perhaps it is best to be conservative.
> PS: none of this answers Matt's original question which I can rephrase
> thus:
> if a user provides implementations of index, range etc that do not obey
> the specified invariants, can that crash the system? I would rather the
> answer were 'no'.
I think the answer should be "no, unless the user specified the
appropriate compiler option to disable array bounds checking".
> But that means the implementation would have to check the output of
> 'index' against the size of the array allocated from the supplied bounds.
> Which means that in the common case that 'index' also makes such a
> check there'd be two checks.
If you really want to squeeze the last drops of performance out, then
there's always that compiler option to disable array bounds checking...
> Unless the compiler was clever enough
> to eliminate one of them, which isn't easy since one is at the 'ix'
> type and the other at 'Int'.
One possible way to eliminate them would be to add an extra method called
unchecked_index or __unchecked_index to the Ix class. By default this
would do the same as index
class Ix t where
...
__unchecked_index = index
but you could define the instances for the standard library types
such as Int, Char, etc. so that they just skip the check and return
an out-of-range Int; for array operations such as (!) where you're
going to do a range check on the value returned from index,
you can safely use __unchecked_index instead.
The reason for using a name such as __unchecked_index rather than just
unchecked_index would be for strict compatibility with Haskell 98: to
avoid clashing with user-defined identifiers, you need to use a name
that is reserved for use by the implementation, Unfortunately, unless I
missed something, Haskell 98 does seem to reserved any identifiers at all
for use by the implementation, other than the standard reserved words,
so I think even using a name like `__unchecked_index' here would not be
100% strictly Haskell 98 compatible. I think it would be good enough in
practice, though.
For Haskell 200X, where strict backwards compatibility is not required,
unchecked_index should be introduced as a documented new method for the
Ix class.
--
Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit
| of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.