The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

Wed Dec 6 06:17:40 EST 2006

Simon Peyton-Jones wrote:
>>> To me, the wording "An implementation is entitled to assume..."
>>> implies that there are no obligations on the implementation should
>>> the assumption not hold - no obligation to yield _|_ or any other
>>> behaviour.
>>>
>>>> If "laws not satisfied => any behaviour OK" were the correct
>>>> interpretation, then it would be OK for the Array implementation to
>>>> wipe all your files at the first encounter of a broken Ix law... ;)
>>>
>>> Yup.  That's not quite as bad as in C, where it's ok for an
>>> implementation to wipe all your files if you overflow the int
>>> type...
>>>
>>> Cheers,
>>>         Simon
>>
>> Still, this is pretty bad, and raises questions about the safety of
>> Haskell programs in general.  It seems unsatisfactory that if a
>> programmer makes a mistake in the definition of an 'Ix' instance,
>> then there are no guarantees about the behavior of their program at
>> all...
>
> I rather agree with Iavor here.  If a program makes no use of
> unsafeX functions, and has no foreign calls, and passes the
> typechecker, then it should not crash.

Yes, I agree too!  I'm just pointing out that the problem is already in the Haskell 98 specification, which we follow.  If possible we should fix this in Haskell'.

> However, I don't see how to achieve this for array indexing,
> without adding another test to every array access.

If (!) was changed to be a method of IArray, then for certain arrays we could use unsafeIndex instead of index, and check the index against the physical array size instead.  Eg. (!) is currently defined as

  arr ! i = case bounds arr of (l,u) -> unsafeAt arr (index (l,u) i)

This would be the default method for (!), but for some arrays we could replace it by

  arr ! i = case bounds arr of (l,u) -> safeAt (unsafeIndex (l,u) i)

Where safeAt is implemented by looking at the physical array size.  One slight problem is that the size stored in GHC's byte arrays is rounded up to the nearest word.  Also we don't have a sizeOfArray# primitive, as Spencer pointed out in the original post in the thread.  Similar changes would be required in the MArray class too: readArray/writeArray would need to become methods.

Cheers,
        Simon