[Haskell-cafe] Data.IArray rant

Tue Sep 6 17:41:18 CEST 2011

Jon Fairbairn wrote:
> Roman Leshchinskiy <rl at cse.unsw.edu.au> writes:
>
> No, arrays were not considered to be bad, they were designed
> with parallelism in mind.

I'm not sure how this can be the case if, as discussed below, most array
operations have to go through lists, an inherently sequential data
structure.

>> It's rather that some considered the IArray API to be
>> inadequate most of the time. Really, H98 arrays aren't very
>> good at anything they do. For collective operations, you are
>> supposed to convert the array to a list, work on the list and
>> then convert it back to an array which just seems wrong.
>
> I am unconvinded that this is any more wrong than using a for
> loop in an imperative language.

Oh, definitely. In fact, I consider using a for loop even more wrong, you
really want to use collective operations instead whenever possible. But I
don't think for loops are the right benchmark for ease of use.

> Remember that the lists are
> lazy, so itâ€™s misleading to say â€œconvert the array to a listâ€
> since what happens most of the time is that elements are taken
> out of the array and passed to the processing function and then
> thrown away before the next element is processed.

Efficiency isn't even the biggest problem here. Whenever you want to
perform a collective operation on an array, you have to do the actual work
on an entirely different data structure. So I, as a programmer, have to
say: "Convert this array to a list, then do something with the list and
then convert the resulting list back to an array". To me, at least, this
is very inconvenient and requires a context switch. No other widely used
container type requires you to do that.

>> Multidimensional arrays can't be sliced and diced in the style
>> of Repa or NumPy.
>
> Iâ€™m not familiar with Repa or NumPy, but what can they do that
> cannot be done with judicious use of ixmap, which is a very
> powerful mechanism.

Yes, it is quite powerful but not very convenient. Just as an example, if
A is a matrix, then A[3,:] gives you the 4th row and A[:,3] the 4th column
in NumPy. You can do that with ixmap but it's far more involved.

It also isn't powerful enough since it doesn't support certain highly
useful uses of shape polymorphism (an example is a generic concat which
decreases the dimensionality of any array by 1). This is discussed in
detail in http://www.cse.unsw.edu.au/~chak/papers/repa.pdf.

>> In general, H98 arrays seem to have been designed with the
>> goal of providing a container with O(1) indexing. They do
>> that, I suppose, although they aren't very well integrated
>> with the rest of the language
>
> Can you give examples?

My favourite is the interaction between arrays and Enum. If I want to
implement a generic enumFromTo m n that produces an array, I have to
create the list [m .. n], take its length (which forces the entire list
into memory) and then create an array of that length and fill it from the
list. There doesn't seem to be any other way of doing it. This is a
deficiency of Enum, really, but that's what I mean by not being well
integrated. There are other examples like this.

>> and they have conceptual problems (such as requiring two
>> bounds checks for one array access).
>
> Assuming that you mean that for safe array access where nothing
> is known about the index at compile time, since any sort of
> array has at least a beginning and an end, they all require two
> bounds checks. Once you do know something about the index, itâ€™s
> a question of implementation.

That's not what I meant. You really need to do two full bounds checks, one
against inRange and one against the actual length of the array. Here is
the relevant comment from the GHC libraries:

Note [Double bounds-checking of index values]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When you index an array, a!x, there are two possible bounds checks we
might make:

  (A) Check that (inRange (bounds a) x) holds.

      (A) is checked in the method for 'index'

  (B) Check that (index (bounds a) x) lies in the range 0..n,
      where n is the size of the underlying array

      (B) is checked in the top-level function (!), in safeIndex.

Of course it *should* be the case that (A) holds iff (B) holds, but that
is a property of the particular instances of index, bounds, and inRange,
so GHC cannot guarantee it.

 * If you do (A) and not (B), then you might get a seg-fault,
   by indexing at some bizarre location.  Trac #1610

 * If you do (B) but not (A), you may get no complaint when you index
   an array out of its semantic bounds.  Trac #2120

At various times we have had (A) and not (B), or (B) and not (A); both
led to complaints.  So now we implement *both* checks (Trac #2669).

Roman