Storable instance of () is broken

Wed Jan 5 11:14:32 UTC 2022

On Wed, 5 Jan 2022 at 13:44, David Feuer <david.feuer at gmail.com> wrote:
>
> I don't think it's broken; I think your length calculation is broken. Instead of asking every use of () to take an extra byte, why don't you just store a word saying how long your array is?

We can always say that the application is broken and not the
underlying system, as long as the problem can be circumvented in the
application. Of course the problem can be circumvented in our code, and
that is what we are doing, but these semantics for the () instance
make the workaround harder and not very elegant. It is debatable what
is broken. That's the reason I wanted to raise this for discussion.

We already store the length in the array, but as a number of bytes and
not a number of elements: we store the start and end pointers in
memory and compute the byte length from them. Because sizeOf is 0, the
() Storable instance does not allow us to convert bytes back to the
number of elements. That is the crux of the problem.
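
To make the conversion problem concrete, here is a minimal sketch. The
Array type, its field names, and elemCount are hypothetical and only
stand in for our actual implementation; the pointers are made-up
addresses that are never dereferenced.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Int (Int32)
import Foreign.Ptr (Ptr, minusPtr, nullPtr, plusPtr)
import Foreign.Storable (Storable (sizeOf))

-- Hypothetical array representation: only start and end pointers are kept.
data Array a = Array
  { arrStart :: Ptr a  -- first element
  , arrEnd   :: Ptr a  -- one past the last element
  }

-- The byte length is always recoverable from the two pointers.
byteLength :: Array a -> Int
byteLength (Array s e) = e `minusPtr` s

-- The element count is recoverable only when the element size is non-zero.
-- For a zero-sized element such as () there is nothing to divide by.
elemCount :: forall a. Storable a => Array a -> Maybe Int
elemCount arr
  | sz == 0   = Nothing
  | otherwise = Just (byteLength arr `div` sz)
  where
    sz = sizeOf (undefined :: a)

-- 16 bytes of Int32 (4 bytes each): the count is recoverable.
demoInt32 :: Maybe Int
demoInt32 = elemCount (Array p (p `plusPtr` 16) :: Array Int32)
  where p = nullPtr `plusPtr` 4096

-- An array of (): start and end coincide, so the count is lost.
demoUnit :: Maybe Int
demoUnit = elemCount (Array p p :: Array ())
  where p = nullPtr `plusPtr` 4096
```

Here demoInt32 evaluates to Just 4, while demoUnit evaluates to Nothing
no matter how many () values were "stored".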

There are reasons for storing pointers which may be irrelevant to
this discussion. The point is that this instance forces us to use
a different representation in which we store the number of elements
in the array. I think that is unnecessary for Storable arrays as long
as the Storable type has a concrete representation with a definite
size.

I would like to think that if something is Storable then it should
have a concrete memory representation. peek and poke operations
themselves indicate this fundamental nature of Storable semantics.
peek means we are reading from a memory location and poke means that
we are writing to a memory location. In the case of () we are just
pretending to read or write on peek/poke; we are not storing or
retrieving anything. In my opinion this is not how Storable should
behave.
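
As a small illustration, assuming the instance from base quoted further
down in this thread, the following reads and writes () through a null
pointer without any fault, because no memory is touched. The name
unitRoundTrip is made up for this example.

```haskell
import Foreign.Ptr (Ptr, nullPtr)
import Foreign.Storable (peek, poke, sizeOf)

-- poke and peek for () ignore the pointer entirely, so even nullPtr "works".
-- Doing the same for a type with a real representation would be invalid.
unitRoundTrip :: IO ()
unitRoundTrip = do
  poke (nullPtr :: Ptr ()) ()
  peek (nullPtr :: Ptr ())

-- The size the instance reports for ().
unitSize :: Int
unitSize = sizeOf ()
```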

I think this should not be about optimising for that extra byte. That
optimization is bogus. We are saying: look, we can store something in
memory without even consuming any space. What's the point of doing
that? In this case we are doing that just because we can. To take
this argument further we can also say that we can compress some byte
sequences and it will take up less space. But that's harder and it
will completely screw up the simple semantics so we won't do that.
Let's screw it only a little bit, because it's easier to do that.

If we do not want to consume space for () then it is better not to
have a Storable instance for it at all and leave it to the users. And
if we really want to store it, then let it behave in the same way as
any other mere mortal storable would; so what if it takes up some
space? Why not have simple, uniform semantics for all cases? I think
saving that extra byte for this particular case is overkill and leads
to a bigger cost in applications without actually saving anything
worthwhile.
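
For illustration only, here is a sketch of what a unit type with
uniform one-byte semantics could look like. Unit1 and roundTrip are
hypothetical names, not something base provides, and the choice of 0
as the placeholder byte is arbitrary.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArray)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical unit type that occupies one byte when stored.
data Unit1 = Unit1 deriving (Eq, Show)

instance Storable Unit1 where
  sizeOf _ = 1
  alignment _ = 1
  -- Read the placeholder byte and discard it.
  peek p = do
    _ <- peek (castPtr p :: Ptr Word8)
    return Unit1
  -- Write a placeholder byte so the slot is initialised.
  poke p Unit1 = poke (castPtr p :: Ptr Word8) 0

-- Store n values in a temporary buffer and count how many come back.
roundTrip :: Int -> IO Int
roundTrip n =
  withArray (replicate n Unit1) $ \p -> length <$> peekArray n p
```

With such an instance an array of n units occupies n bytes, so the
element count can be recovered from the byte length like for any other
Storable type.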

-harendra

>
> Alternatively, you could probably avoid special cases with a newtype:
>
> newtype NZS a = NZS { unNZS :: a }
>
> instance Storable a => Storable (NZS a) where
>   sizeOf
>     | sizeOf (undefined @a) == 0
>     = const 1
>     | otherwise = sizeOf .# unNZS
>   alignment = alignment .# unNZS
>   peekElemOff = coerce (peekElemOff @a)
>   ....
>
> On Wed, Jan 5, 2022, 3:01 AM Harendra Kumar <harendra.kumar at gmail.com> wrote:
>>
>> The Storable instance of () is defined in the "Foreign.Storable"
>> module of the "base" package as follows:
>>
>> instance Storable () where
>>   sizeOf _ = 0
>>   alignment _ = 1
>>   peek _ = return ()
>>   poke _ _ = return ()
>>
>> The size of () is defined as 0. Doesn't it sound absurd for a
>> Storable to have a size of 0? This means that we can read an
>> infinite number of ()
>> type values out of nothing (no memory location required) or store an
>> infinite number of () type values without even requiring a memory
>> location to write to.
>>
>> This is causing a practical problem in our Storable array
>> implementation. The array is constrained to a Storable type. Since ()
>> has a Storable instance, one can store () in the Storable array. But
>> it causes a problem because we determine the array element size using
>> sizeOf on the type. For () type it turns out to be 0. Essentially, the
>> array of () would always be of size 0. Now, we cannot determine the
>> length of the array from its byte length, as you could store
>> infinitely many such elements in an empty array. The Storable
>> instance of () seems to
>> be an oddity and makes us use a special case everywhere in the code to
>> handle this, and this special casing makes it highly prone to errors
>> when we change code.
>>
>> Can this be fixed? Is there a compelling argument to keep it like
>> this? A possible fix could be to represent it by a single byte in
>> memory which can be discarded when reading or writing. Another
>> alternative is to not provide a Storable instance for it at all. Let
>> the users write their own if they need it.
>>
>> If you think this does not have a problem, can you suggest how to
>> elegantly handle the array implementation problem as I described
>> above?
>>
>> Thanks,
>> Harendra
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries