Storable instance of () is broken

Wed Jan 5 12:09:49 UTC 2022

It is hard to objectively or mathematically prove which option is
better. As a library designer I do not like to think only from the
perspective of my current problem at hand but in general what is the
right thing. In my opinion, the right thing here is to have uniform
semantics with a non-zero size for objects that are stored or
retrieved from memory. If I were the owner of the base package I would
do that. This optimization in my opinion is a micro-optimization which
is irrelevant in the larger scheme of things. If someone wants to
optimise for this case there could be ways to do that. But again it is
subjective - this vs that.

It is a different discussion whether it is a good idea to change the
instance because it might break things. This instance came into being
in GHC 8.0. There must be a reason for that, maybe someone on this
list knows. I wonder how many users there are.

-harendra

On Wed, 5 Jan 2022 at 17:11, David Feuer <david.feuer at gmail.com> wrote:
>
> No. Consider a type like this:
>
> data Foo a = Foo !Int !a
>
> instance Storable a => Storable (Foo a) where ...
>
> Now if a happens to be (), we pay only one word per Foo. You want us to pay more so you can do your calculation more efficiently. That doesn't seem quite fair. You have another option: don't use (). Just write your own version with a different Storable instance and use that. Or use a newtype wrapper like I suggested.
>
> On Wed, Jan 5, 2022, 6:14 AM Harendra Kumar <harendra.kumar at gmail.com> wrote:
>>
>> On Wed, 5 Jan 2022 at 13:44, David Feuer <david.feuer at gmail.com> wrote:
>> >
>> > I don't think it's broken; I think your length calculation is broken. Instead of asking every use of () to take an extra byte, why don't you just store a word saying how long your array is?
>>
>> We can always say that the application is broken and not the
>> underlying system as long the problem can be circumvented in the
>> application. Of course the problem can be circumvented in our code and
>> that is what we are doing, but these semantics for the () instance
>> makes it harder and not very elegant. It is debatable what is broken.
>> That's the reason I wanted to raise this for discussion.
>>
>> We already store the length in the array but the length is in the form
>> of number of bytes and not the number of elements, we store the start
>> and end pointers in memory and compute the byte length from that. And
>> the () Storable instance does not allow us to convert bytes back to
>> the number of elements. And that is actually the crux of the problem.
>>
>> There are reasons for storing pointers which may be irrelevant for
>> this discussion. But the point is that this instance forces us to use
>> a different representation where we have to store the number of
>> elements in the array. Which I think is unnecessary for Storable
>> arrays as long as the Storable type has a concrete representation with
>> a definite size.
>>
>> I would like to think that if something is Storable then it should
>> have a concrete memory representation. peek and poke operations
>> themselves indicate this fundamental nature of Storable semantics.
>> peek means we are reading from a memory location and poke means that
>> we are writing to a memory location. In the case of () we are just
>> pretending to read or write on peek/poke, we are not storing or
>> retrieving something. This in my opinion should not be how Storable
>> should behave.
>>
>> I think this should not be about optimising for that extra byte. That
>> optimization is bogus. We are saying look, we can store something in
>> memory without even consuming any space. What's the point of doing
>> that?  In this case we are doing that just because we can. To take
>> this argument further we can also say that we can compress some byte
>> sequences and it will take up less space. But that's harder and it
>> will completely screw up the simple semantics so we won't do that.
>> Let's screw it only a little bit, because it's easier to do that.
>>
>> If we do not want to consume space for () then better not have a
>> Storable instance for it, leave it to the users. And if we really want
>> to store it then let it behave in the same way as any other mere
>> mortal storable would, so what if it takes up some space. Why not have
>> simple, uniform semantics for all cases? I think saving that extra
>> byte for this particular case is an overkill and leading to a bigger
>> cost in applications without actually saving anything worthwhile.
>>
>> -harendra
>>
>> >
>> > Alternatively, you could probably avoid special cases with a newtype:
>> >
>> > newtype NZS a = NZS { unNZS :: a }
>> >
>> > instance Storable a => Storable (NZS a) where
>> >   sizeof
>> >     | sizeof (undefined @a) == 0
>> >     = const 1
>> >     | otherwise = sizeof .# unNZS
>> >   alignment = alignment .# unNZS
>> >   peekElemOff = coerce (peekElemOff @a)
>> >   ....
>> >
>> > On Wed, Jan 5, 2022, 3:01 AM Harendra Kumar <harendra.kumar at gmail.com> wrote:
>> >>
>> >> The Storable instance of () is defined in the "Foreign.Storable"
>> >> module of the "base" package as follows:
>> >>
>> >> instance Storable () where
>> >>   sizeOf _ = 0
>> >>   alignment _ = 1
>> >>   peek _ = return ()
>> >>   poke _ _ = return ()
>> >>
>> >> The size of () is defined as 0. It sounds absurd for a Storable to
>> >> have a size of 0? This means that we can read an infinite number of ()
>> >> type values out of nothing (no memory location required) or store an
>> >> infinite number of () type values without even requiring a memory
>> >> location to write to.
>> >>
>> >> This is causing a practical problem in our Storable array
>> >> implementation. The array is constrained to a Storable type. Since ()
>> >> has a Storable instance, one can store () in the Storable array. But
>> >> it causes a problem because we determine the array element size using
>> >> sizeOf on the type. For () type it turns out to be 0. Essentially, the
>> >> array of () would always be of size 0. Now, we cannot determine the
>> >> length of the array from its byte length as you could store infinite
>> >> such elements in an empty array. The Storable instance of () seems to
>> >> be an oddity and makes us use a special case everywhere in the code to
>> >> handle this, and this special casing makes it highly prone to errors
>> >> when we change code.
>> >>
>> >> Can this be fixed? Is there a compelling argument to keep it like
>> >> this? A possible fix could be to represent it by a single byte in
>> >> memory which can be discarded when reading or writing. Another
>> >> alternative is to not provide a Storable instance for it at all. Let
>> >> the users write their own if they need it.
>> >>
>> >> If you think this does not have a problem, can you suggest how to
>> >> elegantly handle the array implementation problem as I described
>> >> above?
>> >>
>> >> Thanks,
>> >> Harendra
>> >> _______________________________________________
>> >> Libraries mailing list
>> >> Libraries at haskell.org
>> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries