GHC.Prim.ByteArray# - confusing documentation

Sat Dec 26 12:50:25 EST 2009

On Thu, 2009-12-24 at 18:18 -0500, Antoine Latter wrote:
> Folks,
> 
> I found some of the documentation in GHC.Prim confusing - so I thought
> I'd share. The documentation for the ByteArray# type[1] explains
> that it's a raw region in memory that also remembers its size.
> 
> Consequently I expected sizeofByteArray# to return the same number
> that I passed in to newByteArray#. But it didn't - it returned
> however much it decided to allocate, which on my platform is always a
> multiple of four bytes.

Yes, this is an artefact of the fact that ghc measures heap stuff in
units of words.
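
To make the rounding concrete, here is a minimal sketch of the arithmetic
only. It does not call the real primops; wordSizeBytes, bytesToWords and
reportedSize are hypothetical names, and the 4-byte word is an assumption
taken from the platform described in the quoted message.

```haskell
module Main where

-- Assumed word size in bytes; 4 matches the platform described above.
wordSizeBytes :: Int
wordSizeBytes = 4

-- Whole words needed to hold n bytes (hypothetical helper, rounds up).
bytesToWords :: Int -> Int
bytesToWords n = (n + wordSizeBytes - 1) `div` wordSizeBytes

-- Size in bytes that a word-granular size field would report
-- for an n-byte request.
reportedSize :: Int -> Int
reportedSize n = bytesToWords n * wordSizeBytes

main :: IO ()
main = mapM_ report [0, 1, 4, 5, 10]
  where
    report n =
      putStrLn (show n ++ " requested, " ++ show (reportedSize n) ++ " reported")
```

Under these assumptions a 5 byte request is reported as 8 bytes and a 10
byte request as 12, which is the kind of mismatch described above.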

> This is something which could be clarified in the documentation.

It would be jolly useful, for making short strings, if GHC's ByteArray#
used a byte length rather than a word length. It'd mean a little
more bit twiddling in the GC code that looks at ByteArray#s; however,
it'd save an extra 2 words in a short string type (or allow us to store
'\0' characters in short strings).

It's been on my TODO list for some time to design a portable low-level
ByteArray module that could be implemented by hugs, nhc, ghc, etc. The
aim would be to be similar to ForeignPtr + Storable but using native
heap-allocated memory blocks.

In turn this would be the right portable layer on which to build
ByteString, Text and probably IO buffers too.

Duncan