[Haskell-cafe] bitSize

Fri Aug 26 20:02:59 CEST 2011

On Friday 26 August 2011, 19:24:37, Andrew Coppin wrote:
> On 26/08/2011 02:40 AM, Daniel Peebles wrote:
> > And as Daniel mentioned earlier, it's not at all obvious what we mean
> > by "bits used" when it comes to negative numbers.
> 
> I guess part of the problem is that the documentation asserts that
> bitSize will never depend on its argument. (So people write things
> like "bitSize (undefined :: Word32)" or similar.)

I don't think that's a problem; it's natural for what bitSize does. And it 
would be bad if bitSize did something different for Integer than for Int, 
Word, ...
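
A minimal sketch of the idiom being discussed: because the size query ignores its argument, one passes `undefined` annotated with the type of interest. The annotation has to be parenthesised around the argument; otherwise it annotates the `Int` result. The sketch assumes a current `base` (4.7 or later), where `bitSize` is deprecated in favour of `finiteBitSize` from the `FiniteBits` class; the binding names are only illustrative.

```haskell
import Data.Bits (finiteBitSize)
import Data.Int (Int8)
import Data.Word (Word16, Word32)

-- The argument only selects the instance; these FiniteBits instances
-- never evaluate it, so 'undefined' is safe here.
word16Bits, word32Bits, int8Bits :: Int
word16Bits = finiteBitSize (undefined :: Word16)
word32Bits = finiteBitSize (undefined :: Word32)
int8Bits   = finiteBitSize (undefined :: Int8)

main :: IO ()
main = print (word16Bits, word32Bits, int8Bits)  -- (16,32,8)
```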

> 
> I can think of several possible results one might want from a bit size
> query:

Yup, though for some of them there are better names than bitSize.

> 
> 1. The number of bits of precision which will be kept for values of this
> type. (For Word16, this is 16. For Integer, this is [almost] infinity.)

Not "almost infinity", but whatever your RAM or Int allows, whichever cops 
out first; or "enough, unless you [try to] do really extreme stuff".
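
Query #1 is roughly what current `base` (4.7 or later) offers as `bitSizeMaybe`, which sidesteps the Integer question by returning `Nothing` instead of a number. A small sketch, with illustrative binding names:

```haskell
import Data.Bits (bitSizeMaybe)
import Data.Word (Word16)

-- Query #1: the fixed precision of a type, or Nothing if unbounded.
-- As with bitSize, the argument only selects the type.
word16Precision, integerPrecision :: Maybe Int
word16Precision  = bitSizeMaybe (undefined :: Word16)
integerPrecision = bitSizeMaybe (undefined :: Integer)

main :: IO ()
main = print (word16Precision, integerPrecision)  -- (Just 16,Nothing)
```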

> 
> 2. The amount of RAM that this value is using up. (But that would surely
> be measured in bytes, not bits. And processor registers make the picture
> more complicated.)
> 
> 3. The bit count to the most significant bit, ignoring sign.
> 
> 4. The bit count to the sign bit.
> 
> Currently, bitSize implements #1. I'm not especially interested in #2. I
> would usually want #3 or #4.

I'd usually be more interested in #2 than in #4.
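
For fixed-size types, a byte count in the spirit of #2 is available from `sizeOf` in `Foreign.Storable`, which, like bitSize, ignores its argument. The caveat is that it reports the size of the raw, unboxed representation used when marshalling, not the heap footprint of a boxed Haskell value, and Integer has no Storable instance at all. A sketch:

```haskell
import Data.Word (Word16, Word64)
import Foreign.Storable (sizeOf)

-- sizeOf gives the size in BYTES of the marshalled (unboxed) form.
-- It ignores its argument, like bitSize. It is not the heap size of
-- a boxed value, and Integer is not Storable.
main :: IO ()
main = do
  print (sizeOf (undefined :: Word16))  -- 2
  print (sizeOf (undefined :: Word64))  -- 8
```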

> 
> Consider the case of 123 (decimal). The two's complement representation of
> +123 is
> 
> ...0000000000000001111011
> 
> The two's complement representation of -123 is
> 
> ...1111111111111110000101
> 
> For query #3, I would expect both +123 and -123 to yield 7.

One could make a case for the answer 3 for -123 (drop the leading run of 
ones and 0000101 remains, which needs only 3 bits); I wouldn't know what to 
expect without it being stated in the docs.
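
Taking "ignoring sign" to mean "look at the magnitude", which gives the 7 expected above for both +123 and -123, query #3 on Integer can be sketched with a hypothetical helper `magnitudeBits`. The shifting loop is chosen for clarity, not speed.

```haskell
import Data.Bits (shiftR)

-- Hypothetical helper for query #3: bits up to and including the most
-- significant set bit of the magnitude. The sign is ignored, and 0 is
-- taken to need 0 bits.
magnitudeBits :: Integer -> Int
magnitudeBits n = go 0 (abs n)
  where
    -- Shift right until nothing is left, counting the steps.
    go :: Int -> Integer -> Int
    go acc 0 = acc
    go acc m = go (acc + 1) (m `shiftR` 1)

main :: IO ()
main = print (magnitudeBits 123, magnitudeBits (-123))  -- (7,7)
```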

> For query
> #4, I would expect both to yield 8. (Since if you truncate both of those
> strings to 8 bits, then the positive value starts with 0, and the
> negative one starts with 1.)
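
Query #4 can be read as "the smallest width at which the truncated two's complement pattern, read back as a signed number, still gives the original value". A sketch using hypothetical helpers `bitLength` and `twosComplementBits`; the trick for negative n is that `complement n` (which equals -n - 1) is non-negative and needs as many bits as n does below its run of sign bits.

```haskell
import Data.Bits (complement, shiftR)

-- Hypothetical helper: bits up to the highest set bit of a
-- non-negative Integer (0 needs 0 bits).
bitLength :: Integer -> Int
bitLength = go 0
  where
    go :: Int -> Integer -> Int
    go acc 0 = acc
    go acc m = go (acc + 1) (m `shiftR` 1)

-- Hypothetical helper for query #4: the smallest width w such that the
-- low w bits of the two's complement pattern, read as a signed w-bit
-- number, give back n (so the top bit is 0 for n >= 0, 1 for n < 0).
twosComplementBits :: Integer -> Int
twosComplementBits n
  | n >= 0    = bitLength n + 1               -- magnitude plus a 0 sign bit
  | otherwise = bitLength (complement n) + 1  -- complement n == -n - 1 >= 0

main :: IO ()
main = print (twosComplementBits 123, twosComplementBits (-123))  -- (8,8)
```

With this definition -128 also needs only 8 bits (10000000), even though its magnitude 128 already needs 8 bits on its own, so #4 is not always #3 + 1: the two differ at negative powers of two.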

#4 would then generally be #3 + 1 for signed types, I think, so not very 
interesting. But what about unsigned types?
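
For fixed-width types, signed or unsigned, one can instead count up to the highest set bit of the stored pattern using `countLeadingZeros` (a `FiniteBits` method since base 4.8). A sketch with a hypothetical helper `usedBits`; it shows why the question only has an obvious answer for unsigned types, since a negative Int8 always has its top bit set and so "uses" all 8 bits.

```haskell
import Data.Bits (FiniteBits, countLeadingZeros, finiteBitSize)
import Data.Int (Int8)
import Data.Word (Word8)

-- Hypothetical helper: bits up to the most significant set bit of the
-- stored pattern of a fixed-width value (needs base >= 4.8).
usedBits :: FiniteBits a => a -> Int
usedBits x = finiteBitSize x - countLeadingZeros x

main :: IO ()
main = do
  print (usedBits (123 :: Word8))   -- 7
  print (usedBits (123 :: Int8))    -- 7
  print (usedBits (-123 :: Int8))   -- 8: 10000101 has its top bit set
```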

> 
> Then of course, there's the difference between "count of the bits" and
> "bit index", which one might expect to be zero-based. (So that the Nth
> bit represents 2^N.)

Yes, but that shouldn't be a problem with good names.
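
On the indexing point, Data.Bits itself is zero-based: `bit i` is 2^i and `testBit n i` inspects that same position. So for 123 (binary 1111011), which needs 7 bits, the highest set bit has index 6. A quick check:

```haskell
import Data.Bits (bit, testBit)

-- Data.Bits indices are zero-based: bit i == 2^i, and testBit n i
-- inspects that same position.
main :: IO ()
main = do
  let n = 123 :: Integer                 -- binary 1111011
  print (bit 6 :: Integer)               -- 64, i.e. 2^6
  print (testBit n 6, testBit n 7)       -- (True,False)
  print [i | i <- [0 .. 7], testBit n i] -- [0,1,3,4,5,6]
```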

So, which of them are useful and important enough to propose for inclusion?