Question about negative Integers

Sylvain Henry sylvain at haskus.fr
Fri Nov 15 16:04:25 UTC 2019


Hi GHC devs,

As some of you may know, I am working on fixing several longstanding 
issues with GHC's big numbers implementation (Integer, Natural). You can 
read more about it here: 
https://gitlab.haskell.org/hsyl20/ghc/raw/hsyl20-integer/libraries/ghc-bignum/docs/ghc-bignum.rst

To summarize, we would have a single `ghc-bignum` package with different 
backends (GMP, pure Haskell, etc.). The backend is chosen with a Cabal 
flag and new backends are way easier to add. All the backends use the 
same representation which allows Integer and Natural types and datacons 
to be wired-in which has a lot of nice consequences (remove some 
dependency hacks in base package, make GHC agnostic of the backend used, 
etc.).

A major roadblock in previous attempts was that integer-simple doesn't 
use the same representations for numbers as integer-gmp. But I have 
written a new pure Haskell implementation which happens to be faster 
than integer-simple (see perf results in the document linked above) and 
that uses the common representation (similar to what was used in 
integer-gmp).

I am very close to submit a merge request but there is a remaining 
question about the Bits instance for negative Integer numbers:

We don't store big negative Integer using two's complement encoding, 
instead we use signed magnitude representation (i.e. we use constructors 
to distinguish between (big) positive or negative numbers). It's already 
true today in integer-simple and integer-gmp. However integer-gmp and 
integer-simple fake two's complement encoding for Bits operations. As a 
consequence, every Bits operation on negative Integers does *a lot* of 
stuff. E.g. testing a single bit with `testBit` is linear in the size of 
the number, a logical `and` between two numbers involves additions and 
subtractions, etc.

Question is: do we need/want to keep this behavior? There is nothing in 
the report that says that Integer's Bits instance has to mimic two's 
complement encoding. What's the point of slowly accessing a fake 
representation instead of the actual one? Could we deprecate this? The 
instance isn't even coherent: popCount returns the negated numbers of 1s 
in the absolute value as it can't return an infinite value.

Thanks,
Sylvain


More information about the ghc-devs mailing list