ANN: HIntegerByInt

Mon Aug 13 19:15:29 EDT 2007

Stefan O'Rear wrote:
> On Mon, Aug 13, 2007 at 03:21:54PM -0700, John Meacham wrote:
>> On Sat, Aug 11, 2007 at 12:09:44PM -0300, Isaac Dupree wrote:
>>> BTW: I estimate that it took me about two solid weeks of time, more than 
>>> I had originally intended to spend :)
>>>
>>>
>>> I think using CPP will prove important, but makes it harder to test in 
>>> Hugs... should I use Cabal if I want to use Hugs with haskell+cpp code?
>> I think if you were willing to use what is provided by the FFI
>> extension, you could make some signifigant improvements without
>> resorting to CPP. mainly, you could use unsigned arithmetic and have
>> access to bit operations.
>>
>> I was thinking a representation like
>>
>> data Integer = Integer !Bool Rest
>> data Rest = Digit !Word Rest | End
>>
>> where the Bool indicates the sign, and the rest are the base-wordsize
>> digits. It would also be possible to just use a sign bit in the number
>> of course. that unboxed strict Word should make a big difference.
> 
> You might also try
> 
> data Integer = Pos !Word | Neg !Word | Big !Word Integer
> 
> since the incremental penalty of 3 constructors is quite small, and it
> eliminates a few indirections.
> 
> (Note: You have to actually specify -funbox-strict-fields for the ! to
> do much good.)

I use the {-# UNPACK #-} pragma where I want that effect.

Simpler representation = simpler for a compiler to output. (although I 
could easily provide a Haskell module containing (Integer -> (that 
thing's shape) ) for compilers to include.)  No-FFI is particularly 
important (this is NOT GMP), but I don't think I need FFI - the compiler 
just has to support it somehow - to use Data.Word.

I do suspect that storing in two's complement style / with separate 
sign, might be more efficient for some operations.  (I'm particularly 
concerned that small numbers should be fast, so that Int doesn't have to 
be used all over the place just for speed reasons. Since my format 
requires two "case"s to determine that an Integer is a small number, 
_if_ that is the most costly operation, it might be improved)  And of 
course any of you can try the variant implementations you propose, if 
you want to put in the work on it!  I'm not interested for a long time.

The more interesting-looking part to me now is seeing what a good 
mechanism is for using these implementations in compilers is (which is 
not entirely obvious to me).  I'm

BTW. base-wordsize digits are not very good for addition speed, 
depending... Even GMP uses a somewhat smaller base than that, I believe.

Look:
* The compiler only has to output lists (it already must do string 
literals), and Ints, in compiled programs.
* No types outside Haskell98 need to be implemented.
* The Int does not need to be a certain large size, does not need to be 
two's-complement bits (except for the extension class Bits which seems 
to assume two's complement implicitly for signed numbers), and does not 
need to have any particular overflow behavior.

I'm not the one with the energy to make compromises between performance 
and portability.  Seeing something like this in Jhc should be 
interesting and there is where decent-good performance will be wanted. 
I know my implementation is quite inferior performance to GMP, in _GHC_ 
:)  I still think GMP is the place to go for the highest, most 
well-rounded performance. (Although "We will release 4.2.2 in spring 
2007." did not come to pass, and still appears on gmplib.org)

I wonder if there's any way (Jhc) compiler optimizations could transform 
a strict inductive data into a UArray sort of representation, since that 
representation is probably more efficient for Integers...

Isaac