[Haskell-cafe] Confused about logBase

Fri Aug 26 05:24:17 UTC 2016

On 26/08/16 1:43 PM, MarLinn via Haskell-Cafe wrote:
>
> Calculating the binary logarithm of an integer is a search for the least
> significant bit. For integers, that's either a build-in or a binary
> search.

The binary log of an integer is close to the location of
the MOST significant bit.  (Common Lisp, see HAULONG.
Smalltalk, see #highBit.)  Some machines have hardware
for "Count Leading Zeros" that can be used to get this.
BUT that's still not the binary logarithm.
4 and 5 have their most significant bit in the same place,
but do not have the same base-2 logarithm.

log2(4) = 2.0000000000000000
log2(5) = 2.3219280948873622

Realistically, the hard part happens *after* you find the
most significant bit.

For floating point values, you already know where that bit is
> because the exponent tells you (modulo some edge cases and an offset).

That's not the hard part.

In the open source libm, log() is 104 SLOC of C,
while log2() is 119 SLOC of C.  In both of them,
extracting the exponent from the floating point
number is indeed just a couple of lines.  It's everything
ELSE that's hard.  Getting below 0.90 ULP of error is not easy.

It's worth noting that these days the standard C library
has log1p() and log() -- both base e -- and also log2 and log10.
But that's recent; the C library used to include just log().

> That should still be faster than even the division operation alone that
> you otherwise need to combine the two values. Divisions are expensive, too.
> (interesting fact: Java uses a seven-fingered merge sort with
> bit-shift-trickery to approximate a division by seven because the
> division is so damn expensive)

Java was designed by a company that left integer division
entirely out of the hardware of their machine (SPARC V7).
While this was fixed in SPARC V8, some experiences leave scars.

For what it's worth, gcc and other C compilers have been
optimising integer divisions by known constants into shifts and
adds/subtracts for a long time.  The bit-shift trickery may well
be an idea whose time has gone.

What this has to do with FLOATING-POINT divisions is not clear.
Looking at all the work that log() -- or log2() -- has to do
makes me think that a Haskell program that could NOTICE an
extra floating-point division would be a very unusual Haskell
program indeed.

> So sorry, I don't believe that claim of yours. I'm not sure about your
> second claim either. Maybe natural logarithms are used more often, maybe
> not. I don't have any numbers either way and I don't read enough papers
> to have a feeling what the scientific community uses.

Natural logarithms have some rather pleasant mathematical
properties.  log10 is nice for human beings, but that's a user
interface issue more than actual calculation.  log2 is useful
if you are calculating "information", but even then, for most
purposes other than user output measuring information in nats
is just as good as measuring in bits.