[Haskell-cafe] Re: A question about "monad laws"

Wed Feb 13 22:36:49 EST 2008

On 14 Feb 2008, at 2:28 pm, Roman Leshchinskiy wrote:

> Richard A. O'Keefe wrote:
>> On 12 Feb 2008, at 5:14 pm, jerzy.karczmarczuk at info.unicaen.fr wrote:
>>> Would you say that *no* typical floating-point software is reliable?
>> With lots of hedging and clutching of protective amulets around the
>> word "reliable", of course not.  What I *am* saying is that
>> (a) it's exceptionally HARD to make reliable because although the  
>> operations
>>     are well defined and arguably reasonable they do NOT obey the  
>> laws that
>>     school and university mathematics teach us to expect them to obey
>
> Ints do not obey those laws, either.

They obey a heck of a lot more of them.
Any combination of Ints using (+), (-), (*), and negate
is going to be congruent to the mathematically correct answer modulo  
2**n
for some n.  In particular, (+) is associative for Ints.

> It is not exceptionally hard to write reliable software using ints.

I did my BSc and MSc computing on a B6700, where the hardware *always*
notified you in case of an integer overflow.  In that case, it was
perfectly easy to write reliable software.  You just pretended that
the type 'INTEGER' in your program meant 'mathematical integer', and if
that got you into trouble, the machine was certain to tell you about it.

Using languages that do not check for integer overflow, even on hardware
(like, as it happens, both the different machines on my desk) that makes
it cheap to do so, I *have* had trouble with multiplying two positive  
integers
and getting a negative rules and also with a program that went into  
an infinite
loop because it happened to multiply two positive numbers and get  
another
positive number that was smaller than the one it started with.   
There's also
the problem dividing two negative integers can give you a negative  
result.
And one problem I definitely ran into was a Pascal 'for' loop with  
positive
bounds that ran forever.

When I contrast the amount of manual checking I have to do when  
writing C
(or, for that matter, Haskell) with the amount of manual checking I  
have to
do when using Smalltalk or SETL or Lisp, and when I remember how life  
was
*better* for me in this respect back in the 70s, well, it doesn't  
make me
happy.

This would be my top priority request for Haskell':
	require that the default Int type check for overflow on all
	operations where overflow is possible,
	provide Int32, Int64 for people who actually *want* wraparound.

I've been told that there was a day when there was serious trouble in  
the
US financial markets because the volume of trade exceeded the 32-bit  
signed
integer limit, and many programs started giving nonsense results.   
But the
Smalltalk programs just kept powering on...

> You just have to check for exceptional conditions.

Why should it be *MY* job to check for exceptional conditions?
That's the *MACHINE*'s job.  When you bought your computer, you paid
for hardware that will do this job for you!

> That's also the case for floating point.

If you think that, you do not understand floating point.
x+(y+z) == (x+y)+z fails even though there is nothing exceptional about
any of the operands or any of the results.

I have known a *commercial* program blithely invert a singular matrix
because of this kind of thing, on hardware where every kind of  
arithmetic
exception was reported.  There were no "exceptional conditions", the
answer was just 100% wrong.

> I guess it trapped on creating denormals. But again, presumably the  
> reason the student used doubles here was because he wanted his  
> program to be fast. Had he read just a little bit about floating  
> point, he would have known that it is *not* fast under certain  
> conditions.

Well, no.  Close, but no cigar.
(a) It wasn't denormals, it was underflow.
(b) The fact underflow was handled by trapping to the operating system,
     which then completed the operating by writing a 0.0 to the  
appropriate
     register, is *NOT* a universal property of floating point, and  
is *NOT*
     a universal property of IEEE floating point.  It's a fact about  
that
     particular architecture, and I happened to have the manual and  
he didn't.
(c) x*x => 0 when x is small enough *is* fast on a lot of machines.

> As it were, he seems to have applied what he though was an  
> optimisation (using floating point) without knowing anything about  
> it. A professional programmer would get (almost) no sympathy in  
> such a situation.

You must be joking.  Almost everybody working with neural nets uses  
floating
point.  (One exception I came across was some people using special  
vector
processor hardware that didn't *have* floating point.  These days,  
you could
probably use a programmable GPU to good effect.)

For neural net calculations, you have to do lots of dot products.
When this example happened, machines were commonly 32 bit, not 64 bit.
So doing the calculations in integers would
  (a) have limited him to 16 bits for the coefficients,
      instead of double's 53.  This might just have been enough of a  
limit
      to prevent learning the kinds of things he wanted his net to  
learn.
      Actually, if you want to do a dot product on n-vectors, you need
      enough bits for n as well.  Suppose you have 100 inputs, then  
you'll
      need 7 bits for that, so you are limited to (31-7)/2 = 12 bits,  
which
      is dangerously low.  (doubles can do precise sums of up to 128  
products
      of coefficients with up to 23 bits).
  (b) integer multiplication was very slow on that machine.  On most  
modern
      machines, integer multiplication is quite slow compared with add,
      because architects look at C benchmarks and conclude that  
multiplication
      isn't important.  So programmers learn to do multiply-heavy  
calculations
      in floating point, so the benchmarks show less integer  
multiplication,
      so the architects let integer multiply get relatively slower,  
and ....
  (c) the sigmoid function *has* to be done in floating point  
arithmetic.
      In fact I *wanted* him to use integer operations so that he could
      exploit the new-at-the-time graphics instructions (think MMX),  
but the
      project foundered on this step.  He couldn't get a workable  
approximation
      of the sigmoid function using integers that didn't kill  
performance.
  (d) Much of the calculation needed for neural nets can be done  
using the
      Basic Linear Algebra Subroutines, which are available in seriously
      tuned form for most modern machines.  If a programmer *didn't* use
      these (floating-point-only) libraries, I would be asking why not.

If you are aware of any neural net software for general purpose  
hardware done
by programmers you consider competent that *doesn't* use floating  
point, I
would be interested to hear about it.