[Haskell-cafe] Re: A question about "monad laws"

Thu Feb 14 00:01:17 EST 2008

Richard A. O'Keefe wrote:
> 
> On 14 Feb 2008, at 2:28 pm, Roman Leshchinskiy wrote:
> 
>> Richard A. O'Keefe wrote:
>>> On 12 Feb 2008, at 5:14 pm, jerzy.karczmarczuk at info.unicaen.fr wrote:
>>>> Would you say that *no* typical floating-point software is reliable?
>>> With lots of hedging and clutching of protective amulets around the
>>> word "reliable", of course not.  What I *am* saying is that
>>> (a) it's exceptionally HARD to make reliable because although the
>>>     operations are well defined and arguably reasonable they do NOT
>>>     obey the laws that school and university mathematics teach us to
>>>     expect them to obey
>>
>> Ints do not obey those laws, either.
> 
> They obey a heck of a lot more of them.
> Any combination of Ints using (+), (-), (*), and negate
> is going to be congruent to the mathematically correct answer modulo 2**n
> for some n.  In particular, (+) is associative for Ints.

Yes, but neither school nor, for the most part, university mathematics
teaches us to expect modular arithmetic. Good programmers learn about it at
some point in their career, though, and write their programs
accordingly. If they intend to use floating point, they should learn
about it, too.
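
To make the modular behaviour concrete, here is a minimal sketch. It assumes GHC, where Int arithmetic wraps around on overflow (the Haskell report itself leaves overflow behaviour unspecified); the names intModulus and congruent are made up for this illustration.

```haskell
import Data.Bits (finiteBitSize)

-- Modulus 2^n for the machine Int (n is 64 on most current GHC targets).
intModulus :: Integer
intModulus = 2 ^ finiteBitSize (0 :: Int)

-- True when an Int result agrees with the exact Integer result modulo 2^n.
-- (Hypothetical helper, not a library function.)
congruent :: Int -> Integer -> Bool
congruent wrapped exact = (toInteger wrapped - exact) `mod` intModulus == 0

main :: IO ()
main = do
  let a = maxBound :: Int
      b = 12345 :: Int
      c = -678 :: Int
  -- Wraparound: adding 1 to the largest Int gives the smallest Int.
  print (a + 1 == minBound)
  -- The wrapped result still matches the exact one modulo 2^n.
  print (congruent (a * b + c) (toInteger a * toInteger b + toInteger c))
  -- Addition stays associative even when intermediate sums overflow.
  print ((a + b) + c == a + (b + c))
```

Under that assumption all three lines print True: the answers are "wrong" as integers but right modulo 2^n.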

I do agree that most programmers don't know how to use floats properly 
and aren't even aware that they can be used improperly. But that's an 
educational problem, not a problem with floating point.

> This would be my top priority request for Haskell':
>     require that the default Int type check for overflow on all
>     operations where overflow is possible,
>     provide Int32, Int64 for people who actually *want* wraparound.

I don't understand this. Why use a type which can overflow in the first 
place? Why not use Integer?
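
For comparison, a rough sketch of the two options. addChecked is a hypothetical helper, not something in the standard libraries; it only illustrates what an overflow-checking addition could look like, while Integer needs no check at all.

```haskell
-- Hypothetical helper: Int addition that reports overflow instead of wrapping.
addChecked :: Int -> Int -> Maybe Int
addChecked x y
  | exact < toInteger (minBound :: Int) = Nothing
  | exact > toInteger (maxBound :: Int) = Nothing
  | otherwise                           = Just (fromInteger exact)
  where
    -- The exact sum, computed in Integer, which cannot overflow.
    exact = toInteger x + toInteger y

main :: IO ()
main = do
  print (addChecked 1 2)         -- Just 3
  print (addChecked maxBound 1)  -- Nothing
  -- Integer simply grows past the Int range.
  print (toInteger (maxBound :: Int) + 1 > toInteger (maxBound :: Int))  -- True
```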

>> You just have to check for exceptional conditions.
> 
> Why should it be *MY* job to check for exceptional conditions?

It shouldn't unless you use a type whose contract specifies that it's 
your job to check for them. Which is the case for Int, Float and Double. 
It's not the case for Integer and Rational.

> If you think that, you do not understand floating point.
> x+(y+z) == (x+y)+z fails even though there is nothing exceptional about
> any of the operands or any of the results.

For all practical purposes, the semantics of (==) is not well defined 
for floating point numbers. That's one of the first things I used to 
teach my students about floats: *never* compare them for equality. So in 
my view, your example doesn't fail, it's undefined. That Haskell 
provides (==) for floats is unfortunate.
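
A small illustration, assuming IEEE double precision for Double. approxEq is a hypothetical helper and the tolerance is an arbitrary choice for this example; a suitable tolerance depends on the problem at hand.

```haskell
-- Hypothetical helper: equality up to a relative tolerance.
approxEq :: Double -> Double -> Double -> Bool
approxEq tol a b = abs (a - b) <= tol * max (abs a) (abs b)

main :: IO ()
main = do
  let x = 0.1 :: Double
      y = 0.2 :: Double
      z = 0.3 :: Double
  -- Exact comparison: the two groupings round differently.
  print ((x + y) + z == x + (y + z))                  -- False
  -- Comparison up to a tolerance treats them as equal.
  print (approxEq 1e-12 ((x + y) + z) (x + (y + z)))  -- True
```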

> I have known a *commercial* program blithely invert a singular matrix
> because of this kind of thing, on hardware where every kind of arithmetic
> exception was reported.  There were no "exceptional conditions", the
> answer was just 100% wrong.

If they used (==) for floats, then they simply didn't know what they 
were doing. The fact that a program is commercial doesn't mean it's any 
good.

>> I guess it trapped on creating denormals. But again, presumably the 
>> reason the student used doubles here was because he wanted his program 
>> to be fast. Had he read just a little bit about floating point, he 
>> would have known that it is *not* fast under certain conditions.
> 
> Well, no.  Close, but no cigar.
> (a) It wasn't denormals, it was underflow.

"Creating denormals" and underflow are equivalent. Denormals are created 
as a result of underflow. A denormalised number is smaller than any 
representable normal number. When the result of an operation is too 
small to be represented by a normal number, IEEE arithmetic will either 
trap or return a denormal, depending on whether underflow is masked or not.
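
A quick way to see this from Haskell, assuming IEEE doubles with underflow masked and no flush-to-zero mode enabled (the usual default, but an assumption nonetheless). isDenormalized is a RealFloat method from the Prelude.

```haskell
main :: IO ()
main = do
  let small = 1e-160 :: Double
      tiny  = 1e-200 :: Double
  -- 1e-160 itself is an ordinary normal number.
  print (isDenormalized small)            -- False
  -- About 1e-320: below the smallest normal Double, so a denormal.
  print (isDenormalized (small * small))  -- True
  -- About 1e-400: too small even for a denormal, so the result is zero.
  print (tiny * tiny == 0)                -- True
```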

> (b) The fact underflow was handled by trapping to the operating system,
>     which then completed the operation by writing a 0.0 to the appropriate
>     register, is *NOT* a universal property of floating point, and is *NOT*
>     a universal property of IEEE floating point.  It's a fact about that
>     particular architecture, and I happened to have the manual and he
>     didn't.

IIRC, underflow is a standard IEEE exception.

> (c) x*x => 0 when x is small enough *is* fast on a lot of machines.

Only if underflow is masked (which it probably is by default). Although 
I vaguely recall that denormals were/are slower on some architectures.

>> As it were, he seems to have applied what he thought was an
>> optimisation (using floating point) without knowing anything about it. 
>> A professional programmer would get (almost) no sympathy in such a 
>> situation.
> 
> You must be joking.  Almost everybody working with neural nets uses 
> floating point.
>
> [...]
> 
> If you are aware of any neural net software for general purpose hardware
> done by programmers you consider competent that *doesn't* use floating
> point, I would be interested to hear about it.

I'm not. But programmers I consider competent for this particular task
know how to use floating point. Your student didn't, but that's OK for a
student. He had someone he could ask so, hopefully, he'll know next time.

To be clear, I do not mean to imply that programmers who do not know 
about floating point are incompetent. I'm only somewhat sceptical of 
programmers who do not know about it but still write software that 
relies on it.

Roman