[Haskell-cafe] Re: New Hackage category: Error Handling

Richard O'Keefe ok at cs.otago.ac.nz
Mon Dec 7 20:09:40 EST 2009


On Dec 8, 2009, at 1:27 PM, Henning Thielemann wrote:

>
> On Tue, 8 Dec 2009, Richard O'Keefe wrote:
>
>> On Dec 8, 2009, at 12:28 PM, Henning Thielemann wrote:
>>> It is the responsibility of the programmer to choose number types
>>> that are appropriate for the application. If I address pixels on
>>> today's screens, I have to choose at least Word16. On 8-bit
>>> computers, bytes were enough. Thus, this sounds like an error.
>>
>> That kind of attitude might have done very well in the 1960s.
>
> I don't quite understand. If it is not the responsibility of the  
> programmer to choose numbers of the right size, who else?

It is the responsibility of the computer to support the numbers
that the programmer needs.  It is the responsibility of the computer
to compute CORRECTLY with the numbers that it claims to support.

I repeat: the programmer might not KNOW what the integer sizes are.
In Classic C and C89, the size of an 'int' was whatever the compiler
chose to give you.  In C89, limits.h means that a program can find
out at run time what range it got, but by then it's too late.
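
Haskell has the same wrinkle, by the way: the Haskell 98 report only
guarantees that Int covers [-2^29 .. 2^29 - 1], and a program can ask
what it actually got, but only once it is already running.  A trivial
sketch:

    -- prints the implementation-defined bounds of Int
    main :: IO ()
    main = print (minBound :: Int, maxBound :: Int)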

>
>
> If the operating system uses Int32 for describing files sizes and  
> Int16 for screen coordinates, I'm safe to do so as well.

In a very trivial sense, this is true.
In any less trivial sense, not hardly.

Suppose I am calculating screen coordinates using

/ x \   / a11 a12 a13 \   / u \
| y | = | a21 a22 a23 |   | v |
\ 1 /   \  0   0   1  /   \ 1 /

which is a fairly standard kind of transformation.  It is not in
general sufficient that u, v, x, and y all fit into Int16.
The intermediate products and sums must _also_ fit.
(Assume for the moment that overflow is checked.)
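
To see how this bites, here is a small sketch with made-up numbers
(the coefficient 300 and coordinate 200 are purely illustrative):

    import Data.Int (Int16)

    main :: IO ()
    main = do
        let a11 = 300 :: Int16   -- fits comfortably in Int16
            u   = 200 :: Int16   -- fits comfortably in Int16
        -- the intermediate product 300 * 200 = 60000 exceeds
        -- maxBound :: Int16 (32767) and silently wraps to -5536:
        print (a11 * u)
        -- widening before multiplying recovers the true value, 60000:
        print (fromIntegral a11 * fromIntegral u :: Integer)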

If I need to represent *differences* of Int32 pairs, I know
perfectly well what type I need:  Int33.  But there is no such
type.
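
The nearest honest approximation is to widen to a type that does
exist; a minimal sketch:

    import Data.Int (Int32, Int64)

    -- There is no Int33, but every difference of two Int32 values
    -- fits in Int64, so widen the operands before subtracting.
    diff32 :: Int32 -> Int32 -> Int64
    diff32 a b = fromIntegral a - fromIntegral b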

> The interface to the operating system could use type synonyms
> FileSize and ScreenCoordinate that scale with future sizes. But the
> programmer remains responsible for actually using ScreenCoordinate
> for coordinates and not for file sizes.

Consider this problem:
    We are given a whole bunch of files, and want to determine
    the total space used by all of them.

Smalltalk:
    fileNames detectSum: [:each | (FileStatus fromFile: each) size]

The answer is correct, however many file names there are in the
collection.  But if we're using C, or Pascal, or something like
that, we want to do
	FileCollectionSize fcs = 0;
	for (i = 0; i < n; i++) {
	    fcs += OS_File_System_Size(filename[i]);
	}
and how on earth do we compute the type FileCollectionSize?
Remember, it has to be big enough to hold the sum of the sizes of
an *unknown* and quite possibly large number of quite possibly
extremely large files, not necessarily confined to a single disc,
so the total could well exceed what will fit in FileSize.
This is especially so when you realise that there might be many
repetitions of the same file names.  I can _easily_ set this up
to overflow 64 bits on a modern machine.

In Haskell, you'd want to switch straight over to Integer and
stay there.
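
Something like the following, say; hFileSize already delivers an
Integer, so the sum cannot overflow no matter how many files there
are.  (A sketch, not a robust program: it ignores I/O errors.)

    import System.IO (IOMode (ReadMode), hFileSize, withFile)

    -- Total space used by a collection of files, as an exact Integer.
    totalSize :: [FilePath] -> IO Integer
    totalSize names =
        fmap sum (mapM (\name -> withFile name ReadMode hFileSize) names)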

>> In an age when Intel have demonstrated 48 full x86 cores on a single
>> chip, when it's possible to get a single-chip "DSP" with >240 cores
>> that's fast enough to *calculate* MHz radio signals in real time,
>> typical machine-oriented integer sizes run out _really_ fast.
>> For example, a simple counting loop runs out in well under a second
>> using 32-bit integers.
>>
>> The programmer doesn't always have the information necessary to
>> choose machine-oriented integer sizes.  Or it might not offer a  
>> choice.
>> Or the choice the programmer needs might not be available:  if I want
>> to compute sums of products of 64-bit integers, where are the 128-bit
>> integers I need?
>
> And the consequence is to ship a program that raises an exception  
> about problems with the size of integers?

Yes.  Just because the programmer has TRIED to ensure that all the
numbers will fit into the computer's arbitrary and
application-irrelevant limits doesn't mean s/he succeeded.  For that
matter, it doesn't mean that the compiler won't use an optimisation
that breaks the program.  (Yes, I have known this happen, with
integer arithmetic, and recently.)

> I'm afraid I don't understand what you are arguing for.

I'm arguing for three things.
(1) If you look at the CERT web site, you'll discover that there have
     been enough security breaches due to integer overflow that they
     recommend working very hard to prevent it, and there's an
     "As-if Infinitely Ranged" (AIR) project to enable writing C
     _as if_ it could be trusted to raise exceptions about problems
     with the size of integers.  It is better to raise an exception
     than to let organised crime take over your computer.
(2) In a language like Ada, where the programmer can *say* exactly
     what range they need, and the bounds can be computed at compile
     time, and the compiler either does it right or admits right away
     that it can't, it's the programmer's fault if it's wrong.
     Otherwise it isn't.
(3) Be very worried any time you multiply or divide integers.
     (INT_MIN / (-1) is an overflow, and C is allowed to treat
     INT_MIN % (-1) as undefined even though it is mathematically 0.)
     No, make that "be terrified".  If you cannot formally verify
     that results will be in range, use Haskell's Integer type.
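
For what it's worth, GHC's div now traps that particular case on
fixed-size Ints rather than crashing; a GHCi session on a 64-bit
machine might look like this (the exact numbers depend on the word
size):

    Prelude> (minBound :: Int) `div` (-1)
    *** Exception: arithmetic overflow
    Prelude> toInteger (minBound :: Int) `div` (-1)
    9223372036854775808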



