ghc-testsuite-6.6 on Macs

Tue Nov 14 15:30:20 EST 2006

Hello,

On Tuesday 14 November 2006 11:34, Simon Marlow wrote:
> Thorkil Naur wrote:
> > ... I have 
> > produced an experimental darcs patch that solves some problems, while 
> > possibly introducing others: 
> > 
> > http://thorkilnaur.dk/~tn/GHC/testsuite/patch/barton_mangler_bug_patch_1.patch . 
> > Comments to this would be most welcome.
> 
> For those who haven't looked at Thorkil's patch: he proposes adding some
> code to the testsuite driver to allow sample output specific to a
> particular way, and used this to give sample output for one test
> (barton-mangler-bug) specific to the 'opt' way on PPC.
> 
> I would like it to be the case that any differences at all in the output
> from one way to another are bugs, including floating point differences.

It is a basic question of what varying circumstances we believe should be 
handled by the test framework. I tend to agree with you and, hence, to reject 
my own suggestion of extending the testsuite driver to allow different output 
for different ways.
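
For readers who have not looked at the patch, the idea of per-way sample output can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch or from the testsuite driver: the function name `pick_sample_output` and the `<test>.stdout-<way>` naming scheme are assumptions made up for this sketch.

```python
import os

def pick_sample_output(test_dir, test_name, way):
    """Return the path of the sample stdout to compare against for this way."""
    # Hypothetical naming scheme: "<test>.stdout-<way>" overrides "<test>.stdout".
    per_way = os.path.join(test_dir, "%s.stdout-%s" % (test_name, way))
    generic = os.path.join(test_dir, "%s.stdout" % test_name)
    # Fall back to the generic sample output when no per-way file exists.
    return per_way if os.path.exists(per_way) else generic
```

With such a scheme, a way that has no file of its own still compares against the common sample output, so only the deviating way needs an extra file.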

> On x86_64, 
> we always generate the same results regardless of -fvia-C or -fasm, for
> example.  However, it might be that this isn't practical on all platforms.

I feel rather sure that it isn't.

> The question is whether we should consider it a *bug* if a test doesn't
> give consistent floating-point answers or not.  Anyone have any thoughts
> on this?

Let me put it this way: It is well known that it is almost always bad to 
test whether two floating point numbers are exactly equal. So in this sense, 
a test whose outcome depends on testing whether two floating point numbers 
are exactly equal is a bad test. (Converting floating point numbers to 
decimal strings and comparing the strings, which is what really happens, 
seems to make matters even worse.)
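
To make the contrast concrete, here is a minimal sketch in Python of comparing two outputs while tolerating small differences in the numbers they contain. It is not something the testsuite driver does today, as far as I know: the function name `outputs_match`, the simple number pattern and the default tolerance are all assumptions chosen for illustration.

```python
import math
import re

# Matches decimal literals such as 3, -0.5, 1.25e-3 (a deliberately simple pattern).
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")

def outputs_match(expected, actual, rel_tol=1e-9):
    """Compare two outputs, allowing small relative differences in numbers."""
    # The non-numeric text must agree exactly; numbers become a placeholder.
    if NUMBER.sub("\x00", expected) != NUMBER.sub("\x00", actual):
        return False
    exp_nums = [float(s) for s in NUMBER.findall(expected)]
    act_nums = [float(s) for s in NUMBER.findall(actual)]
    if len(exp_nums) != len(act_nums):
        return False
    # Note: with a purely relative tolerance, 0.0 only matches exactly 0.0.
    return all(math.isclose(e, a, rel_tol=rel_tol)
               for e, a in zip(exp_nums, act_nums))
```

Under these assumptions, `outputs_match("x = 0.30000000000000004\n", "x = 0.3\n")` accepts the two outputs, whereas a plain string comparison rejects them.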

To be sure, if we are really testing the floating point operations, we are of 
course entitled to test equality. But if a test does not deal with floating 
point operations as such, but merely includes floating point numbers in its 
output incidentally, the test is probably bad.

I cannot see that the Haskell Report specifies precise properties of the 
floating point support, so even implementations that conform to the standard 
(Haskell 98) can be expected to differ. Hence, any test that involves output 
of floating point numbers might produce different output for reasons that are 
entirely unrelated to the test, which is not a particularly appetizing 
situation.
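
As a generic illustration of how easily such differences arise (Python here, and not a claim about what actually happens in barton-mangler-bug), merely changing the order in which the same three numbers are added changes the printed result:

```python
# Two mathematically equivalent ways of summing the same three numbers.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(repr(left_first))           # 0.6000000000000001
print(repr(right_first))          # 0.6
print(left_first == right_first)  # False
```

Two code generators that evaluate an expression in different orders, or at different intermediate precisions, can therefore produce different decimal strings without either of them being wrong in any obvious sense.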

Whether differences in floating point results between different ways should 
be considered a bug in GHC, I cannot say. I would tend towards "no", but that 
is probably because I don't have any particularly intense interest in 
floating point numbers.

Getting finally to something more specific, my impression until your question 
here had been that the barton-mangler-bug test involved floating point 
numbers incidentally: I imagined that someone (named Barton, perhaps?) ran 
this program at some point in time, discovered some unfortunate behaviour 
(such as the program crashing or producing wild results), that this behaviour 
was traced down to an error in some mangler (the gcc assembly language 
output "post processor"), and that the test was included and maintained in 
the testsuite to ensure that this bug was thoroughly stamped out.

Based on your question, I realised that my impression could be entirely false 
and that the central property tested was precisely the floating point 
differences observed for some ways.

I am still in doubt, so if anyone knows the story behind the 
barton-mangler-bug, I would be delighted to hear it.

> 
> If it's a bug, then we just declare these tests to be expected failures.  If 
> it's not a bug, then we have to allow per-way sample output, as per 
> Thorkil's patch.
> 

As I have already mentioned, I think my patch is a mistake. Depending on what 
anyone can tell me about the barton-mangler-bug, additional work would seem 
to go in one of two directions: If the floating point numbers are involved 
incidentally and the mangler bug still threatens, work should attempt to 
remove the floating point numbers from the output and produce a test case 
that exposes the bug more succinctly. I would certainly need some additional 
help to do this. On the other hand, if the floating point difference 
between, e.g., opt and normal is the real issue, it would still seem 
advantageous and quite possible to reduce the size of the test case, to make 
it easier to figure out the cause of the difference.

> Cheers,
> 	Simon
> 

Best regards
Thorkil