ghc-testsuite-6.6 on Macs
Thorkil Naur
naur at post11.tele.dk
Tue Nov 14 15:30:20 EST 2006
Hello,
On Tuesday 14 November 2006 11:34, Simon Marlow wrote:
> Thorkil Naur wrote:
> > ... I have
> > produced an experimental darcs patch that solves some problems, while
> > possibly introducing others:
> >
> > http://thorkilnaur.dk/~tn/GHC/testsuite/patch/barton_mangler_bug_patch_1.patch .
> > Comments to this would be most welcome.
>
> For those who haven't looked at Thorkil's patch: he proposes adding some
> code to the testsuite driver to allow sample output specific to a
> particular way, and used this to give sample output for one test
> (barton-mangler-bug) specific to the 'opt' way on PPC.
>
> I would like it to be the case that any differences at all in the output
> from one way to another are bugs, including floating point differences.
It is a basic question of what varying circumstances we believe should be
handled by the test framework. I tend to agree with you and, hence, to reject
my own suggestion of extending the testsuite driver to allow different output
for different ways.
> On x86_64,
> we always generate the same results regardless of -fvia-C or -fasm, for
> example.
> However, it might be that this isn't practical on all platforms.
I feel rather sure that it isn't.
> The question
> is whether we should consider it a *bug* if a test doesn't give consistent
> floating-point answers or not. Anyone have any thoughts on this?
Let me put it this way: It is well known that it is almost always bad to
test whether two floating point numbers are exactly equal. So in this sense,
a test whose outcome depends on testing whether two floating point numbers
are exactly equal is a bad test. (Converting floating point numbers to
decimal strings and comparing the strings, which is what really happens,
seems to make matters even worse.)
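To illustrate the point about exact comparison, here is a minimal Haskell
sketch. The helper name approxEq and the tolerance 1e-12 are assumptions made
up for this example; nothing like this exists in the testsuite driver as far
as I know.

```haskell
-- Hypothetical helper (an assumption for illustration, not part of the
-- GHC testsuite): equality within a relative tolerance.
approxEq :: Double -> Double -> Double -> Bool
approxEq tol x y = abs (x - y) <= tol * max 1 (max (abs x) (abs y))

main :: IO ()
main = do
  let a = 0.1 + 0.2 :: Double
      b = 0.3 :: Double
  print (a == b)             -- exact comparison: False on IEEE 754 doubles
  print (show a == show b)   -- comparison of decimal strings: also False
  print (approxEq 1e-12 a b) -- comparison within a tolerance: True
```

A test that compares printed output behaves like the second line: any
difference in the last digit, however it arises, shows up as a failure.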
To be sure, if we are really testing the floating point operations, we are of
course entitled to test equality. But if a test does not deal with floating
point operations as such, but merely includes floating point numbers in its
output incidentally, the test is probably bad.
I cannot see that the Haskell report specifies precise properties of the
floating point support, so even implementations that conform to the standard
(Haskell 98) can be expected to differ. Hence, any test that involves output
of floating point numbers might produce different output for reasons that are
entirely unrelated to the test, not a particularly appetizing situation.
Whether a difference in floating point results between different ways should
be considered a bug in GHC, I cannot say. I would tend towards "no", but that
is probably because I don't have any particularly intense interest in floating
point numbers.
Getting finally to something more specific, my impression until your question
here had been that the barton-mangler-bug test involved floating point
numbers incidentally: I imagined that someone (named Barton, perhaps?) ran
this program at some point in time, discovered some unfortunate behaviour
(such as the program crashing or producing wild results), that this behaviour
was traced to an error in some mangler (the gcc assembler language
output "post processor"), and that the test was included and maintained in
the testsuite to ensure that this bug was thoroughly stamped out.
Based on your question, I realised that my impression could be entirely false
and that the central property tested was precisely the floating point
differences observed for some ways.
I am still in doubt, so if anyone knows the story behind the
barton-mangler-bug, I would be delighted to hear it.
>
> If it's a bug, then we just declare these tests to be expected failures. If
> it's not a bug, then we have to allow per-way sample output, as per
> Thorkil's patch.
>
As I have already mentioned, I think my patch is a mistake. Depending on what
anyone can tell me about the barton-mangler-bug, additional work would seem
to go in one of two directions: If the floating point numbers are involved
incidentally and the mangler bug still threatens, work should attempt to
remove the floating point numbers from the output and produce a test case
that exposes the bug more succinctly. I would certainly need some additional
help to do this. On the other hand, if the floating point difference
between, e.g., opt and normal is the real issue, it would still seem
advantageous and quite possible to reduce the size of the test case, to make
it easier to figure out the cause of the difference.
> Cheers,
> Simon
>
Best regards
Thorkil