[Haskell-cafe] Can Haskell outperform C++?

Fri May 18 18:38:29 CEST 2012

>> There was and is no claim that method 2 is "much harder
>> to implement in C or C++".  In fact both methods *were* implemented
>> easily in C.
>
> OK, got that now. So Haskell doesn't have a *big* advantage over C w/r
> to the ease of implementation of both algorithms?

In the case of these specific algorithms, no.
In the case of other backtracking algorithms, it could have
quite a big advantage.
>
>> The claim was and remains solely that
>> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>>   can be bigger than
>> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>>   and is in this particular case.
>
> Yes, but aren't the differences in the same ballpark,

(a) only to the degree that you are willing to call
all language differences "in the same ballpark".
(b) no, because the speed difference between the algorithms
grows without bound as the problem size increases, while
the speed difference between the languages does not.

Here's another example:  the UNIX 'tsort' command.
I have versions in Java, Smalltalk, and C.  The Java and
Smalltalk versions are about the same speed, linear in
the size of the graph.  The C version appears to be the
original UNIX code, and it is *quadratic* in the number
of nodes.  Result:  Smalltalk running faster than C by
a ratio that increases without bound as the input grows.
This is actually a case where it was easier to write fast
code than slow code in Smalltalk, easier to write slow code
than fast code in C.  (Built in hash tables, what else?)

> and if we want
> to compare *languages*, we should use identical algorithms to make the
> comparison fair.

In the permutation generation example, I was talking about
four programs:
           Language X  Language Y
Method 1   Program X1  Program Y1   -- identical algorithms
Method 2   Program X2  Program Y2   -- identical algorithms

However, "identical" isn't clearly defined.
For example, is a Java program that uses HashMap<String,Integer>
-- and thus allocates millions of Integer objects --
"identical" to a Smalltalk program using Dictionary
-- where the integers fit into the immediate range, so that
no integer boxes are needed at all -- ?
In the 'tsort' case, it turns out that the Java and Smalltalk
versions are I/O bound with over 90% of the time spent just
reading the data.  They have I/O libraries with very different
structures, so what does "identical algorithms" mean?  If you
are using dictionaries/hashmaps, and the two languages have
implementations that compute different hash functions for strings,
is _that_ using the same implementation?  (Some hash functions
are amazingly bad, see Andres Valloud's book.)

>> There is also a second claim, which I thought was uncontentious:
>> the relative difficulty of tasks varies with language.
>
> It matters much less if you write the code to be run multiple times.

You misunderstand.  The issue is whether you bother to write the
more efficient code *at all*.  The tsort code in C I was looking
at was real production code from a UNIX system that is still on
the market, and in the 20 or so years since it was written, nobody
had ever bothered to fix the blatant efficiency bug.  In other
languages, it would have been easier to write the program without
the bug in the first place.