[Haskell-cafe] Can Haskell outperform C++?

Isaac Gouy igouy2 at yahoo.com
Tue May 22 18:54:52 CEST 2012

> From: Richard O'Keefe
> Sent: Monday, May 21, 2012 6:54 PM

> On 22/05/2012, at 4:15 AM, Isaac Gouy wrote:
>>>  Actually, I/O bound is *good*.
>>  Why would that be good or bad?
> The context here is a UNIX-style topological sorting program.
> Being I/O bound means that the program is limited by how fast
> it can read the data.  If 90% of the time goes into reading
> the data, that means that the *algorithmic* part is fast enough.
> There may be very little one can do about the I/O part.

Maybe you could say how the Java I/O is being done.
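The question matters because the answer can dominate the timings. The thread doesn't show the Java source, so purely as a hypothetical illustration (the class and method names here are mine, not from any of the programs being discussed): reading "from to" edge pairs through a buffered reader with hand-rolled digit parsing allocates no per-token Strings, which is a very different program from one built on `Scanner` or `String.split`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public class EdgeReader {

    // Read up to edgeCount "from to" pairs into one flat int[],
    // parsing digits by hand so no per-token Strings are allocated.
    static int[] readEdgesFlat(Reader in, int edgeCount) throws IOException {
        BufferedReader r = new BufferedReader(in, 1 << 16);
        int[] edges = new int[edgeCount * 2];
        int n = 0, c, value = 0;
        boolean inToken = false;
        while (n < edges.length && (c = r.read()) != -1) {
            if (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                inToken = true;
            } else if (inToken) {
                edges[n++] = value;
                value = 0;
                inToken = false;
            }
        }
        if (inToken && n < edges.length) edges[n] = value;
        return edges;
    }

    // Convenience wrapper for in-memory input.
    static int[] readEdgesFromString(String s, int edgeCount) {
        try {
            return readEdgesFlat(new StringReader(s), edgeCount);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int[] e = readEdgesFromString("1 2\n2 3\n3 4\n", 3);
        System.out.println(Arrays.toString(e));  // [1, 2, 2, 3, 3, 4]
    }
}
```

Whether the real program did anything like this is exactly the information being asked for.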

> I didn't _think_ I'd omitted anything important.  Oh well.

>     For 50,000 nodes and 8,385,254 edges,
>     Java (first version) ran out of memory after 89.54 seconds (default heap)
>     Java (2nd version)  13.31 seconds   -- avoid Integer boxing!
>     Java (3rd version)  15.07 seconds
>     Smalltalk           14.52 seconds
>     C                    2.36 seconds

fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.

fwiw my expectation is that it should be possible to make the Java program considerably faster. Of course, what I expect and what happens are often very different ;-)
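The "avoid Integer boxing!" note against the 2nd-version time is suggestive. The actual source isn't in the thread, so this is only a sketch of what that fix typically means for a graph program (the names and the CSR-style layout are my assumptions, not the thread's code): a boxed adjacency list allocates one `Integer` object per edge endpoint, while a flat offsets-plus-targets pair of `int[]` arrays allocates no per-edge objects at all.

```java
import java.util.ArrayList;
import java.util.List;

public class Boxing {

    // Boxed representation: every edge endpoint added later becomes
    // a heap-allocated Integer inside an ArrayList.
    static List<List<Integer>> boxedGraph(int nodes) {
        List<List<Integer>> g = new ArrayList<>(nodes);
        for (int i = 0; i < nodes; i++) g.add(new ArrayList<>());
        return g;
    }

    // Unboxed representation: one flat int[] of edge targets plus an
    // offset index per node (a standard CSR-style layout).
    // Neighbours of node i are targets[offset[i] .. offset[i+1]).
    static int[][] unboxedGraph(int nodes, int[][] edges) {
        int[] degree = new int[nodes];
        for (int[] e : edges) degree[e[0]]++;
        int[] offset = new int[nodes + 1];
        for (int i = 0; i < nodes; i++) offset[i + 1] = offset[i] + degree[i];
        int[] targets = new int[edges.length];
        int[] fill = offset.clone();
        for (int[] e : edges) targets[fill[e[0]]++] = e[1];
        return new int[][] { offset, targets };
    }

    public static void main(String[] args) {
        int[][] g = unboxedGraph(4, new int[][] {{0, 1}, {0, 2}, {1, 3}, {2, 3}});
        System.out.println(java.util.Arrays.toString(g[0]));  // [0, 2, 3, 4, 4]
        System.out.println(java.util.Arrays.toString(g[1]));  // [1, 2, 3, 3]
    }
}
```

With 8,385,254 edges, the difference between zero objects and millions of `Integer`s is plausibly where a chunk of the Java-versus-C gap lives, which is why the 2nd-version improvement is unsurprising.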

>>>>  Of course, to some degree, user defined hash functions remedy that
>>>>  specific problem.
>>>  While creating other, and perhaps worse, ones.
>>>  For example, in the Smalltalk code, if you use a Dictionary of Strings,
>>>  you're getting Robert Jenkin's hash function in optimised C.  If you
>>>  supply your own, you're getting a very probably worse hash function
>>>  and it's going to run rather slower.  And above all, the stuff you
>>>  are benchmarking is no longer code that people are actually likely
>>>  to write.
>>  I think you're being a touch quarrelsome :-)
> That upset me. 

I'm sorry that gentle comment upset you.
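For what it's worth, the "very probably worse" claim is easy to make concrete. As a hypothetical demo (in Java rather than Smalltalk, and with names of my own invention): a hand-rolled "sum the character codes" hash collapses every anagram onto one bucket, while the library's polynomial `String.hashCode` keeps them apart.

```java
import java.util.HashSet;
import java.util.Set;

public class HashDemo {

    // A plausible user-supplied hash: sum of character codes.
    // Order-insensitive, so all anagrams of a key collide.
    static int naiveHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h += s.charAt(i);
        return h;
    }

    public static void main(String[] args) {
        String[] anagrams = { "stop", "pots", "tops", "spot", "opts" };
        Set<Integer> naive = new HashSet<>();
        Set<Integer> builtin = new HashSet<>();
        for (String k : anagrams) {
            naive.add(naiveHash(k));
            builtin.add(k.hashCode());
        }
        // One distinct naive hash for all five keys; five distinct
        // built-in hashes.
        System.out.println(naive.size() + " distinct vs " + builtin.size());
    }
}
```

The same shape of argument presumably applies to a user-written Smalltalk hash competing with an optimised-C primitive.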

> All I was saying is that surely the only *point* of
> talking about the performance of *languages* is to get some idea of
> how well programs are likely to do, and not any old specially crafted
> code, but the kind of code that is likely to be written for purposes
> other than benchmarking.  So the more you bash on a particular example
> to get the time down, the less representative it is of the kind of code
> people are likely to write *for purposes other than benchmarking*.

Certainly less representative of the kind of code students are likely to write for course credits, and the kind of code people are likely to write to complete some project task before they hand it off to someone else, and the kind of code people are likely to write until their program is actually put into use and suddenly they are working the weekend to make their program faster.

A more positive way to think of code written for purposes of benchmarking is that it's like code written to address a performance hot spot.

> Just today I marked a student program where their program took 15 minutes
> and my model answer took a touch under 4 milliseconds.  I explained to
> them _that_ their program was spectacularly slow.  They already knew _why_
> it was.  I also explained the one trick (lazy initialisation) that could
> have hugely improved the time.  I also told them that they had made the
> right decision, to optimise *their* time, not the computer's, in their
> particular context.

The whole point is carried by your final assertion.

Here's another anecdote - in my first week on site, I overheard a group of engineers arguing that Smalltalk should be abandoned because calculation times were incredibly slow. I peeked at the code, saw that it was littered with unnecessary numeric conversions (plus unnecessary arbitrary precision arithmetic), fixed and timed a sample, and left them with a lot of rework to do all across their library code. 

The "kind of code people are likely to write" is sometimes best described as bad.

That can have consequences which spiral out of proportion -- an engineer writes some bad Smalltalk, an app performs so slowly the business group loses money, the manager of the business group notices and is shown that the slow app is the problem (and it really is the problem), the manager now "knows" that "Smalltalk is slow", the manager moves the business group away from Smalltalk, the manager is promoted and moves the whole organization away from Smalltalk. That's also an anecdote.

> *That* function, no.  *Some* hash function as a "primitive" (meaning
> optimised C), well, I don't have every Smalltalk, but the ones I do
> have, I've checked, and yes they do.

Maybe I misunderstood the thrust of your original comment: were you trying to make a point about writing in C and calling that code from Smalltalk as a user-written primitive, or were you trying to make a point about the importance of using a better hash function?

>>  Have you researched what code people are actually likely to write, or
>>  are you just speculating? ;-)
> I'm in my 6th decade.  I've seen a lot of code in a lot of languages.

So just speculating.

> The subject line has the obvious and boring answer "yes, of course".

I agree, because of the implicit qualification - "if we write the C++ badly enough" :-)

> The point here of course is that there are data structures and algorithms
> that are easy to express in some languages but hard in others, so that
> in code written *for purposes other than benchmarking*, they just would
> not be _used_ in one of those languages.

Let's say that's true - until we show how much better programs using other data structures and algorithms perform those specific tasks, it's just an excuse. 

(And, depending on how hard it is to express those different data structures and algorithms in other languages, it might not even be a good excuse.)
