[Haskell-cafe] Can Haskell outperform C++?

Thu May 24 07:52:13 CEST 2012

On 24/05/2012, at 4:39 AM, Isaac Gouy wrote:

>> From: Richard O'Keefe <ok at cs.otago.ac.nz>
>> Sent: Tuesday, May 22, 2012 7:59 PM
> 
>> But string processing and text I/O using the java.io.* classes aren't brilliant.
> 
> Wait just a moment - Are you comparing text I/O for C programs that process bytes against Java programs that process double-byte unicode?

No.  Amongst other things, I have my own ByteString and ByteStringBuilder
classes that are basically clones of String and StringBuilder, and using
them makes surprisingly little direct difference; the point is saving
memory.

I have obtained large speedups in Java by dodging around the
Java libraries.  Other people have reported the same to me.

>> With both of these changes, we are moving away from recommended good practice;
>> the faster code is the kind of code people are not supposed to write any more.
> 
> Says who? Is that on your own authority or some other source you can point us to?

It looks increasingly as though there is no point in this discussion.
Is there ANY conceivable criticism of Java that will not elicit
ad hominem attacks from you?

I have read more Java textbooks than I wished to.  I was on Sun's
Java techniques and tips mailing list for years.  I could go on,
but is there, *really*, any point?
> 
>> These particular measurements were made using my own Smalltalk compiler
>> which is an oddity amongst Smalltalks: a whole program compiler that compiles
>> via C.  Yes, most of the good ideas came from INRIA, although ST/X does
>> something not entirely dissimilar.
> 
> Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation!
> 
> You have got to be joking.

Why?  On various benchmarks, sometimes VisualWorks is better,
sometimes my system is better.  My system is utterly naive,
incorporating almost none of the classic Smalltalk optimisations.

I redid the test using VisualWorks NonCommercial.
It took about twice as long as my Smalltalk did.
According to 'TimeProfiler profile: [...]',
98% of the time is in the load phase; half of that
is down to the hash table.  A surprisingly small part
of the rest is due to actual input (ExternalReadStream>>next);
quite a bit goes into building strings and testing characters.

Why the difference?
With all due respect, VisualWorks still has the classic Smalltalk
implementation of hash tables.  Mine is different.  This is a
library issue, not a language issue.
One of the tasks in reading is skipping separators.
Since it's used a lot in parsing input, my library pushes that
right down to the bottom level of ReadStream and ChannelInputStream.
VisualWorks uses a single generic implementation that doesn't get
up to the low-level dodges mine does.  And so on.

All *library* issues, not *compiler* or *language* issues.

Which is the whole point of this thread, as far as I am concerned.
C, Java, Smalltalk: this real example is dominated by *library*
level issues, not language issues or compiler issues.

>> And it's not INTERESTING, and it's not about LANGUAGES.
>> There is NOTHING about the Java language that makes code like this
>> necessarily slow.  It's the LIBRARY.  The java.io library was
>> designed for flexibility, not speed.  That's why there is a java.nio
>> library.  
> 
> Here's the gorilla in the room question - So why doesn't your program use java.nio?
> 
Because that would be insane.

This is a program I originally whipped up in less than an hour
for two reasons:

(A) I wanted to provide some students with an example of a
    "work-list" algorithm that had some realism to it.
    For that purpose, the program had to be READABLE.

(B) To my astonishment, the tsort(1) programs in OpenSolaris
    and Mac OS X 10.6.8 turned out to be grotesquely slow for
    non-toy graphs.  I was expecting to have a use for the
    program myself, so as it stood, the Java version was
    already quite fast enough to be useful.  (As in, a LOT
    faster than the system version, even though the system
    version was written in C.)
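
For readers unfamiliar with the term, a "work-list" topological sort can
be sketched roughly as below.  This is an illustration only, not the
program discussed in this message: the class name WorkListTsort, the
method sort, and the in-memory array of pairs are all assumptions made
for the example.  Nodes with no remaining predecessors sit on the work
list; taking one off may release its successors onto the list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a work-list topological sort (Kahn's algorithm).
final class WorkListTsort {
    /** Each pair {a, b} means "a must come before b". Throws if the graph has a cycle. */
    static List<String> sort(String[][] pairs) {
        Map<String, List<String>> successors = new LinkedHashMap<>();
        Map<String, Integer> predecessorCount = new LinkedHashMap<>();
        for (String[] p : pairs) {
            successors.computeIfAbsent(p[0], k -> new ArrayList<>());
            successors.computeIfAbsent(p[1], k -> new ArrayList<>());
            predecessorCount.putIfAbsent(p[0], 0);
            predecessorCount.putIfAbsent(p[1], 0);
            if (!p[0].equals(p[1])) {   // a self-pair only declares the node
                successors.get(p[0]).add(p[1]);
                predecessorCount.merge(p[1], 1, Integer::sum);
            }
        }
        // Seed the work list with every node that has no predecessors.
        Deque<String> workList = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : predecessorCount.entrySet()) {
            if (e.getValue() == 0) workList.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!workList.isEmpty()) {
            String node = workList.remove();
            order.add(node);
            // Emitting a node removes one predecessor from each of its successors.
            for (String next : successors.get(node)) {
                if (predecessorCount.merge(next, -1, Integer::sum) == 0) {
                    workList.add(next);
                }
            }
        }
        if (order.size() != predecessorCount.size()) {
            throw new IllegalStateException("input contains a cycle");
        }
        return order;
    }
}
```

In a sketch like this the algorithm itself is a few dozen lines; reading
and hashing the input is where a real program spends its time.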

The one issue I had with the first version was not time, but
space, so I explored two ways of making it take less space.

There is no NEED to rewrite the program to use java.nio;
once it had replaced the system version of the command, the Java
version was no longer the bottleneck in my intended use.

For me personally, having no experience with java.nio,
it was *easier* to rewrite the program from scratch in C
than to overcome the java.nio learning curve.  And in any
case, I knew very well that I could get near enough to the
same order of improvement using InputStream and wrapping
my own buffering code over that (I've done that before).
Above all, since the students were even less familiar with
nio than I am, using nio would have destroyed the program's
utility for purpose (A).
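
As a rough idea of what "wrapping my own buffering code over
InputStream" can look like, here is a minimal sketch.  It is not the
code from any of the programs measured: the class name ByteTokenReader,
the 8192-byte buffer size, and the ASCII-only definition of a separator
are assumptions made for the example.  Bytes are never decoded through
a Reader, and skipping separators happens at the lowest level, directly
on the buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: a byte-oriented token reader with its own buffer.
final class ByteTokenReader {
    private final InputStream in;
    private final byte[] buf = new byte[8192];   // buffer size is an arbitrary choice
    private int pos = 0;
    private int limit = 0;

    ByteTokenReader(InputStream in) {
        this.in = in;
    }

    /** Returns the next byte (0..255), or -1 at end of input, refilling the buffer as needed. */
    private int next() {
        if (pos == limit) {
            try {
                limit = in.read(buf, 0, buf.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            pos = 0;
            if (limit <= 0) {
                limit = 0;
                return -1;
            }
        }
        return buf[pos++] & 0xFF;
    }

    private static boolean isSeparator(int c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /** Skips separators, then returns the next run of non-separator bytes, or null at end of input. */
    String nextToken() {
        int c = next();
        while (isSeparator(c)) {
            c = next();
        }
        if (c < 0) {
            return null;
        }
        StringBuilder token = new StringBuilder();
        while (c >= 0 && !isSeparator(c)) {
            token.append((char) c);   // assumes ASCII / Latin-1 input
            c = next();
        }
        return token.toString();
    }
}
```

Calling nextToken() repeatedly on a reader wrapped around System.in
would yield the whitespace-separated items that a tsort-style program
consumes in pairs.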

As for the Smalltalk version, I often rewrite small things
into Smalltalk in order to find out what I'm doing wrong in
my implementation.

> 
>> And that's the point I was making with this example.  Why does
>> Smalltalk come out in the middle of the Java results?  A balance
>> between a language penalty (tagged integer arithmetic is a lot
>> slower than native integer arithmetic) and a library bonus (a
>> leaner meaner I/O design where there are wrappers if you want
>> them but you very seldom need them).  It's the great advantage
>> of using libraries rather than syntax: libraries can be changed.
> 
> No, that doesn't seem to be the case - if I'm misunderstanding what you've done then please correct me, but it seems that Smalltalk comes out in the middle of the Java results because you chose to use a Java library "designed for flexibility, not speed" and you chose to use that library in a way that slows the program down.

No, I chose to use
 - the official Java plain text I/O library,
 - the way the official Java series books and tutorials
   say it should be used,
 - with a MINIMUM of wrapper layers.

And it was FAST ENOUGH TO BE USEFUL.
No, I chose to use that library THE WAY IT IS INTENDED TO BE USED.
It is the simplest, most straightforward way to go.
It's the *same* "algorithm" that the C and Smalltalk versions use.

> imo It would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than brandish anecdotes from a past century.

"Past century"?  Insults, is it?

As for "how much better programs using other data structures and algorithms
perform", this whole thread is about how well programs using the SAME data
structures and algorithms perform, and whether we can assign much meaning
to that.  How could it possibly be better to do something irrelevant to the
topic?