[Haskell-cafe] Parallel weirdness

Sat Apr 19 10:56:10 EDT 2008

OK, so just for fun, I decided to try implementing a parallel merge sort 
using the seq and par combinators. My plan was to generate some 
psuedo-random data and time how long it takes to sort it. To try to 
account for lazy evaluation, what the program actually does is this:

1. Write the input data to disk without any sorting. (This ought to 
force it to be fully evaluated.)
2. Sort and save the data to disk 8 times. (So I can average the runtimes.)

This is done with two data sets - one with 1 million items, and another 
with 2 million rows. Each data set is run through both the purely 
sequential algorithm and a simple parallel one. (Split the list in half, 
merge-sort each half in parallel, and then merge them.)

The results of this little benchmark utterly defy comprehension. Allow 
me to enumerate:

Weird thing #1: The first time you sort the data, it takes a few 
seconds. The other 7 times, it takes a split second - roughly 100x 
faster. Wuh?

Weird thing #2: The parallel version runs *faster* than the sequential 
one in all cases - even with SMP disabled! (We're only talking a few 
percent faster, but still.)

Weird thing #3: Adding the "-threaded" compiler option makes 
*everything* run a few percent faster. Even with only 1 OS thread.

Weird thing #4: Adding "-N2" makes *everything* slow down a few percent. 
In particular, Task Manager shows only one CPU core in use.

Adding more than 2 OS threads makes everything slow down even further - 
but that's hardly surprising.

Can anybody explain any of this behaviour? I have no idea what I'm 
benchmarking, but it certainly doesn't appear to be the performance of a 
parallel merge sort!