[Haskell-cafe] Parallel weirdness

Bulat Ziganshin bulat.ziganshin at gmail.com
Sat Apr 19 12:04:48 EDT 2008


Hello Andrew,

Saturday, April 19, 2008, 7:50:30 PM, you wrote:

>> this looks like disk caching effects. if data are read from disk on
>> the first run and from the disk cache on subsequent runs, it only means
>> that your algorithm works faster than reading its data from disk
>>   

> Negative. No data is ever *read* from disk, only *written* to disk. (And
> each test writes to a different file.)

> The data to be sorted is generated using a trivial LCG PRNG.

if you don't generate fresh data for each sorting run, don't forget
about ghc laziness :) the data is only produced when the first sort
demands it, so the first run pays the generation cost and the later
runs don't - and if that gap is large, generating the data is slower
than sorting it
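
for illustration, a minimal sketch (my own names and LCG constants, not
Andrew's code) of forcing the generated list up front, so the generation
cost isn't charged to the first timed sort:

  import Control.Exception (evaluate)
  import Data.List (foldl', sort)

  -- hypothetical LCG; constants are illustrative only
  lcg :: Int -> Int
  lcg x = (1103515245 * x + 12345) `mod` 2147483648

  testData :: Int -> [Int]
  testData n = take n (iterate lcg 42)

  main :: IO ()
  main = do
    let xs = testData 500000
    -- force spine and elements, so generation happens here rather
    -- than inside the first timed sort
    _ <- evaluate (foldl' (+) 0 xs)
    print (last (sort xs))

with something like this in place, the first and later runs measure the
same amount of work.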

>> there are plenty of reasons: first, -threaded makes i/o overlap
>> with the calculations.

> Not with -N1.

are you sure? :)  afaik, the -threaded RTS uses a dedicated i/o thread
regardless of the -N setting (which only controls the number of threads
running *user* code)
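
a quick way to see what -N actually gives you (assuming a GHC where
GHC.Conc exports numCapabilities, which recent versions do):

  import GHC.Conc (numCapabilities)

  main :: IO ()
  main = putStrLn ("capabilities (-N): " ++ show numCapabilities)

compile with "ghc -threaded Caps.hs" and compare running it with
+RTS -N1 against +RTS -N2; the RTS's own i/o service threads exist
either way.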

>> second, the parallel version may exhibit better cpu
>> cache behaviour - such as processing all the data in cache before
>> sending it back to memory
>>   

> Again, with -N1, it is *still* only using 1 CPU core.

the parallel version is different code from the sequential one, and it
processes the data in a different order. for example, imagine a tar+gzip
pipeline that runs sequentially and writes its intermediate results to
disk. the same pipeline, written multithreaded, compresses the data on
the fly and never stores the intermediate data on the HDD, even though
it uses only one core. the same effect applies to cpu cache usage
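
as a concrete sketch (a generic merge sort, not Andrew's program), a
parallel sort already traverses and merges the halves in a different
order than a sequential one, whether or not the spark really runs on a
second core:

  import Control.Parallel (par, pseq)

  parSort :: Ord a => [a] -> [a]
  parSort []  = []
  parSort [x] = [x]
  parSort xs  = ys `par` (zs `pseq` merge ys zs)  -- spark ys while evaluating zs, then merge
    where
      (as, bs) = splitAt (length xs `div` 2) xs
      ys = parSort as
      zs = parSort bs

  merge :: Ord a => [a] -> [a] -> [a]
  merge [] ys = ys
  merge xs [] = xs
  merge (x:xs) (y:ys)
    | x <= y    = x : merge xs (y:ys)
    | otherwise = y : merge (x:xs) ys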


>>> Weird thing #4: Adding "-N2" makes *everything* slow down a few percent.
>>> In particular, Task Manager shows only one CPU core in use.
>>>     
>>
>> it's easy - your algorithm isn't really parallel.
>>   

> Fails to explain why the parallel version is faster than the sequential
> one (even with no parallelism), or why the sequential algorithm should
> slow down with more threads. (Surely the extra threads just sit idle?)

there are management overheads. with multiple worker threads you have
several OS threads fighting for the right to execute a single Haskell
thread :))
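
and on the "not really parallel" point: a common gotcha (just a guess,
not a diagnosis of Andrew's program) is that par only evaluates its
spark to weak head normal form, so sparking a lazily sorted list moves
almost no work off the main thread unless the result is forced:

  import Control.DeepSeq (force)
  import Control.Parallel (par, pseq)
  import Data.List (sort)

  sortBoth :: [Int] -> [Int] -> ([Int], [Int])
  sortBoth xs ys = sy `par` (sx `pseq` (sx, sy))
    where
      sx = sort xs
      -- without `force`, the spark for sy would stop at the first
      -- element; `force` makes the spark perform the whole sort
      sy = force (sort ys)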

>> there are many subtle effects that make optimization much more interesting
>> than applying simple schemes ;)  it's why i like it so much :))
>>   

> Well, based on the results I've seen so far, it seems that parallelism
> is a complete waste of time because it doesn't gain you anything. And 
> that doesn't make a lot of sense...

i made the world's fastest compression program using multithreading, so
the problem is definitely on the other side ;)

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com


