[Haskell-cafe] performance of map reduce

Manlio Perillo manlio_perillo at libero.it
Fri Sep 19 17:31:46 EDT 2008


Don Stewart wrote:
> manlio_perillo:
> [...]
>> Is it possible to implement a map reduce version that can handle gzipped 
>> log files?
> 
> Using the zlib binding on hackage.haskell.org, you can stream multiple
> zlib decompression threads with lazy bytestrings, and combine the
> results.
> 
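Below is a minimal sketch of one reading of this suggestion. It assumes the logs are split across several gzip files. Each file is decompressed lazily in its own thread with `decompress` from the zlib package's `Codec.Compression.GZip`, and the per-file line counts are then summed. The names `countGzipFile` and `countAll` are hypothetical. Worker errors are not handled: a failed worker leaves its MVar empty.

```haskell
-- Sketch (assumption): one thread per gzip file, map = count lines,
-- reduce = sum. Needs the zlib and bytestring packages.
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy.Char8 as L
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Count newline characters in a (decompressed) lazy stream.
countLines :: L.ByteString -> Int
countLines = fromIntegral . L.count '\n'

-- Hypothetical worker: read and decompress one file lazily. `evaluate`
-- forces the count inside this thread, so the decompression runs here.
countGzipFile :: FilePath -> IO Int
countGzipFile path = do
  compressed <- L.readFile path
  evaluate (countLines (GZip.decompress compressed))

-- Fork one worker per file, then sum the per-file counts.
countAll :: [FilePath] -> IO Int
countAll paths = do
  vars <- mapM spawn paths
  sum <$> mapM takeMVar vars
  where
    spawn p = do
      v <- newEmptyMVar
      _ <- forkIO (countGzipFile p >>= putMVar v)
      return v
```

Build with `-threaded` and run with `+RTS -N` so the workers can run on separate cores.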

Doing this within a single gzip file is a bit hard.
A deflate-encoded stream consists of multiple blocks, so you would need to
find the offset of each block and decompress the blocks in parallel.
But then you also need to make sure that the decompressed output of each
block ends with a '\n', so that no line is split between two workers.

And the zlib Haskell binding does not support this usage (I'm not even
sure zlib itself supports it).
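Any chunked reader of plain-text logs has the same newline requirement. The following minimal sketch assumes the whole log is already in memory as a strict ByteString. It splits the log into roughly `n` pieces and extends each piece to the next '\n'. `lineAlignedChunks` is a hypothetical helper, not code from the book or the zlib binding.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: split `bs` into roughly `n` chunks. Every chunk
-- except possibly the last ends with '\n', so no line straddles two chunks.
lineAlignedChunks :: Int -> B.ByteString -> [B.ByteString]
lineAlignedChunks n bs
  | B.null bs = []
  | otherwise = go bs
  where
    -- Target chunk size in bytes (at least 1).
    target = max 1 (B.length bs `div` max 1 n)
    go s
      | B.length s <= target = [s]          -- small enough: last chunk
      | otherwise =
          -- Find the first newline at or after the target offset.
          case B.elemIndex '\n' (B.drop target s) of
            Nothing -> [s]                  -- no newline left: keep the rest whole
            Just i ->
              let (chunk, rest) = B.splitAt (target + i + 1) s
              in if B.null rest then [chunk] else chunk : go rest
```

Concatenating the chunks gives back the original input, so each chunk can be handed to a separate worker.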



By the way, this phrase:
"We allow multiple threads to read different chunks at once by supplying 
each one with a distinct file handle, all reading the same file"
here:
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

is, IMHO, incorrect or at least misleading.
Each chunk is read in the main thread, or at least myThreadId always
returns the same value.
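With lazy `hGetContents`, the actual reads happen in whichever thread forces the data. For comparison, the sketch below makes the reading itself happen in each worker. Each forked thread opens its own Handle on the same file, seeks to its byte range, reads that range strictly with `hGet`, and reports `myThreadId`. This is an assumption, not the book's code. The caller supplies the ranges, which are assumed to be line-aligned. `readRange` and `readRangesConcurrently` are hypothetical names.

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek), hSeek, withBinaryFile)

-- Hypothetical worker: open a private handle, seek to `off`, strictly read
-- `len` bytes, and count the newlines. Returns the id of the thread that
-- did the reading together with the count.
readRange :: FilePath -> (Integer, Int) -> IO (ThreadId, Int)
readRange path (off, len) =
  withBinaryFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek off
    bytes <- B.hGet h len              -- strict read, performed in this thread
    n <- evaluate (B.count '\n' bytes) -- force the count here as well
    tid <- myThreadId
    return (tid, n)

-- Fork one worker per (offset, length) range; results come back in order.
readRangesConcurrently :: FilePath -> [(Integer, Int)] -> IO [(ThreadId, Int)]
readRangesConcurrently path ranges = do
  vars <- mapM spawn ranges
  mapM takeMVar vars
  where
    spawn r = do
      v <- newEmptyMVar
      _ <- forkIO (readRange path r >>= putMVar v)
      return v
```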

Since every chunk seems to be read in the main thread anyway, I don't
understand why my version is slower than the book version.
The only difference is that the book version reads 4 chunks, while my
version reads only 1 big chunk.
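The sketch below makes the 1-chunk versus 4-chunk comparison explicit by turning the number of chunks into a parameter. It is an assumption, not the book's code. It counts the lines that contain a pattern, splits the lines into at most `n` groups, and counts each group in its own thread. `B.lines` and the grouping still run in the calling thread. Only the matching runs in the workers.

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Split a list into at most n groups of nearly equal length.
groupsOf :: Int -> [a] -> [[a]]
groupsOf n xs = go xs
  where
    m = max 1 n
    size = max 1 ((length xs + m - 1) `div` m)
    go [] = []
    go ys = let (g, rest) = splitAt size ys in g : go rest

-- Hypothetical map-reduce: map = count matching lines per group (one
-- thread each), reduce = sum of the per-group counts.
countMatching :: Int -> B.ByteString -> B.ByteString -> IO Int
countMatching n pat input = do
  vars <- mapM spawn (groupsOf n (B.lines input))
  sum <$> mapM takeMVar vars
  where
    spawn ls = do
      v <- newEmptyMVar
      -- `evaluate` makes the worker compute its count before handing it back.
      _ <- forkIO (evaluate (length (filter (B.isInfixOf pat) ls)) >>= putMVar v)
      return v
```

Timing `countMatching 1` against `countMatching 4` on the same input separates the effect of the chunk count from the way the file is read. Build with `-threaded` and run with `+RTS -N4`.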


> -- Don
> 


Thanks, Manlio

