[Haskell-cafe] performance of map reduce
Manlio Perillo
manlio_perillo at libero.it
Fri Sep 19 17:31:46 EDT 2008
Don Stewart wrote:
> manlio_perillo:
> [...]
>> Is it possible to implement a map-reduce version that can handle gzipped
>> log files?
>
> Using the zlib binding on hackage.haskell.org, you can stream multiple
> zlib decompression threads with lazy bytestrings, and combine the
> results.
>
This is a bit hard.
A deflate-encoded stream contains multiple blocks, so you would need to find
the offset of each block and decompress the blocks in parallel.
But then you would also need to make sure each decompressed chunk ends with
a '\n'.
And the zlib Haskell binding does not support this usage (I'm not even
sure the zlib C library supports it).
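Splitting a single deflate stream is the hard part, so a minimal sketch of Don's suggestion sidesteps it by assuming the logs are already stored as several independent gzip files (for example rotated logs); that assumption is mine, not something the zlib binding provides. Each file gets its own worker thread, is decompressed with `decompress` from the zlib package's `Codec.Compression.GZip` module, and the per-file line counts are summed. `countLinesGz`, `countFile` and `mapReduceFiles` are hypothetical names.

```haskell
-- Needs the zlib package from Hackage. Build with -threaded, run with +RTS -N.
-- Assumption: each input file is a complete, independent gzip stream.
import qualified Codec.Compression.GZip as GZip
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate, throwIO)
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8
import System.Environment (getArgs)

-- Decompress one complete gzip stream lazily and count its lines.
countLinesGz :: L.ByteString -> Int
countLinesGz = fromIntegral . L8.count '\n' . GZip.decompress

-- Map step for one file: read, decompress and count. `evaluate` forces the
-- count, so the lazy reading and decompression happen in the calling thread.
countFile :: FilePath -> IO Int
countFile path = L.readFile path >>= evaluate . countLinesGz

-- One worker thread per file, then sum the per-file counts (the reduce step).
-- A worker's exception is re-thrown here instead of leaving main blocked.
mapReduceFiles :: [FilePath] -> IO Int
mapReduceFiles paths = do
  vars   <- mapM spawn paths
  counts <- mapM (\v -> takeMVar v >>= either throwIO return) vars
  return (sum counts)
  where
    spawn path = do
      var <- newEmptyMVar
      _ <- forkFinally (countFile path) (putMVar var)
      return var

main :: IO ()
main = getArgs >>= mapReduceFiles >>= print
```

Run it with the .gz files as arguments; it prints the total number of lines. The parallelism here comes only from having several files, which is exactly why a single large .gz file remains the hard case.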
By the way, I think this sentence from the book:
"We allow multiple threads to read different chunks at once by supplying
each one with a distinct file handle, all reading the same file"
(http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193)
is not correct, or is at least misleading.
Each block is read in the main thread, or at least myThreadId always
returns the same value.
This is also why I don't understand why my version is slower
than the book version.
The only difference is that the book version reads 4 chunks, while my
version reads a single big chunk.
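One way to make sure each chunk really is read by a different thread is sketched below; it follows my own assumptions, not the book's code. Chunk boundaries are chosen so that every chunk except possibly the last ends just after a '\n', and each worker opens its own handle, seeks to its offset, reads strictly with `Data.ByteString.Char8.hGet`, and reports the `myThreadId` it ran in. With lazy I/O, by contrast, bytes are only read when something forces them, so the read may happen in a different thread from the one that opened the handle. `chunkRanges`, `readRange` and `countLinesInThreads` are hypothetical names.

```haskell
-- Build with -threaded and run with +RTS -N4 to use several cores.
import Control.Concurrent (ThreadId, forkFinally, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate, throwIO)
import qualified Data.ByteString.Char8 as B8
import System.Environment (getArgs)
import System.IO

-- Split `bytes` into at most `n` (offset, length) ranges. Every range except
-- possibly the last ends just after a '\n', so no line straddles two chunks.
chunkRanges :: Int -> B8.ByteString -> [(Int, Int)]
chunkRanges n bytes = go 0
  where
    total  = B8.length bytes
    n'     = max 1 n
    target = max 1 ((total + n' - 1) `div` n')  -- ideal chunk size
    go start
      | start >= total = []
      | otherwise      = (start, end - start) : go end
      where
        -- last byte of an ideal-sized chunk starting at `start`
        probe = min total (start + target) - 1
        end = case B8.elemIndex '\n' (B8.drop probe bytes) of
                Just i  -> probe + i + 1  -- extend to just past the newline
                Nothing -> total          -- no newline left: take the rest

-- Read one range through a handle private to this call. B8.hGet is strict,
-- so the bytes are read here, in whichever thread runs this action.
readRange :: FilePath -> (Int, Int) -> IO B8.ByteString
readRange path (off, len) =
  withBinaryFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek (fromIntegral off)
    B8.hGet h len

-- Fork one worker per range; each returns the ThreadId that did its read
-- and the number of lines in its chunk.
countLinesInThreads :: FilePath -> [(Int, Int)] -> IO [(ThreadId, Int)]
countLinesInThreads path ranges = mapM spawn ranges >>= mapM wait
  where
    spawn range = do
      var <- newEmptyMVar
      _ <- forkFinally (work range) (putMVar var)
      return var
    work range = do
      tid   <- myThreadId
      bytes <- readRange path range
      count <- evaluate (B8.count '\n' bytes)
      return (tid, count)
    wait var = takeMVar var >>= either throwIO return

main :: IO ()
main = do
  [path] <- getArgs
  -- Boundaries are found on an in-memory copy only to keep the sketch short;
  -- a version for large files would seek to each guess and scan for '\n'.
  ranges  <- chunkRanges 4 <$> B8.readFile path
  mainTid <- myThreadId
  putStrLn ("main thread: " ++ show mainTid)
  results <- countLinesInThreads path ranges
  mapM_ print results
  print (sum (map snd results))
```

Given a log file path, it prints the main thread's id, one (ThreadId, line count) pair per chunk, and the total; every worker's ThreadId differs from the main one because each read is done inside a forked thread.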
> -- Don
>
Thanks,
Manlio