[Haskell-cafe] performance of map reduce

Don Stewart dons at galois.com
Fri Sep 19 13:12:43 EDT 2008


manlio_perillo:
> Hi again.
> 
> In 
> http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390
> there is a map reduce based log parser.
> 
> I have written an alternative version:
> http://paste.pocoo.org/show/85699/
> 
> but, with a file of 315 MB, I have [1]:
> 
> 1) map reduce implementation, non parallel
> real	0m6.643s
> user	0m6.252s
> sys	0m0.212s
> 
> 2) map reduce implementation, parallel with 2 cores
> real	0m3.840s
> user	0m6.384s
> sys	0m0.652s
> 
> 3) my implementation
> real	0m8.121s
> user	0m7.804s
> sys	0m0.216s
> 
> 
> 
> Why is the map reduce implementation faster, even 
> when it is not parallelized?

Possibly differences in how the GC is utilised, or in how the optimiser
treats the two versions?
  
> Is it possible to implement a map reduce version that can handle gzipped 
> log files?

Using the zlib binding on hackage.haskell.org, you can run multiple
zlib decompression threads, each streaming its input as a lazy
bytestring, and combine the results.
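
A minimal sketch of that idea, under some assumptions: the log is split
across several gzipped files (one thread per file), the "map" step is
just counting lines, and the names countLinesGz, countFile and countAll
are made up for illustration. It uses Codec.Compression.GZip from the
zlib package and forkIO/MVar from base; it is not the book's code.

```haskell
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy.Char8 as L
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Environment (getArgs)

-- "Map" step (hypothetical): count the lines in gzip-compressed data.
-- GZip.decompress is lazy, so the input is decompressed chunk by chunk
-- as L.count consumes it, without holding the whole file in memory.
countLinesGz :: L.ByteString -> Int
countLinesGz = fromIntegral . L.count '\n' . GZip.decompress

-- Read one gzipped file lazily and force its line count.
countFile :: FilePath -> IO Int
countFile path = do
  compressed <- L.readFile path
  return $! countLinesGz compressed

-- Fork one thread per file, then "reduce" by summing the results.
-- Note: if a worker throws, its MVar is never filled; a real program
-- would want to catch exceptions inside the worker.
countAll :: [FilePath] -> IO Int
countAll paths = do
  vars <- mapM spawn paths
  sum <$> mapM takeMVar vars
  where
    spawn p = do
      v <- newEmptyMVar
      _ <- forkIO (countFile p >>= putMVar v)
      return v

main :: IO ()
main = getArgs >>= countAll >>= print
```

To actually use more than one core, this would need to be built with
-threaded and run with +RTS -N2 (or similar). Replacing the line count
with a real per-chunk parser should not change the overall shape.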

-- Don

