[Haskell-cafe] monadic MapReduce

Sun Mar 1 13:58:17 EST 2009

On Sun, Mar 01, 2009 at 07:25:56PM +0100, Manlio Perillo wrote:
> Hi.
>
> I have a function that does some IO (takes a file path, reads the file,
> parses it, and returns some data), and I would like to parallelize it, so
> that multiple files can be parsed in parallel.
>
> I would like to use the simple mapReduce function,
> from Real World Haskell:
>
> mapReduce :: Strategy b    -- evaluation strategy for mapping
>           -> (a -> b)      -- map function
>           -> Strategy c    -- evaluation strategy for reduction
>           -> ([b] -> c)    -- reduce function
>           -> [a]           -- list to map over
>           -> c
>
> mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
>     mapResult `pseq` reduceResult
>   where mapResult    = parMap mapStrat mapFunc input
>         reduceResult = reduceFunc mapResult `using` reduceStrat
>
> Is this possible?
>
>
> Thanks  Manlio Perillo

Would this work?
Read each file into a String (or ByteString) using a lazy function,
and then call mapReduce with the strings instead of the file paths.

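import qualified Data.ByteString.Lazy.Char8 as L
import System.IO (IOMode (ReadMode), openFile)

do
    -- openFile is an IO action and needs an IOMode, so bind with mapM
    handles <- mapM (\path -> openFile path ReadMode) files
    -- lazy hGetContents returns at once; data is read as it is consumed
    strings <- mapM L.hGetContents handles
    let result = mapReduce ...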

The actual work of reading each file should then happen on demand, inside
the parsing function that mapReduce calls.
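
To make the idea concrete, here is a minimal sketch of the whole pipeline. It
assumes a current version of the parallel package, where rseq plays the role
of the older rwhnf, and it uses L.readFile as a shortcut for openFile followed
by L.hGetContents. countLines is a hypothetical stand-in for the real parser;
it is not part of the original question.

```haskell
import Control.Parallel (pseq)
import Control.Parallel.Strategies (Strategy, parMap, rseq, using)
import qualified Data.ByteString.Lazy.Char8 as L
import System.Environment (getArgs)

-- mapReduce as quoted above from Real World Haskell.
mapReduce :: Strategy b -> (a -> b) -> Strategy c -> ([b] -> c) -> [a] -> c
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where
    mapResult    = parMap mapStrat mapFunc input
    reduceResult = reduceFunc mapResult `using` reduceStrat

-- Hypothetical stand-in for the real parser: count the lines of one file.
countLines :: L.ByteString -> Int
countLines = length . L.lines

main :: IO ()
main = do
    files <- getArgs
    -- L.readFile is lazy: each file is opened here, but its chunks are
    -- only read when the contents are consumed.
    contents <- mapM L.readFile files
    -- Evaluating each Int with rseq forces the whole file to be read and
    -- counted inside the spark that parMap creates for it.
    let total = mapReduce rseq countLines rseq sum contents
    print total
```

To use more than one core, this would need to be built with ghc -threaded and
run with +RTS -N. Note that every file is opened up front, so a very long file
list may run into the limit on open file descriptors.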

I would like to know if this gives you the speedup you expect.

Regards,
Anish