[Haskell-cafe] monadic MapReduce
anishmuttreja at gmail.com
Sun Mar 1 13:58:17 EST 2009
On Sun, Mar 01, 2009 at 07:25:56PM +0100, Manlio Perillo wrote:
> I have a function that does some IO (take a file path, read the file,
> parse, and return some data), and I would like to parallelize it, so
> that multiple files can be parsed in parallel.
> I would like to use the simple mapReduce function,
> from Real World Haskell:
> mapReduce :: Strategy b    -- evaluation strategy for mapping
>           -> (a -> b)      -- map function
>           -> Strategy c    -- evaluation strategy for reduction
>           -> ([b] -> c)    -- reduce function
>           -> [a]           -- list to map over
>           -> c
> mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
>     mapResult `pseq` reduceResult
>   where mapResult    = parMap mapStrat mapFunc input
>         reduceResult = reduceFunc mapResult `using` reduceStrat
> Is this possible?
> Thanks Manlio Perillo
Would this work?
Read each file into a string (or ByteString) using a lazy function,
and then call mapReduce with the strings instead of the file paths.
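    import qualified Data.ByteString.Lazy.Char8 as L
    import System.IO (openFile, IOMode(ReadMode))

    -- inside a do block in IO; files :: [FilePath]
    handles <- mapM (\f -> openFile f ReadMode) files
    strings <- mapM L.hGetContents handles
    let result = mapReduce ...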
The actual work of reading in the file should happen on-demand inside the
parsing function called by mapReduce.
I would like to know if this gives you the speedup you expect.
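To make the idea concrete, here is a minimal sketch, not a tested solution for your parser. It assumes the current Control.Parallel.Strategies API from the parallel package (rseq) rather than the older API the book was written against, uses L.readFile in place of openFile plus hGetContents, and substitutes a hypothetical countLines for the real parsing function. The names countLines and totalLines are made up for illustration.

```haskell
import Control.Parallel (pseq)
import Control.Parallel.Strategies (Strategy, parMap, rseq, using)
import qualified Data.ByteString.Lazy.Char8 as L

-- mapReduce as quoted above from Real World Haskell.
mapReduce :: Strategy b -> (a -> b) -> Strategy c -> ([b] -> c) -> [a] -> c
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where
    mapResult    = parMap mapStrat mapFunc input
    reduceResult = reduceFunc mapResult `using` reduceStrat

-- Hypothetical stand-in for the real parser: count newline characters.
countLines :: L.ByteString -> Int
countLines = fromIntegral . L.count '\n'

-- Hypothetical driver: obtain a lazy ByteString per file, then let the
-- sparks created by parMap force the contents while "parsing".
totalLines :: [FilePath] -> IO Int
totalLines paths = do
    contents <- mapM L.readFile paths   -- opens each file; data is read on demand
    return $! mapReduce rseq countLines rseq sum contents
```

Two caveats. Every file is opened up front and its handle stays open until the contents have been consumed, so a very long list of paths may hit the open-file limit. Also, rseq only evaluates to weak head normal form, which is enough for an Int; a parser returning a larger structure would probably want rdeepseq so the work is really done inside the spark. Build with -threaded and run with +RTS -N to use more than one core.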