[Haskell-cafe] monadic MapReduce
Manlio Perillo
manlio_perillo at libero.it
Mon Mar 2 10:10:41 EST 2009
Anish Muttreja wrote:
> On Sun, Mar 01, 2009 at 07:25:56PM +0100, Manlio Perillo wrote:
>> Hi.
>>
>> I have a function that does some IO (takes a file path, reads the file,
>> parses it, and returns some data), and I would like to parallelize it,
>> so that multiple files can be parsed in parallel.
>>
>> I would like to use the simple mapReduce function
>> from Real World Haskell:
>>
>> mapReduce :: Strategy b    -- evaluation strategy for mapping
>>           -> (a -> b)      -- map function
>>           -> Strategy c    -- evaluation strategy for reduction
>>           -> ([b] -> c)    -- reduce function
>>           -> [a]           -- list to map over
>>           -> c
>>
>> mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
>>     mapResult `pseq` reduceResult
>>   where mapResult    = parMap mapStrat mapFunc input
>>         reduceResult = reduceFunc mapResult `using` reduceStrat
>>
>> Is this possible?
>>
>>
>> Thanks,
>> Manlio Perillo
>
> Would this work?
I suspect that it will not work.
> Read in each file into a string (or byteString) using a lazy function
> and then call mapReduce with the strings instead of file paths.
>
> import System.IO (openFile, IOMode(ReadMode))
> import qualified Data.ByteString.Lazy.Char8 as L
> do
>   handles <- mapM (\f -> openFile f ReadMode) files
>   strings <- mapM L.hGetContents handles
>   let result = mapReduce ...
>
> The actual work of reading in the file should happen on-demand inside the
> parsing function called by mapReduce.
>
By doing this I will probably lose any control over file resource usage.
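For concreteness, here is a minimal, self-contained sketch of the suggestion above, assuming a recent version of the parallel package (rdeepseq and rseq in place of the rnf and rwhnf used in Real World Haskell), hypothetical file names, and a placeholder parser parseChunk that merely counts lines. The comments mark where the lazily read handles stay open, which is exactly the loss of control described above.

import Control.Parallel (pseq)
import Control.Parallel.Strategies (Strategy, parMap, rdeepseq, rseq, using)
import System.IO (IOMode(ReadMode), openFile)
import qualified Data.ByteString.Lazy.Char8 as L

-- mapReduce as quoted above from Real World Haskell.
mapReduce :: Strategy b -> (a -> b) -> Strategy c -> ([b] -> c) -> [a] -> c
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where
    mapResult    = parMap mapStrat mapFunc input
    reduceResult = reduceFunc mapResult `using` reduceStrat

-- Placeholder for the real parser: counting lines stands in for
-- whatever "parse, and return some data" actually does.
parseChunk :: L.ByteString -> Int
parseChunk = length . L.lines

main :: IO ()
main = do
    let files = ["a.log", "b.log", "c.log"]  -- hypothetical input paths
    -- Each handle is opened eagerly here, but its contents are read
    -- lazily; a handle is only closed once the corresponding string
    -- has been consumed to the end inside mapReduce.  With many files
    -- this means many handles open at once, with no direct control
    -- over when they are released, which is the concern raised above.
    handles <- mapM (\f -> openFile f ReadMode) files
    strings <- mapM L.hGetContents handles
    let total = mapReduce rdeepseq parseChunk rseq sum strings
    print total

Note that this only moves the IO outside mapReduce; it does not make the map step itself monadic.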
Thanks,
Manlio