[Haskell-cafe] monadic MapReduce

Anish Muttreja anishmuttreja at gmail.com
Mon Mar 2 18:57:44 EST 2009

On Mon, Mar 02, 2009 at 04:10:41PM +0100, Manlio Perillo wrote:
> Anish Muttreja wrote:
>> On Sun, Mar 01, 2009 at 07:25:56PM +0100, Manlio Perillo wrote:
>>> Hi.
>>> I have a function that does some IO (takes a file path, reads the file,
>>> parses it, and returns some data), and I would like to parallelize it, so
>>> that multiple files can be parsed in parallel.
>>> I would like to use the simple mapReduce function
>>> from Real World Haskell:
>>> mapReduce :: Strategy b    -- evaluation strategy for mapping
>>>           -> (a -> b)      -- map function
>>>           -> Strategy c    -- evaluation strategy for reduction
>>>           -> ([b] -> c)    -- reduce function
>>>           -> [a]           -- list to map over
>>>           -> c
>>>
>>> mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
>>>     mapResult `pseq` reduceResult
>>>   where mapResult    = parMap mapStrat mapFunc input
>>>         reduceResult = reduceFunc mapResult `using` reduceStrat
>>> Is this possible?
>>> Thanks  Manlio Perillo
>> Would this work?
> I suspect that it will not work.
>> Read each file into a String (or ByteString) using a lazy function
>> and then call mapReduce with the strings instead of file paths.
>> import qualified Data.ByteString.Lazy.Char8 as L
>> import System.IO (IOMode (ReadMode), openFile)
>> do
>>     handles <- mapM (\f -> openFile f ReadMode) files
>>     strings <- mapM L.hGetContents handles
>>     let result = mapReduce ... 
>> The actual work of reading in the file should happen on-demand inside 
>> the parsing function called by mapReduce.
> By doing this I will probably lose any control over file resource usage.


How about this: is there a reason why I can't replace the type
variables b and c in the type signature of mapReduce with (IO b')
and (IO c')? b and c can be any types.

mapReduce :: Strategy (IO b')        -- evaluation strategy for mapping
          -> (a -> IO b')            -- map function
          -> Strategy (IO c')        -- evaluation strategy for reduction
          -> ([IO b'] -> IO c')      -- reduce function
          -> [a]                     -- list to map over
          -> IO c'

Just remember to wrap all values back in the IO monad.
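
As a minimal sketch of that specialization, assuming the current API of
the `parallel` package (Control.Parallel.Strategies), the unchanged
mapReduce can be used at b = IO Int and c = IO Int. The names countLines
and totalLines are hypothetical stand-ins for the real parser and
reducer. One caveat that the types make visible: a strategy applied to an
IO action only evaluates the action as a value, it does not run it, so in
this sketch the files are still read one after another when the reduce
function sequences the actions.

```haskell
import Control.Parallel (pseq)
import Control.Parallel.Strategies (Strategy, parMap, rseq, using)

-- mapReduce as quoted above from Real World Haskell, unchanged.
mapReduce :: Strategy b    -- evaluation strategy for mapping
          -> (a -> b)      -- map function
          -> Strategy c    -- evaluation strategy for reduction
          -> ([b] -> c)    -- reduce function
          -> [a]           -- list to map over
          -> c
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where
    mapResult    = parMap mapStrat mapFunc input
    reduceResult = reduceFunc mapResult `using` reduceStrat

-- Hypothetical stand-in for "read the file, parse, return some data":
-- here the "parse" step just counts lines.
countLines :: FilePath -> IO Int
countLines path = do
    contents <- readFile path
    -- Force the count so the lazily read file is consumed before returning.
    return $! length (lines contents)

-- mapReduce at b = IO Int and c = IO Int.
-- rseq only evaluates each IO action to weak head normal form; the
-- actions are actually executed, in order, by `sequence` in the reducer.
totalLines :: [FilePath] -> IO Int
totalLines = mapReduce rseq countLines rseq (fmap sum . sequence)
```

In GHCi, `totalLines ["a.txt", "b.txt"]` (with files of your own) returns
the combined line count inside IO.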


> Thanks  Manlio
