[Haskell-cafe] operating on a hundred files at once

Jefferson Heard jeff at renci.org
Mon Apr 9 14:40:20 EDT 2007


Thanks for the advice.  I'm not so much interested in performance here,
as this is just a one-off.  Disk thrashing or not, these files are only
a few hundred K apiece, and I can't imagine that the whole computation
will take more than a few minutes.  

My question is more about how to deal with the IO monad "pollution" of
all the data in a situation where you have N instances of IO [a] at step
1, and M computations to perform on their contents, all of which are
pure (monad-free).
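
A minimal sketch of one common way to handle this, assuming comma-separated
files of equal shape: mapM turns the list of IO actions into a single IO
action that yields a plain list, after which everything else is ordinary
pure code. The names splitOn, parseTable and cellwise are hypothetical
helpers written for this illustration (splitOn stands in for the L.split
used in the quoted code), and variance here is the population variance.

```haskell
import Data.List (transpose)
import System.Environment (getArgs)

-- Hypothetical helper: split a string on a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn d rest

-- Pure: parse the contents of one file into a table of Doubles.
-- Assumes every line is non-empty and every field parses as a number.
parseTable :: String -> [[Double]]
parseTable = map (map read . splitOn ',') . lines

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Population variance (divides by n, not n - 1).
variance :: [Double] -> Double
variance xs = sum [(x - m) ^ 2 | x <- xs] / fromIntegral (length xs)
  where m = mean xs

-- Pure: apply a statistic to each cell position across all tables.
-- Input is indexed [file][row][col]; output is indexed [row][col].
cellwise :: ([Double] -> Double) -> [[[Double]]] -> [[Double]]
cellwise stat = map (map stat . transpose) . transpose

main :: IO ()
main = do
  fs <- getArgs
  -- mapM :: (a -> IO b) -> [a] -> IO [b]; the only monadic step.
  tables <- mapM (fmap parseTable . readFile) fs
  print (cellwise mean tables)
  print (cellwise variance tables)
```

Nothing needs to be lifted into IO: once "tables <- mapM ..." has run,
tables is an ordinary [[[Double]]] and read, mean, variance and map apply
to it directly.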

-- Jeff

On Mon, 2007-04-09 at 22:24 +0400, Bulat Ziganshin wrote:
> Hello Jefferson,
> 
> Monday, April 9, 2007, 9:34:12 PM, you wrote:
> 
> If you have enough memory available, the fastest way is to read each
> file into memory using bytestring and convert it into an array of
> doubles, repeating this step for all files, and then perform your
> computations. If you try to read 100 files simultaneously, this may
> lead to extensive disk seeking or CPU cache thrashing.
> 
> ... Even better, you should read one file, add its values to the
> accumulators, then read the next file...
> 
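
The one-file-at-a-time suggestion above could be sketched roughly as
follows. This is an illustration under assumptions, not Bulat's code: it
uses plain String IO rather than bytestring, keeps a count, sum and sum of
squares per cell, and the names Acc, step, addTable, addFile, splitOn and
parseTable are hypothetical. The sum-of-squares formula for variance is
simple but can lose precision on badly scaled data.

```haskell
import Control.Exception (evaluate)
import Control.Monad (foldM)
import System.Environment (getArgs)

-- Running totals for one cell: count, sum, sum of squares (strict fields).
data Acc = Acc !Int !Double !Double

emptyAcc :: Acc
emptyAcc = Acc 0 0 0

step :: Acc -> Double -> Acc
step (Acc n s q) x = Acc (n + 1) (s + x) (q + x * x)

accMean, accVariance :: Acc -> Double
accMean (Acc n s _) = s / fromIntegral n
-- Population variance computed as E[x^2] - (E[x])^2.
accVariance a@(Acc n _ q) = q / fromIntegral n - accMean a ^ 2

-- Hypothetical helper: split a string on a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn d rest

-- Assumes every line is non-empty and every field parses as a number.
parseTable :: String -> [[Double]]
parseTable = map (map read . splitOn ',') . lines

-- Pure: fold one table into the accumulators, cell by cell.
addTable :: [[Acc]] -> [[Double]] -> [[Acc]]
addTable = zipWith (zipWith step)

-- Read one file, add its values, and force every cell so that
-- unevaluated thunks do not pile up from one file to the next.
addFile :: [[Acc]] -> FilePath -> IO [[Acc]]
addFile accs path = do
  contents <- readFile path
  let accs' = addTable accs (parseTable contents)
  evaluate (foldr seq () (concat accs'))
  return accs'

main :: IO ()
main = do
  fs <- getArgs
  if null fs
    then putStrLn "no input files"
    else do
      -- The infinite table of empty accumulators is cut down to the
      -- shape of the first file by zipWith.
      totals <- foldM addFile (repeat (repeat emptyAcc)) fs
      print (map (map accMean) totals)
      print (map (map accVariance) totals)
```

Only addFile and main live in IO; the accumulation itself (step, addTable)
stays pure, so only one file's contents need to be processed at a time.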
> 
> > I have a series of NxM numeric tables I'm doing a quick
> > mean/variance/t-test etcetera on.  The cell t1 [i,j] corresponds exactly
> > to the cells t2..N [i,j], and so it's perfectly possible to read one
> > item at a time from each of the 100 files and compute the mean/variance
> > etcetera on all cells that way.  So what I propose to do is something
> > along the lines of:
> 
> > openAndProcess filename = do
> >         f <- readFile filename
> >         return (map (L.split ',') . lines $ f)
> 
> > main = do 
> >         fs <- getArgs
> >         let items = map (map read) . map openAndProcess fs 
> >         in do print . map (map $ mean) items
> >               print . map (map $ variance) items
> 
> > How close am I to doing the right thing here? As I understand it, this
> > will result in one hundred IO [String] instances being returned by the
> > call to (map openAndProcess $ filenames).  Do I need to do something
> > special to lift (read), (mean), and (variance), or even (map) into the
> > IO monad so they can process the input as needed?
> 
> > Thanks in advance,
> > -- Jeff
> 


