[Haskell-cafe] Space leak
Arnoldo Muller
arnoldomuller at gmail.com
Wed Mar 10 17:01:28 EST 2010
Hello Daniel:
Thanks!
I employed mapM'_ but I am still getting the space leak.
Any other hint?
Arnoldo
On Wed, Mar 10, 2010 at 10:40 PM, Daniel Fischer
<daniel.is.fischer at web.de>wrote:
> Am Mittwoch 10 März 2010 21:45:56 schrieb Arnoldo Muller:
> > Hello,
> >
> > I am learning haskell and I found a space leak that I find difficult to
> > solve. I've been asking at #haskell but we could not solve
> > the issue.
> >
> > I want to lazily read a set of 22 files of about 200MB each, filter them
> > and then I want to output the result into a unique file.
> > If I modify the main function to work only with one input file, the
> > program runs without issues. I will call this version A.
> > Version B uses a mapM_ to iterate over a list of filenames and uses
> > appendFile to output the result of filtering each file.
> > In this case the memory usage grows sharply and quickly (profiles show
> > constant memory growth). In less than a minute, memory
> > occupation will make my system hang with swapping.
>
> No work is been done until the end, when all is tried to be done
> simultaneously. Make sure genomeExecute ... input1 has actually finished
> its work before genomeExecute ... input2 starts etc.
>
> One way is to use a stricter version of sequence_,
>
> sequence'_ :: Monad m => [m a] -> m ()
> sequence'_ (x:xs) = do
> a <- x
> a `seq` sequence'_ xs
> sequence'_ [] = return ()
>
> (nicer with BangPatterns, but not portable), and
>
> mapM'_ f = sequence'_ . map f
>
> Another option is making genomeExecute itself stricter.
>
> >
> > This is version B:
> >
> > ------------------------------- Program B
> > ------------------------------------------------------------------------
> >-------------------------------------------- import Data.List
> > import System.Environment
> > import System.Directory
> > import Control.Monad
> >
> >
> > -- different types of chromosomes
> > data Chromosome = C1
> >
> > | C2
> > | C3
> > | C4
> > | C5
> > | C6
> > | C7
> > | C8
> > | C9
> > | C10
> > | C11
> > | C12
> > | C13
> > | C14
> > | C15
> > | C16
> > | C17
> > | C18
> > | C19
> > | CX
> > | CY
> > | CMT
> >
> > deriving (Show)
> > -- define a window
> > type Sequence = [Char]
> > -- Window data
> > data Window = Window { sequen :: Sequence,
> > chrom :: Chromosome,
> > pos :: Int
> > }
> > -- print a window
> > instance Show Window where
> > show w = (sequen w) ++ "\t" ++ show (chrom w) ++ "\t" ++ show (pos
> > w)
> >
> > -- Reading fasta files with haskell
> >
> > -- Initialize the
> > main = do
> > -- get the arguments (intput is
> > [input, output, windowSize] <- getArgs
> > -- get directory contents (only names)
> > names <- getDirectoryContents input
> > -- prepend directory
> > let fullNames = filter isFastaFile $ map (\x -> input ++ "/" ++
> > x) names
> > let wSize = (read windowSize)::Int
> > -- process the directories
> > mapM (genomeExecute output wSize filterWindow) fullNames
> >
> >
> > -- read the files one by one and write them to the output file
> > genomeExecute :: String -> Int -> (Window -> Bool) -> String -> IO ()
> > genomeExecute outputFile windowSize f inputFile = do
> > fileData <- readFile inputFile
> > appendFile outputFile $ fastaExtractor fileData windowSize f
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100310/f0b585d8/attachment.html
More information about the Haskell-Cafe
mailing list