[Haskell-cafe] Allocating enormous amounts of memory and
jeff at renci.org
Tue Jul 10 22:13:26 EDT 2007
I switched to Data.Binary, which dropped me from 2.6GB to 1.5GB, and
then I switched this afternoon to unboxed arrays from lists of floats,
and that dropped me again from 1.5GB to 475MB. I think, all told, that
I'm in an acceptable range now, and thank you for pointing out the
library mistake. I'm also down from 1.5 minutes load time to under 10
seconds of load time, which is also very very nice. Incidentally, the
code I'm now using is:
String -> IO (Ptr Float, Array.UArray Int Int)
binaryLoadDocumentCoordinates path = do
putStrLn "File opened"
coordinates <- decodeFile (path ++ "/Clusters.bin") :: IO
(Array.UArray Int Float)
print . Array.bounds $ coordinates
putStrLn "Got coordinates"
galaxies <- decodeFile (path ++ "/Galaxies.bin") :: IO (Array.UArray
putStrLn "Got galaxies"
coordinatesArr <- mallocArray . snd . Array.bounds $ coordinates
putStrLn "Allocated array"
pokeArray coordinatesArr . Array.elems $ coordinates
return (coordinatesArr, galaxies)
binarySaveDocumentCoordinates :: String -> [Point] -> IO ()
binarySaveDocumentCoordinates path points = do
let len = length points
encodeFile (path ++ "Clusters.bin") . (Array.listArray (0,len*3) ::
[Float] -> Array.UArray Int Float) . coordinateList . solve $ points
encodeFile (path ++ "Galaxies.bin") . (Array.listArray (0,len) ::
[Int] -> Array.UArray Int Int) . galaxyList $ points
On Sun, 2007-07-08 at 14:37 -0700, Stefan O'Rear wrote:
> On Sun, Jul 08, 2007 at 05:26:18PM -0400, Jefferson Heard wrote:
> > I'm using the Data.AltBinary package to read in a list of 4.8 million
> > floats and 1.6 million ints. Doing so caused the memory footprint to
> > blow up to more than 2gb, which on my laptop simply causes the program
> > to crash. I can do it on my workstation, but I'd really rather not,
> > because I want my program to be fairly portable.
> > The file that I wrote out in packing the data structure was only 28MB,
> > so I assume I'm just using the wrong data structure, or I'm using full
> > laziness somewhere I shouldn't be.
> > I've tried compiling with profiling enabled, but I wasn't able to,
> > because the Streams package doesn't seem to have an option for compiling
> > with profiling. I'm also a newbie to Cabal, so I'm probably just
> > missing something.
> > The fundamental question, though is "Is there something wrong with how I
> > wrote the following function?"
> > binaryLoadDocumentCoordinates :: String -> IO (Ptr CFloat, [Int])
> > binaryLoadDocumentCoordinates path = do
> > pointsH <- openBinaryFile (path ++ "/Clusters.bin") ReadMode
> > coordinates <- get pointsH :: IO [Float]
> > galaxies <- get pointsH :: IO [Int]
> > coordinatesArr <- mallocArray (length coordinates)
> > pokeArray coordinatesArr (map (fromRational . toRational) coordinates)
> > return (coordinatesArr, galaxies)
> > I suppose in a pinch I could write a C function that serializes the
> > data, but I'd really rather not. What I'm trying to do is load a bunch
> > of coordinates into a vertex array for OpenGL. I did this for a small
> > 30,000 item vertex array, but I need to be able to handle several
> > million vertices in the end.
> > If I serialize an unboxed array instead of a list or if I do repeated
> > "put_" and "get" calls, will that help with the memory problem?
> Why are you using AltBinary instead of the (much newer and faster)
> Binary? Binary *does* work with profiling and does not depend on
> (To compile Binary with profiling support, add -p to the Cabal
> configuration line. This is documented in the --help message!)
> Yes, using unboxed arrays will help. Also try using the -c RTS option
> (that is, run your program as ./myprogram +RTS -c -RTS) - this tells the
> garbage collector to use a mark-compact system, which is slower than the
> default copying collector but uses roughly half as much memory.
More information about the Haskell-Cafe