[Haskell-cafe] Allocating enormous amounts of memory and
wondering why
Stefan O'Rear
stefanor at cox.net
Sun Jul 8 17:37:02 EDT 2007
On Sun, Jul 08, 2007 at 05:26:18PM -0400, Jefferson Heard wrote:
> I'm using the Data.AltBinary package to read in a list of 4.8 million
> floats and 1.6 million ints. Doing so caused the memory footprint to
> blow up to more than 2gb, which on my laptop simply causes the program
> to crash. I can do it on my workstation, but I'd really rather not,
> because I want my program to be fairly portable.
>
> The file that I wrote out in packing the data structure was only 28MB,
> so I assume I'm just using the wrong data structure, or I'm using full
> laziness somewhere I shouldn't be.
>
> I've tried compiling with profiling enabled, but I wasn't able to,
> because the Streams package doesn't seem to have an option for compiling
> with profiling. I'm also a newbie to Cabal, so I'm probably just
> missing something.
>
> The fundamental question, though is "Is there something wrong with how I
> wrote the following function?"
>
> binaryLoadDocumentCoordinates :: String -> IO (Ptr CFloat, [Int])
> binaryLoadDocumentCoordinates path = do
> pointsH <- openBinaryFile (path ++ "/Clusters.bin") ReadMode
> coordinates <- get pointsH :: IO [Float]
> galaxies <- get pointsH :: IO [Int]
> coordinatesArr <- mallocArray (length coordinates)
> pokeArray coordinatesArr (map (fromRational . toRational) coordinates)
> return (coordinatesArr, galaxies)
>
> I suppose in a pinch I could write a C function that serializes the
> data, but I'd really rather not. What I'm trying to do is load a bunch
> of coordinates into a vertex array for OpenGL. I did this for a small
> 30,000 item vertex array, but I need to be able to handle several
> million vertices in the end.
>
> If I serialize an unboxed array instead of a list or if I do repeated
> "put_" and "get" calls, will that help with the memory problem?
Why are you using AltBinary instead of the (much newer and faster)
Binary? Binary *does* work with profiling and does not depend on
streams.
(To compile Binary with profiling support, add -p to the Cabal
configuration line. This is documented in the --help message!)
Yes, using unboxed arrays will help. Also try using the -c RTS option
(that is, run your program as ./myprogram +RTS -c -RTS) - this tells the
garbage collector to use a mark-compact system, which is slower than the
default copying collector but uses roughly half as much memory.
Stefan
More information about the Haskell-Cafe
mailing list