[Haskell-beginners] Space leak while reading from a file?
David McBride
toad3k at gmail.com
Mon May 12 16:33:20 UTC 2014
This is a bit advanced for the beginners list. You would probably have
better luck on Stack Overflow.
On Sun, May 11, 2014 at 7:24 AM, Jan Snajder <jan.snajder at fer.hr> wrote:
> Dear all,
>
> I'm trying to implement a simple file-based database. I apparently have
> a space leak, but I have no clue where it comes from.
>
> Here's the file-based database implementation:
> http://pastebin.com/QqiqcXFw
>
> The idea is to have a database table in a single textual file. One line
> equals one table row. The fields within a row are whitespace separated.
> The first field is the key. Because I'd like to work with large files, I
> don't want to load the whole file into memory. Instead, I'd like to be
> able to fetch the rows on demand, by keys. Thus I first create an index
> that links keys to file seeks. I use ReaderT to add the index to the
> IO monad.
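
For reference, the indexing step described above can be sketched roughly as follows. This is not the code from the pastebin, which is not reproduced here: the names Index and buildIndex are made up for illustration, and strict ByteString is swapped in for Data.Text to keep the example short. Each key is copied so that the map does not hold on to the row it was cut from.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.Char (isSpace)
import System.IO

-- Hypothetical names: Index and buildIndex are not taken from FileDB.
type Index = M.Map B.ByteString Integer

-- Scan the file once, recording the byte offset at which each row starts.
buildIndex :: FilePath -> IO Index
buildIndex path = withBinaryFile path ReadMode (go M.empty)
  where
    go !idx h = do
      eof <- hIsEOF h
      if eof
        then return idx
        else do
          pos  <- hTell h                 -- offset of the row about to be read
          line <- B.hGetLine h            -- strict read of one row
          let k = B.takeWhile (not . isSpace) line
          if B.null k
            then go idx h                 -- skip rows without a key
            else go (M.insert (B.copy k) pos idx) h  -- copy: do not retain the row

main :: IO ()
main = do
  let path = "filedb-demo.txt"            -- throwaway demo file
  B.writeFile path (B.pack "row1 a b\nrow2 c d e\n")
  idx <- buildIndex path
  print (M.toList idx)
```

The accumulator is forced on every iteration and the map is the strict variant, so no chain of unevaluated inserts should build up while the file is scanned.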
>
> For testing, I use a dummy table produced as follows:
>
> import System.IO
> import Text.Printf
> import Control.Monad
>
> row = unwords [printf "field%03d" (i::Int) | i <- [1..999]]
>
> main = do
>   forM_ [1..250000] $ \i ->
>     putStrLn $ printf "row%06d %s" (i::Int) row
>
> This generates a 2.1G textual file, which I store on my disk.
>
> The testing code:
>
> import FileDB
> import qualified Data.Text as T
> import Text.Printf
> import Control.Applicative
> import Control.Monad
> import Control.Monad.Trans
> import System.IO
> import System.Environment
>
> main = do
>   (f:_) <- getArgs
>   t <- openTable f
>   runDB t $ do
>     ks <- getKeys
>     liftIO $ do
>       putStrLn . printf "%d keys read" $ length ks
>       putStrLn "Press any key to continue..."
>       getChar
>     forM_ ks $ \k -> do
>       Just r <- getRow k
>       liftIO . putStrLn $ printf "Row \"%s\" has %d fields"
>         (T.unpack k) (length r)
>
> When I run the test on the 2.1GB file, the whole program consumes 10GB.
>
> 6GB seem to be allocated after the index is built (just before entering
> the forM_ function). The remaining 4GB are allocated while fetching all
> the rows.
>
> I find both things difficult to explain.
>
> 6GB seems too much for the index. Each key is 9 characters (stored as
> Data.Text), and I have 250K such keys in a Data.Map. Should this really
> add up to 6GB?
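
A back-of-the-envelope estimate may help here. The sketch below assumes a 64-bit GHC and a Data.Text that stores UTF-16 (two bytes per character, as text releases before 2.0 do); the per-object sizes are rough assumptions, not measurements. It compares an index whose keys own their own small buffers with one whose keys are slices that keep the whole row buffer alive.

```haskell
-- Rough sizes only; every per-object byte count below is an assumption
-- for 64-bit GHC with a UTF-16 based Data.Text.
keys, charsPerKey, charsPerRow :: Int
keys        = 250000
charsPerKey = 9      -- "row%06d"
charsPerRow = 9000   -- key, one separator, 999 eight-character fields, 998 separators

-- Cost of one index entry if each key owns a small buffer of its own.
bytesPerEntry :: Int
bytesPerEntry = 48                     -- Map node (assumed)
              + 32                     -- Text constructor (assumed)
              + 16 + 2 * charsPerKey   -- array header (assumed) + UTF-16 payload
              + 16                     -- boxed file offset (assumed)

-- Cost per key if it is a slice that keeps its whole row's buffer alive.
bytesPerSharedRow :: Int
bytesPerSharedRow = 2 * charsPerRow

main :: IO ()
main = do
  putStrLn $ "compact index: ~" ++ show (keys * bytesPerEntry `div` 1000000) ++ " MB"
  putStrLn $ "rows retained: ~" ++ show (keys * bytesPerSharedRow `div` 1000000) ++ " MB"
```

Under these assumptions a compact index is in the tens of megabytes, while retained rows come to gigabytes. Data.Text substrings (for example the results of T.words) share the buffer of the string they were cut from, so one possible explanation, which cannot be confirmed without the pastebin code, is that each key keeps its whole row in memory. If so, applying T.copy to the key before inserting it into the map would be the usual remedy.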
>
> Also, I have no idea why fetching all the rows, one by one, should
> consume any additional memory. Each row is fetched and its length is
> computed and printed out. I see no reason for the rows to be retained in
> the memory.
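
For comparison, here is a minimal sketch of a row fetch that should not retain anything once the field count has been printed. It is an assumption about how such a lookup could be written, not the getRow from FileDB; fetchRow is a made-up name, and strict ByteString stands in for Data.Text.

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO

-- Hypothetical helper, not part of FileDB: read the row that starts at
-- the given byte offset and split it into whitespace-separated fields.
fetchRow :: Handle -> Integer -> IO [B.ByteString]
fetchRow h pos = do
  hSeek h AbsoluteSeek pos      -- jump to the start of the row
  line <- B.hGetLine h          -- strict read: the whole row is in memory now
  return (B.words line)

main :: IO ()
main = do
  let path = "filedb-fetch-demo.txt"   -- throwaway demo file
  B.writeFile path (B.pack "row1 a b\nrow2 c d e\n")
  withBinaryFile path ReadMode $ \h -> do
    r <- fetchRow h 9                  -- the second row starts at byte 9
    putStrLn $ "fields: " ++ show (length r)
```

Because the read is strict and the result is consumed immediately by length, each row becomes garbage as soon as its line is printed. If the real getRow uses lazy I/O or stores the rows somewhere, the behaviour could differ.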
>
> Here's the memory allocation summary:
>
> > 1,093,931,338,632 bytes allocated in the heap
> > 2,225,144,704 bytes copied during GC
> > 4,533,898,000 bytes maximum residency (26 sample(s))
> > 3,080,926,336 bytes maximum slop
> > 10004 MB total memory in use (0 MB lost due to fragmentation)
> >
> >                                     Tot time (elapsed)  Avg pause  Max pause
> >   Gen  0   2171739 colls,     0 par   45.29s   45.26s     0.0000s    0.0030s
> >   Gen  1        26 colls,     0 par    1.50s    1.53s     0.0589s    0.7087s
> >
> > INIT time 0.00s ( 0.00s elapsed)
> > MUT time 279.92s (284.85s elapsed)
> > GC time 46.80s ( 46.79s elapsed)
> > EXIT time 0.68s ( 0.71s elapsed)
> > Total time 327.40s (332.35s elapsed)
> >
> > %GC time 14.3% (14.1% elapsed)
> >
> > Alloc rate 3,908,073,170 bytes per MUT second
> >
> > Productivity 85.7% of total user, 84.4% of total elapsed
>
>
> Btw., I don't get the "bytes allocated in the heap" figure, which is
> approx. 1000 GB (?).
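
On that figure: "bytes allocated in the heap" is the running total of everything the program allocated over its whole lifetime, not the amount live at any one moment. Most of it is short-lived garbage reclaimed by the Gen 0 collections. A small arithmetic check, using only numbers copied from the summary quoted above, shows that the total is consistent with the reported allocation rate and mutator time.

```haskell
-- Figures copied from the RTS summary quoted above.
allocated, allocRate :: Double
allocated = 1093931338632   -- bytes allocated in the heap
allocRate = 3908073170      -- bytes per MUT second

main :: IO ()
main = do
  -- total allocation divided by the allocation rate gives back the mutator time
  putStrLn $ "implied MUT seconds: " ++ show (allocated / allocRate)
```

The result lands very close to the reported MUT time of 279.92s. The number to watch for memory use is "maximum residency", and with a copying collector the total memory in use can be a small multiple of that.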
>
> I'm obviously doing something wrong here. I'd be thankful for any help.
>
> Best,
> Jan