[Haskell-beginners] Space leak while reading from a file?

David McBride toad3k at gmail.com
Mon May 12 16:33:20 UTC 2014


This is a bit advanced for the beginners list.  You would probably have
better luck on stackoverflow.


On Sun, May 11, 2014 at 7:24 AM, Jan Snajder <jan.snajder at fer.hr> wrote:

> Dear all,
>
> I'm trying to implement a simple file-based database. I apparently have
> a space leak, but I have no clue where it comes from.
>
> Here's the file-based database implementation:
> http://pastebin.com/QqiqcXFw
>
> The idea to have a database table in a single textual file. One line
> equals one table row. The fields within a row are whitespace separated.
> The first field is the key. Because I'd like to work with large files, I
> don't want to load the whole file into memory. Instead, I'd like to be
> able to fetch the rows on demand, by keys. Thus I first create an index
> that links keys to file seeks. I use the readerT to add the index to the
> IO monad.
>
> For testing, I use a dummy table produced as follows:
>
> import System.IO
> import Text.Printf
> import Control.Monad
>
> row = unwords [printf "field%03d" (i::Int) | i <- [1..999]]
>
> main = do
>   forM_ [1..250000] $ \i ->
>      putStrLn $ printf "row%06d %s" (i::Int) row
>
> This generates a 2.1G textual file, which I store on my disk.
>
> The testing code:
>
> import FileDB
> import qualified Data.Text as T
> import Text.Printf
> import Control.Applicative
> import Control.Monad
> import Control.Monad.Trans
> import System.IO
> import System.Environment
>
> main = do
>   (f:_) <- getArgs
>   t <- openTable f
>   runDB t $ do
>     ks <- getKeys
>     liftIO $ do
>       putStrLn . printf "%d keys read" $ length ks
>       putStrLn "Press any key to continue..."
>       getChar
>     forM_ ks $ \k -> do
>       Just r <- getRow k
>       liftIO . putStrLn $ printf "Row \"%s\" has %d fields"
>         (T.unpack k) (length r)
>
> When I run the test on the 2.1GB file, the whole program consumes 10GB.
>
> 6GB seem to be allocated after the index is built (just before entering
> the forM_ function). The remaining 4GB are allocated while fetching all
> the rows.
>
> I find both things difficult to explain.
>
> 6GB seems too much for the index. Each key is 9 characters (stored as
> Data.Text), and I have 250K such keys in a Data.Map. Should this really
> add up to 6GB?
>
> Also, I have no idea why fetching all the rows, one by one, should
> consume any additional memory. Each row is fetched and its length is
> computed and printed out. I see no reason for the rows to be retained in
> the memory.
>
> Here's the memory allocation summary:
>
> > 1,093,931,338,632 bytes allocated in the heap
> >    2,225,144,704 bytes copied during GC
> >    4,533,898,000 bytes maximum residency (26 sample(s))
> >    3,080,926,336 bytes maximum slop
> >            10004 MB total memory in use (0 MB lost due to fragmentation)
> >
> >                                     Tot time (elapsed)  Avg pause  Max
> pause
> >   Gen  0     2171739 colls,     0 par   45.29s   45.26s     0.0000s
>  0.0030s
> >   Gen  1        26 colls,     0 par    1.50s    1.53s     0.0589s
> 0.7087s
> >
> >   INIT    time    0.00s  (  0.00s elapsed)
> >   MUT     time  279.92s  (284.85s elapsed)
> >   GC      time   46.80s  ( 46.79s elapsed)
> >   EXIT    time    0.68s  (  0.71s elapsed)
> >   Total   time  327.40s  (332.35s elapsed)
> >
> >   %GC     time      14.3%  (14.1% elapsed)
> >
> >   Alloc rate    3,908,073,170 bytes per MUT second
> >
> >   Productivity  85.7% of total user, 84.4% of total elapsed
>
>
> Btw., I don't get the "bytes allocated in the heap" figure, which is
> approx. 1000 GB (?).
>
> I'm obviously doing something wrong here. I'd be thankful for any help.
>
> Best,
> Jan
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20140512/9c165224/attachment.html>


More information about the Beginners mailing list