[Haskell-cafe] Measuring memory usage

Vlatko Basic vlatko.basic at gmail.com
Wed Jul 4 11:12:37 UTC 2018


Hello again,

I'm still trying to find some method to predict memory usage and came upon this 
page: https://wiki.haskell.org/GHC/Memory_Footprint

I tried to calculate by hand how much memory the record below will consume
(64-bit, 1W = 8B):

newtype Id   = Id    Text deriving (Generic, Data, NFData) -- = 6W
newtype Uid  = Uid   Int  deriving (Generic, Data, NFData) -- = 2W
newtype XUid = XUid  Uid  deriving (Generic, Data, NFData) -- = 2W
newtype YUid = YUid  Uid  deriving (Generic, Data, NFData) -- = 2W
data    Dir  = Yes | No   deriving (Generic, Data)         -- = 2W
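
As a quick sanity check of the newtype sizes (a sketch; it assumes the
ghc-datasize package, which I also use for the measurements below):

-- A newtype is erased at runtime, so an XUid in normal form is just the
-- wrapped Int box: 1 header word + 1 payload word = 2W = 16B on 64-bit.
checkNewtype :: IO Word
checkNewtype = recursiveSizeNF $!! XUid (Uid 42)   -- expect 16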

data X = X
   { a :: XUid                -- =  2W       -- Int
   , b :: YUid                -- =  2W       -- Int
   , c :: Id                  -- =  6W + 8B  -- Text len 4
   , d :: Either Dir Dir      -- =  1W + 2W  -- Either + Dir + No/Yes
   , e :: Text                -- =  6W + 8B  -- Text len 4
   } deriving (Generic, Data) -- = 19W + 16B = 152 + 16 = 168B
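
Note that this tally counts only what the fields point at; the X constructor
itself adds another 1 header word + 5 pointer words = 6W per record, which a
heap-walking tool will include (and which may explain part of the gap below).
A minimal cross-check of a single record (a sketch; mkX is defined at the end
of this message, and ghc-datasize is assumed again):

-- Measure one fully forced record straight off the heap.
checkOne :: IO Word
checkOne = recursiveSizeNF $!! mkX 0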

and calculated the expected sizes of a few lists with different numbers of elements:

Expected list sizes    ([v] = (1 + 3N) words + N * sizeof(v))
   30: (1 + 3 *   30) W + 30   * (19W + 16B) =   5,768 B
  600: (1 + 3 *  600) W + 600  * (19W + 16B) = 115,208 B
 5000: (1 + 3 * 5000) W + 5000 * (19W + 16B) = 960,008 B
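
The same estimate in code (a sketch; the constants just restate the tallies
above):

-- Expected size of a list of n records, per the wiki formula.
expectedListBytes :: Int -> Int
expectedListBytes n = spine + n * elemBytes
  where
    wordSize  = 8                       -- 64-bit
    spine     = (1 + 3 * n) * wordSize  -- [] plus n cons cells
    elemBytes = 19 * wordSize + 16      -- per-field tally for X + Text payload

-- expectedListBytes 30   ==   5768
-- expectedListBytes 600  == 115208
-- expectedListBytes 5000 == 960008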

I also compared these figures with three libraries (Data.Generics.Schemes.gsize
from syb, GHC.DataSize.recursiveSizeNF from ghc-datasize, and the weigh
package), and the results were:

#items  recursiveSizeNF    gSize      Weigh   Expected  Diff vs. recursiveSizeNF
    1:            1,416       18        696        168       -
   30:            8,008      931     20,880      5,768     28%
  600:          135,688   18,601    417,600    115,208     15%
 5000:        1,121,288  155,001  3,480,000    960,008     14%

As you can see, the results are more than surprising (to me), with 
recursiveSizeNF coming closest to my estimate, even though all three measure 
the same variable.


What am I missing?


For completeness, here are the relevant parts of the code that creates the
elements (with excessive forcing):

   let mkX i = force X{ a = XUid $ Uid i
                      , b = YUid $ Uid i
                      , c = Id $ tshow i
                      , d = if even i then Left Yes else Right No
                      , e = T.reverse (tshow i)
                      }
       xs30  = force . map mkX $ take   30 $ randomRs (1000,1030) (mkStdGen 0)
       xs600 = force . map mkX $ take  600 $ randomRs (1000,1600) (mkStdGen 0)
       xs5K  = force . map mkX $ take 5000 $ randomRs (1000,5000) (mkStdGen 0)

   dataSize <- recursiveSizeNF $!! xs30   -- likewise for xs600 and xs5K
   let gSize = gsize $!! mkX 0
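
For reference, here is roughly how the three measurements fit together in one
program (a sketch; it assumes mkX from above is a top-level definition and
that tshow is just T.pack . show):

-- recursiveSizeNF (ghc-datasize) walks the actual heap closures after
-- forcing to normal form; gsize (syb, Data.Generics.Schemes) counts Data
-- constructor nodes rather than bytes; weigh reports bytes *allocated*
-- while forcing the value, not its final residency.
import Control.DeepSeq       (force, ($!!))
import Data.Generics.Schemes (gsize)
import GHC.DataSize          (recursiveSizeNF)
import System.Random         (mkStdGen, randomRs)
import Weigh                 (mainWith, value)

main :: IO ()
main = do
  let xs30 = force . map mkX $ take 30 $ randomRs (1000, 1030) (mkStdGen 0)
  bytes <- recursiveSizeNF $!! xs30    -- heap bytes, closest to my estimate
  print bytes
  print (gsize $!! xs30)               -- node count, hence the small numbers
  mainWith $ value "xs30" xs30         -- allocation, hence the large numbers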



> -------- Original Message --------
> Subject: Measuring memory usage
> From: Vlatko Basic <vlatko.basic at gmail.com>
> To: haskell-cafe <Haskell-cafe at Haskell.org>
> Date: 29/06/18 14:31
>
>
> Hello,
>
> I've come to some strange results using Weigh package.
>
> It shows that HashMap inside 'data' is using much, much more memory.
>
> The strange thing is that I'm seeing overly large memory usage in my app as
> well (several "MapData"-like records), and I'm trying to figure out with
> 'weigh' what's keeping the memory alive.
>
> I noticed that when I change the code to use HashMap directly (not inside
> 'data'; that's the only change), the memory usage observed with top drops by
> ~60M, from 850M to 790M.
>
>
> These are the test results for 10K, 5K and 3.3K items for "data MapData k v =
> MapData (HashMap k v)" (the full runnable example is at the end):
>
> Case           Allocated  GCs
> HashMap          262,824    0
> HashMap half      58,536    0
> HashMap third     17,064    0
> MapData        4,242,208    4
>
> I tested by changing the order, disabling all but one, etc., and the results
> were the same. 'weigh' behaves the same way with IntMap and Map.
>
>
> So, if anyone has experience with such issues, my questions are:
>
> 1. Is the 'weigh' package reliable/usable, at least to some extent? (The
> results do show the difference between full, half and third.)
>
> 2. How do you measure the memory consumption of your large data/records?
>
> 3. If the results are even approximately valid, what could cause such large
> discrepancies with 'data'?
>
> 4. Is there a way to see whether some record has been freed from memory, i.e. GCed?
>
>
>
> module Main where
>
> import Prelude
>
> import Control.DeepSeq     (NFData)
> import Data.HashMap.Strict (HashMap, fromList)
> import GHC.Generics        (Generic)
> import Weigh               (mainWith, value)
>
>
> data MapData k v = MapData (HashMap k v) deriving Generic
> instance (NFData k, NFData v) => NFData (MapData k v)
>
> full, half, third :: Int
> full  = 10000
> half  =  5000
> third =  3333
>
> main :: IO ()
> main = mainWith $ do
>   value "HashMap"       (          mkHMList full)
>   value "HashMap half"  (          mkHMList half)
>   value "HashMap third" (          mkHMList third)
>   value "MapData"       (MapData $ mkHMList full)
>
> mkHMList :: Int -> HashMap Int String
> mkHMList n = fromList . zip [1..n] $ replicate n "some text"
>
>
>


