[Haskell-cafe] Measuring memory usage
Li-yao Xia
lysxia at gmail.com
Wed Jul 4 13:12:13 UTC 2018
Hi Vlatko,
I don't get the same numbers with ghc-datasize
recursiveSize $! xs1 -- 240 (list of length 1)
recursiveSize $! xs30 -- 6296
compiled on GHC 8.0.2 with optimizations (-O)
whereas recursiveSizeNF gives me the size of the thunk (force xs1),
which depends on whether xs1 is evaluated or not.
The size of X is missing:
- 1 word for the X constructor
- 5 words for the fields (they are not unpacked, this must be explicitly
required with the {-# UNPACK #-} pragma)
- 8B for one of the text fields (which for some reason takes 64B instead
of 56B)
That's 56B extra, for a total of 224B. Add in 4W for a singleton list
and that's 256B. Why are 16B missing in the above 240B figure? Note that
in mkX, there are two fields equal to (Uid i), so they get shared
(recursiveSize doesn't recount shared structures). (It's hardly an
optimization: if we take out the newtypes, the code looks like X {a = i,
b = i, ...}, so the compiler simply puts the same pointer in the two
fields.)
Notice also that nullary constructors (in particular Yes/No) will always
be shared. The compiler may also float out the whole "Left Yes" and
"Right No" to the toplevel, further reducing the size of longer lists.
gsize counts constructors. In particular, each newtype constructor
counts as one, and primitives like Int and Char also count as one (the
value they box is not visible to Data), and Text has a dummy Data
instance to make it seem like a newtype around [Char].
You can use this snippet to see a trace of the generic traversal:
everywhereM (\x -> print (dataTypeOf x) >> return x) xs1
Finally, "weigh" counts allocations, which are generally a superset of
the actual space taken by a data structure once it is fully computed.
Li-yao
On 07/04/2018 07:12 AM, Vlatko Basic wrote:
> Hello again,
>
> I'm still trying to find some method to predict memory usage and came
> upon this page: https://wiki.haskell.org/GHC/Memory_Footprint
>
> I tried to manually calculate how much memory will the record below
> consume (64-bit, 1W = 8B):
>
> newtype Id = Id Text deriving (Generic, Data, NFData) -- = 6W
> newtype Uid = Uid Int deriving (Generic, Data, NFData) -- = 2W
> newtype XUid = XUid Uid deriving (Generic, Data, NFData) -- = 2W
> newtype YUid = YUid Uid deriving (Generic, Data, NFData) -- = 2W
> data Dir = Yes | No deriving (Generic, Data) -- = 2W
>
> data X = X
> { a :: XUid -- = 2W -- Int
> , b :: YUid -- = 2W -- Int
> , c :: Id -- = 6W + 8B -- Text len 4
> , d :: Either Dir Dir -- = 1W + 2W -- Either + Dir + No/Yes
> , e :: Text -- = 6W + 8B -- Text len 4
> } deriving (Generic, Data) -- = 19W + 16B = 152 + 16 = 168B
>
> and calculated the assumed sizes of few lists with different number of
> elements:
>
> Expected list sizes ([v] = (1 + 3N) words + N * sizeof(v))
> 30: 1 + 3W * 30 + (19W * 30 + 16 * 30) = 5,761 B
> 600: 1 + 3W * 600 + (19W * 600 + 16 * 600) = 115,201 B
> 5000: 1 + 3W * 5000 + (19W * 5000 + 16 * 5000) = 960,001 B
>
> I also compared these sizes with three libs
> (Data.Generics.Schemes.gsize, GHC.DataSize.recursiveSizeNF, Weigh) and
> the results were:
>
> #items recursiveSizeNF gSize Weigh Expected Diff/recursiveSizeNF
> 1: 1,416 18 696 168 -
> 30: 8,008 931 20,880 5,761 28%
> 600: 135,688 18,601 417,600 115,201 15%
> 5000: 1,121,288 155,001 3,480,000 960,001 14%
>
> As you can see, the results are more than surprising (to me), with
> recursiveSizeNF coming closest. They all measure the same variable.
>
>
> What am I missing?
>
>
> For completeness, here are relevant parts of code for creating elements
> (with excessive forcing):
>
> let mkX i = force X{ a = XUid $ Uid i
> , b = YUid $ Uid i
> , c = Id $ tshow i
> , d = if even i then (Left Yes) else (Right No)
> , e = T.reverse (tshow i)
> }
> xs30 = force . map mkX $ take 30 $ randomRs (1000,1030)
> (mkStdGen 0)
> xs600 = force . map mkX $ take 600 $ randomRs (1000,1600)
> (mkStdGen 0)
> xs5K = force . map mkX $ take 5000 $ randomRs (1000,5000)
> (mkStdGen 0)
>
> dataSize <- recursiveSizeNF $!! {a}
> let gSize = gsize $!! mkX 0
>
>
>
>> -------- Original Message --------
>> Subject: Measuring memory usage
>> From: Vlatko Basic <vlatko.basic at gmail.com>
>> To: haskell-cafe <Haskell-cafe at Haskell.org>
>> Date: 29/06/18 14:31
>>
>>
>> Hello,
>>
>> I've come to some strange results using Weigh package.
>>
>> It shows that HashMap inside 'data' is using much, much more memory.
>>
>> The strange thing is that I'm seeing too large mem usage in my app as
>> well (several "MapData" like in records), and trying to figure out
>> with 'weigh' what's keeping the mem.
>>
>> Noticed that when I change the code to use HashMap directly (not
>> inside 'data', that's the only change), the mem usage observed with
>> top drops down for ~60M, from 850M to 790M.
>>
>>
>> These are the test results for 10K, 5K and 3.3K items for "data
>> MapData k v = MapData (HashMap k v)" (at the end is the full runnable
>> example.)
>>
>> Case Allocated GCs
>> HashMap 262,824 0
>> HashMap half 58,536 0
>> HashMap third 17,064 0
>> MapData 4,242,208 4
>>
>> I tested by changing the order, disabling all but one etc., and the
>> results were the same. Same 'weigh' behaviour with IntMap and Map.
>>
>>
>> So, if anyone knows and has some experience with such issues, my
>> questions are:
>>
>> 1. Is 'weigh' package reliable/usable, at least to some extent? (the
>> results do show diff between full, half and third)
>>
>> 2. How do you measure mem consumptions of your large data/records?
>>
>> 3. If the results are even approximately valid, what could cause such
>> large discrepancies with 'data'?
>>
>> 4. Is there a way to see if some record has been freed from memory, GCed?
>>
>>
>>
>> module Main where
>>
>> import Prelude
>>
>> import Control.DeepSeq (NFData)
>> import Data.HashMap.Strict (HashMap, fromList)
>> import GHC.Generics (Generic)
>> import Weigh (mainWith, value)
>>
>>
>> data MapData k v = MapData (HashMap k v) deriving Generic
>> instance (NFData k, NFData v) => NFData (MapData k v)
>>
>> full, half, third :: Int
>> full = 10000
>> half = 5000
>> third = 3333
>>
>> main :: IO ()
>> main = mainWith $ do
>> value "HashMap" ( mkHMList full)
>> value "HashMap half" ( mkHMList half)
>> value "HashMap third" ( mkHMList third)
>> value "MapData" (MapData $ mkHMList full)
>>
>> mkHMList :: Int -> HashMap Int String
>> mkHMList n = fromList . zip [1..n] $ replicate n "some text"
>>
>>
>>
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
>
More information about the Haskell-Cafe
mailing list