[Haskell-cafe] memory usage: general questions

Thu Aug 4 01:51:01 UTC 2016

On 08/03/2016 12:06 AM, Jonas Scholl wrote:
>>
>> 3) Are the "overwritten" portions of each B (i.e., those parts of the
>> persistent data we no are no longer needed) guaranteed to be pushed out
>> of memory at any particular point? Is this something we know will
>> (might?) happen as soon as the GC is run? Or does it happen right when
>> each f is applied to it's argument?
>>
> 
> Nothing is overwritten. Instead, a new version of B is allocated and the
> old one is collected after the last reference is dropped. Maybe this is
> best visible if you consider (f b, g b). Overwriting parts of b in f
> would change the argument to g. Thus, a copy has to be produced. It
> would be valid if the compiler could prove that only one reference
> exists and this is passed to f, but first this proof could be difficult
> and second a separate version of f would have to be compiled. This may
> even reduce performance, as the GC is exploiting the fact that most data
> structures are immutable (although thunks are mutable as they are
> overwritten by their result). Thus, such an optimization could hurt the
> GC, which would result in a global slowdown, further reducing any gains
> from overwriting the contents of the argument. Additionally, GCs are
> quite cheep if not much data is alive.
> 
> 

Thank you for the thorough and interesting response to my first two
questions. I am hoping I might get additional clarification on this
third question.

Please forgive the use of the term "overwritten". Of course, the data
structures are immutable. However, I was of the understanding that,
through the use of persistent data structures, it was not necessary for
the entire data structure to copied, if only part of the structure changes.

Suppose I have a list data structure possessing millions of nodes and
occupying 100MB of memory. And then suppose that the update function
"drops" (speaking metaphorically!) two nodes off the front of the
original list, and then prepends a new node to the front of that. Most
of the original listed can be reused, since the new list can simply
point to it after the first node. But what about those two nodes that
were "dropped" from the list, and never used by anything else in the
program. My question was: will they immediately be deallocated in memory
right after the application of the update function? Or are they cleaned
up later by the GC? And can we be sure that it will be on the next run
of the GC, or could they be hanging around it memory for a while,
depending on other factors?

-- 
https://qlfiles.net
To protect my privacy, please use PGP encryption. It's free and easy
to use! My public key ID is 0x340EA95A (pgp.mit.edu).