Expiring cached data?

Tue Nov 4 20:48:44 EST 2003

G'day all.

On Sun, 2 Nov 2003 22:58:41 -0300 (CLST)
"andrew cooke" <andrew at acooke.org> wrote:

> > What options do I have in Haskell?  I'm interested both in general
> > solutions (maybe some compilers do this anyway?) and in approaches
> > to structuring the program so that I can control caching in more
> > detail.

It sounds to me like separating out the reclamation algorithm is going
to go better for you, so you can play with different approaches.

Quoting Ben Escoto <bescoto at stanford.edu>:

> However, once in a while (every N reads?), the tree could be rebuilt
> based on the data stored in the other data structure.  Leaves that you
> want to expire could be recreated, and the leaves you want to save
> could be copied directly from the old tree.  The new tree would be
> returned, and the cached stuff you don't want would be garbaged
> collected.

I think that periodically rebuilding the cache is almost certainly
the best approach.  However, I would caution against LRU-based policies,
because the work pessimally when your working set is just larger than
the size of your cache.

One approach that I used once was that when my data structure reached
a certain size, I would randomly drop N% (in my application, N == 50)
of the items in it.  This greatly simplifies bookkeeping, as you only
need to keep track of the number of items in the cache, which you
usually get for free anyway.  If recomputation isn't _that_ expensive,
this is a good approach.

Another possibility is to use a "not recently used" algorithm.  You record
the last, say, 10% of accesses, keep those, and randomly eject 50% of what
remains.  If recent entries are very precious, this can be a good
compromise.

If you can wait a day or two, I have some code which needs to be cleaned
up which does pretty much this.

Cheers,
Andrew Bromage