Questions about sharing

Jan-Willem Maessen jmaessen@alum.mit.edu
Sat, 8 Dec 2001 10:55:30 -0500


On Fri, 7 Dec 2001, Adrian Hey replied:
> On Friday 07 December 2001  2:27 pm, D. Tweed wrote:
> > > On Fri, 7 Dec 2001, Adrian Hey wrote:
> > > The first is..
> > > Does the compiler keep a unique copy of expressions which consist of just
> > > a single zero arity constructor (eg. [],True,Nothing..) as a CAF which is
> > > referenced each time the constructor appears in an expression, or does it
> > > duplicate the constructor (expression) each time it's used.
> 
> Well I suppose if it's necessary to create a new indirection heap record
> for each reference, then there's not really any point in having a single
> copy of the value itself. But I don't see why that should be so. Even
> if it is indirected for some reason it should still be possible to
> share the indirection record I think.
> 
> Maybe CAF is the wrong word to use here since there's no application
> involved. What I mean is.. are zero arity constructors referenced the
> same way as a top level constant would be? (Not that I know how
> that's done, but I presume there's only 1 copy of top level
> constants in memory at any one time.)

My understanding is that GHC tries to have it both ways.  Here's my
understanding of how it works (implementors should probably chime in
if my memory is faulty or out of date):

1) There is a single, canonical copy of every nullary constructor.
  This canonical copy is used wherever possible---just like a
  top-level constant.

2) Updating a thunk with an indirection makes it expensive to obtain
  the thunk's value.  The extra indirection does not require
  allocation---the only reason we need indirections at all is to
  overwrite memory that previously held a thunk.  The real problem is
  that it takes time to chase down indirections once they exist.

  Therefore when a thunk evaluates to a nullary constructor, it is
  overwritten directly.  This effectively creates another copy of the
  nullary constructor.

3) When the GC runs, instead of copying these newly-created nullary
  constructors, it replaces them with the canonical copy.

  [The GC also eliminates indirections, and thus helps us no matter
  what we do in 2) above]

Thus we get the best of both worlds---efficient thunk update and
reasonable space consumption.

-Jan-Willem Maessen