[nhc-users] Doubts in bytecode

Thu Mar 17 06:13:03 EST 2005

Arunkumar S Jadhav <arunk at it.iitb.ac.in> writes:

> Now in between all these contents of stack that were pointing
> to two graphs (i.e x+y and x-y) are being replicated on the stack and then
> one of the copies (of both the graphs) is being zapped.

Yes, it is curious.  I think the main reason is to swap the order of
the values, so that the application of f is correct, i.e.
    f (x+y) (x-y)
rather than
    f (x-y) (x+y)
But I am also puzzled why the original copies of (x+y) and (x-y)
remain on the stack, and why those stack entries are zapped.

> Also what all are the uses of ZAP nodes apart from black hole detection.
> Do ZAP nodes help in garbage collection too ?

In theory the GC could recover all the space in a zapped heap node
apart from the first pointer (which will eventually be overwritten with
an indirection to the final result).  However, the nhc98 collector
does not currently do this, so I believe at the moment the ZAP bit
is only used for black hole detection.
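
To illustrate what black hole detection means here, the following is a
minimal sketch in C.  It is not taken from the nhc98 runtime: the Thunk
layout, the FLAG_ZAP and FLAG_DONE names, and the force function are all
invented for illustration.  The idea is only that a node is marked when
evaluation of it begins, so that demanding the same node again before it
has a result can be reported as a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Everything here is hypothetical; it is NOT nhc98's real node layout. */
enum { FLAG_ZAP = 1u, FLAG_DONE = 2u };

typedef enum { EVAL_OK, EVAL_BLACK_HOLE } EvalStatus;

struct Thunk;
typedef EvalStatus (*ThunkCode)(struct Thunk *self, int *out);

typedef struct Thunk {
    unsigned  flags;   /* FLAG_ZAP while under evaluation, FLAG_DONE after */
    ThunkCode code;    /* code that computes the value of this node */
    int       result;  /* valid only once FLAG_DONE is set */
} Thunk;

/* Evaluate a node, reporting a black hole if it is re-entered. */
EvalStatus force(Thunk *t, int *out)
{
    assert(t != NULL && out != NULL);
    if (t->flags & FLAG_DONE) {      /* already evaluated: reuse the result */
        *out = t->result;
        return EVAL_OK;
    }
    if (t->flags & FLAG_ZAP)         /* demanded while still being evaluated */
        return EVAL_BLACK_HOLE;
    t->flags |= FLAG_ZAP;            /* zap the node before running its code */
    EvalStatus st = t->code(t, out);
    if (st == EVAL_OK) {             /* overwrite the node with its result */
        t->result = *out;
        t->flags = FLAG_DONE;
    }
    return st;                       /* on failure the node stays zapped */
}

/* Models  x = 42 */
EvalStatus const42(Thunk *self, int *out)
{
    (void)self;
    *out = 42;
    return EVAL_OK;
}

/* Models  x = x + 1, which demands its own value */
EvalStatus self_dep(Thunk *self, int *out)
{
    int v;
    EvalStatus st = force(self, &v);
    if (st != EVAL_OK)
        return st;
    *out = v + 1;
    return EVAL_OK;
}
```

Forcing a Thunk built from const42 returns EVAL_OK, whereas forcing one
built from self_dep returns EVAL_BLACK_HOLE instead of recursing forever.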

> Q2) As Malcolm explained in detail this is the purpose of CONSTR macro
> 
>  CONSTR(c,s,ws)
>         Construct a tag (i.e. a header for a data node) where there is
>         a mixture of pointers and basic values amongst the data items

It seems I was almost right in this description, but mixed up the
pointers and non-pointers.  Where I wrote:

		s  = size = total number of data items in the node
		ws = number of data items which are pointers to
		     other nodes
		The number of non-pointers is therefore (s - ws).
should read:
		s  = size = total number of data items in the node
		ws = number of basic data items (non-pointers)
		The number of pointers is therefore (s - ws).
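
As a sketch of how the three arguments relate, here is a made-up header
encoding in C.  The field widths and positions, and the names
SKETCH_CONSTR and tag_*, are assumptions chosen for illustration; the
real CONSTR macro in the nhc98 runtime may pack its bits quite
differently.  Only the meaning of c, s and ws follows the corrected
description above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Tag;

/* HYPOTHETICAL layout: constructor number in the top 16 bits, size in
 * the next 8 bits, number of non-pointer items in the low 8 bits. */
#define SKETCH_CONSTR(c, s, ws) \
    ((Tag)(((uint32_t)(c) << 16) | ((uint32_t)(s) << 8) | (uint32_t)(ws)))

unsigned tag_constructor(Tag t) { return t >> 16; }
unsigned tag_size(Tag t)        { return (t >> 8) & 0xFFu; }  /* s  */
unsigned tag_nonpointers(Tag t) { return t & 0xFFu; }         /* ws */

/* The number of pointers is derived, not stored: s - ws. */
unsigned tag_pointers(Tag t)
{
    assert(tag_nonpointers(t) <= tag_size(t));
    return tag_size(t) - tag_nonpointers(t);
}
```

With this encoding, SKETCH_CONSTR(3, 5, 2) describes constructor number 3
with five data items, two of them basic values and the remaining three
pointers.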

> I compiled various examples but till now I haven't seen a single example
> where CONSTR is used for a mixture of pointers and basic values. It has
> always been for basic values.

In fact, every example has only /pointers/, with no basic data values.
This is because basic data values in a polymorphic lazy language are
nearly always represented as a heap pointer to the value ("boxed"),
which is stored separately.  The only case in which a basic value can
be "in-lined" in a data structure is when it is explicitly "unboxed"
by the programmer (or implicitly "unboxed" by an optimising compiler).
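
The difference between the two representations can be pictured in C
roughly as follows.  This is only a sketch: the struct and function
names are invented, and the real heap nodes carry a header word and
other details that are omitted here.

```c
#include <assert.h>
#include <stdlib.h>

/* A boxed integer: a separate heap node holding the value. */
typedef struct { int value; } BoxedInt;

/* Boxed pair: both fields are pointers to other nodes.
 * In CONSTR terms this would be s = 2, ws = 0. */
typedef struct { BoxedInt *fst; BoxedInt *snd; } BoxedPair;

/* Unboxed pair: both basic values are stored in-line.
 * In CONSTR terms this would be s = 2, ws = 2. */
typedef struct { int fst; int snd; } UnboxedPair;

/* Allocate a box for an integer; returns NULL if allocation fails. */
BoxedInt *box_int(int v)
{
    BoxedInt *b = malloc(sizeof *b);
    if (b != NULL)
        b->value = v;
    return b;
}

/* Reading a boxed field costs an extra pointer dereference. */
int boxed_sum(const BoxedPair *p)
{
    return p->fst->value + p->snd->value;
}

/* Reading an unboxed field touches only the node itself. */
int unboxed_sum(const UnboxedPair *p)
{
    return p->fst + p->snd;
}
```

Both sums give the same answer, but the boxed version follows two extra
pointers, and its fields could equally well point at unevaluated
computations, which is what laziness requires.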

In the GHC compiler, for instance, unboxed values are marked in the
source code with a # symbol, like this example on the GHC mailing
list today:

    forn :: a -> Int# -> IO ()
    forn a n | n >=# 10000# = return ()
             | otherwise    = fory a 0# >> forn a (n +# 1#)

You can see that not only are the literal numeric values unboxed,
but their type is different too, and operations on unboxed values are
also marked with a #, because their code must differ from that of the
standard boxed versions.

nhc98 has some rudimentary support for unboxed values, which is why the
CONSTR macro lets you specify how many fields of the data structure
are unboxed.  However, I believe this compiler support was never
completed by the original author, because the parser does not accept
the # marks.

There is one hand-written file in the runtime system that actually
uses unboxed values - src/runtime/Builtin/cPack.c - but I don't think
the functions defined there are imported into nhc98's libraries,
so it is essentially dead code at the moment.

Regards,
    Malcolm