[Yhc] how stable is yhc?

Tom Shackell shackell at cs.york.ac.uk
Fri Sep 15 07:38:06 EDT 2006


On Thursday 14 September 2006 14:40, you wrote:
> Neil Mitchell wrote:

> This is how my debugger works: Rather than logging information to a file
> as the program runs (like HOOD) or generating the redex trail as a
> separate data structure in memory (which I believe Hat does; please
> correct me if I'm wrong), we use the YHC/NHC heap itself as the
> Evaluation Dependence Tree. We change the virtual machine so that when a
> function returns, just before the root of the tree representing the
> redex is overwritten by an indirection node to the root of the result
> tree, we keep a backup copy of the word that is overwritten. Also, the
> garbage collector is disabled so that indirection nodes are not
> eliminated. In addition, the top of stack in the caller (immediately
> after the return) will be a pointer to the redex node rather than the
> result. The attached slide gives an example of some of this; please look
> at it.

Interesting. One thing you might find useful is that Yhc makes it easy to add 
extra information to heap nodes. So, for example, you could have something like 
this (view in a fixed-width font):

+-----------+----------+-----------+----------+
|    (+)    |  empty   |    3ptr   |   5ptr   |
+-----------+----------+-----------+----------+

when the value is computed it becomes

+-----------+----------+-----------+----------+
| IND to 8  |   (+)    |    3ptr   |   5ptr   |
+-----------+----------+-----------+----------+

That way you wouldn't have to save a separate 'overwritten' list (though it 
does make every heap cell one word larger).
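As a toy model only (none of this is Yhc's actual node layout, and all the 
names below are invented), the idea in Haskell terms is roughly:

-- Each node carries one spare slot; when a redex is overwritten by an
-- indirection, the original function word moves into that slot, so the
-- debugger can still recover the application that produced the result.
data Node
  = App { nodeFun :: String, nodeArgs :: [Int] }   -- e.g. App "(+)" [3, 5]
  | Ind { resultPtr :: Int, savedApp :: Node }     -- e.g. Ind 8 (App "(+)" [3, 5])
  deriving Show

-- Normal evaluation just follows the indirection to the result...
deref :: Node -> Maybe Int
deref (Ind r _) = Just r
deref _         = Nothing

-- ...while the debugger can still ask which redex produced it.
redexOf :: Node -> Maybe Node
redexOf (Ind _ a) = Just a
redexOf _         = Nothing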

> When the VM during evaluation (or the debugger) examines such a node (a
> node representing a function application that finished evaluation), it
> follows the indirection and everything works as expected -- once a
> function is evaluated, its result is used (rather than the function being
> re-evaluated). But the debugger can look at the saved state and examine
> the redex -- the body of the function that was executed.

Yup the above scheme would also work that way :-)

> We also need a breakpoint instruction to execute step-by-step (or rather
> function-by-function) -- in effect, controlled execution by the user.

This should be straightforward to implement.
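Purely as a sketch of the idea (nothing to do with the real bytecode 
interpreter; the tiny Instr type below is invented for the illustration), 
the stepping loop would look something like:

import System.IO (hFlush, stdout)

-- A toy instruction set, just to show where the pause goes; the real
-- Yhc instructions (Eval, Return, ...) are of course more involved.
data Instr = Eval String | Return deriving Show

-- Run instructions, pausing for the user whenever a function is entered.
step :: [Instr] -> IO ()
step [] = putStrLn "done"
step (Eval f : is) = do
  putStr ("about to evaluate " ++ f ++ "; press Enter to continue ")
  hFlush stdout
  _ <- getLine
  step is
step (i : is) = print i >> step is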

> Am I correct in assuming that if we want to leave indirection nodes as
> is, there is no work for the garbage collector to do? What about memory
> requirements once the GC is disabled? I believe that the general opinion
> is that memory can be problem, but on computers with say 512 MB memory
> is it still a problem?

Haskell tends to work by allocating lots of closures describing the 
computation to be performed and then reducing those closures to a value. 
Typically the closures are large and the value is small, so once the value 
has been computed the garbage collector throws away all the information on 
how to compute it (since it's no longer needed) and just keeps the value. 
That value might then be an argument to another closure, which reduces to a 
value, and so on.
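A tiny standard example of that pattern:

-- foldl builds a chain of (+) closures roughly as long as the list
-- before any addition actually happens; the final value is one small number.
total :: Integer
total = foldl (+) 0 [1 .. 1000000]
-- Once 'total' has been forced to 500000500000, the GC would normally
-- reclaim the whole ((((0+1)+2)+3)+...) chain of closures.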

The problem is that if you want to determine whether the computation was 
correct, you need an accurate record of what computation was performed, so 
you can't just throw away the 'how to compute it' information. Thus you're 
quite right: you would need to disable the GC.

The issue you're likely to come up against is that Haskell burns memory like 
nothing else. Here's a table of some measurements, on my AMD64, of some of 
the conformance tests we use (all based on real programs submitted by Haskell 
users):

Test           Run time    Total Memory    Maximum Live

Anna            9.8s        703 MB          1.25 MB
Fluid           1.0s         64 MB           176 KB
Wheelsieve2    14.0s        904 MB            13 MB
Gamteb         11.3s        904 MB           2.9 MB

'Total Memory' is the total amount of memory allocated during the program run 
and 'Maximum Live' is the largest quantity of memory that was actually copied 
over during a GC.

As you can see, Haskell programs tend to allocate a *lot* of memory and then 
throw most of it away again. So if you keep everything, you can exhaust even 
gigabytes of memory very quickly with a computationally heavy program.
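For instance, Anna allocates roughly 703 MB in about ten seconds, i.e. around 
70 MB a second, so with the GC disabled a machine with 512 MB would fill up 
after only a few seconds of that kind of workload.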

Of course, if you only want to debug short-running programs you should be 
okay.

> I'm not sure how Freja works; I think it keeps a copy of the EDT (as
> it's created) in memory separate from the program's heap, if you
> understand what I mean. Am I correct? In any case it runs only on SPARC,
> and so does not exactly suit my purposes.

Freja does things 'piecemeal': it runs for a while, gathers a bit of the 
trace in memory, and then stops. If the user, while navigating the trace, 
wanders outside the data Freja has collected, Freja re-runs the program 
(from the beginning) and collects the next bit of the trace in memory.
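As a rough sketch of that loop (entirely my own pseudocode, not Freja's 
implementation; 'rerunAndRecord' is an assumed stand-in for "run the whole 
program again from the start, recording only trace events lo..hi"):

type Trace = [String]

-- Keep only one window of the trace; asking for an event outside it
-- triggers a complete re-run that records the window containing it.
piecemealLookup :: (Int -> Int -> Trace) -> Int -> Int -> String
piecemealLookup rerunAndRecord windowSize wanted =
  let lo    = (wanted `div` windowSize) * windowSize
      trace = rerunAndRecord lo (lo + windowSize - 1)
  in  trace !! (wanted - lo)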

> I'm largely referring to the assumptions I made in the description above
> -- the Eval, Return & ReturnEval instructions, the indirection nodes,
> etc. I think these won't be changing, right? I don't need a Haskell
> implementation that can compile all programs, just enough non-trivial
> ones to demonstrate my work. Things like garbage collection algorithms,
> type checking, etc can change without any problem for me. I think you
> now understand what I mean when I ask about Yhc being stable.

I don't think there's anything likely to change in Yhc anytime soon that is 
going to cause you a problem :-)


Hope that helps :-)


Tom


