sequencing data structures

Wed, 20 Aug 2003 12:16:11 +0100

On Wednesday 20 August 2003 9:16 am, Bernard James POPE wrote:

> If you can live with supporting only one compiler then you can get a lot of
> this done via the FFI.
>
> In buddha, my debugger for Haskell, I have to print pretty versions of
> arbitrary data structures, which is in many ways similar to what you want
> to do.
>
> [details omitted]

This interface looks pretty similar to the interfacein Hugs  The module is 
hugs98/libraries/Hugs/Internals.hs:

http://cvs.haskell.org/cgi-bin/cvsweb.cgi/hugs98/libraries/Hugs/Internals.hs?rev=1.2&content-type=text/x-cvsweb-markup&sortby=log

It seems to me that, with a little tweaking of the Buddha and Hugs.Internals 
interfaces, we could provide the same interface on both Hugs and GHC.  (It's 
also pretty simple to implement so it could potentially be added to NHC98 as 
well.)

The major difference between the interfaces is that Buddha's thunk 
representation is a recursive datatype so the reify operation relies on 
lazyness to handle cycles and large substructures.  Instead, Hugs.Internals 
thunk representation is a non-recursive datatype and 'classifyCell' (the 
equivalent of Buddha's reify) is an operation to classify a single thunk as 
an Int, Float, application, etc. and the program must explicitly traverse 
function arguments to examine them.  This is a pretty small difference - the 
Buddha-style of interface is easily implemented in terms of the 
Hugs.Internals interface.

Another potential difference is in the semantics.  The semantics of the 
Hugs.Internals operations is difficult because they allow you to examine 
unevaluated thunks in the heap.  For example, you might look at a thunk for:

   1+2

and see any of the following:

  addInt (fromInteger <dictionary> 1) (fromInteger <dictionary> 2) 
  addInt 1 (fromInteger <dictionary> 2) 
  addInt 1 2
  3

in fact, a legal (but bizarre) Haskell compiler could also return

  4-1
  1 + (1+1)
  id 3

or anything else whose value is 3.

To reflect this in the semantics, we use the same trick that we use in the 
semantics of exception handling and use non-determinism.  That is, a thunk is 
viewed as a set of all possible representations of the same _value_ and 
classifyCell merely selects one of those representations (which is why 
classifyCell is in the IO monad).

Sadly, the resulting semantics isn't compositional since the semantics of an 
application

  e1 e2

contains both 'good' meanings based on the semantics of e1 and e2

   [ x y | x <- [[e1]], y <- [[e2]] ]

but also many expressions that have nothing to do with the semantics of e1 or 
of e2 but which just happen to evaluate to the same value (like the legal but 
bizarre reifications of 1+2 given above).

Not having a compositional semantics is frequently regarded as a Bad Thing.

--
Alastair Reid

ps Hugs.GenericPrint in the same directory contains a generic printing 
function based on Hugs.Internal that produces output identical to the old 
Hugs printing mechanism.  There's also code (Hugs.CVHAssert) to write assert 
functions where the assertions are about the size of the arguments (based on 
an idea by Cordelia Hall) and there's even some code to disassemble bytecode 
functions.

pps I doubt that these modules have been tested in several years - minor 
bitrot may well have set in.