sequencing data structures
Alastair Reid
alastair@reid-consulting-uk.ltd.uk
Wed, 20 Aug 2003 12:16:11 +0100
On Wednesday 20 August 2003 9:16 am, Bernard James POPE wrote:
> If you can live with supporting only one compiler then you can get a lot of
> this done via the FFI.
>
> In buddha, my debugger for Haskell, I have to print pretty versions of
> arbitrary data structures, which is in many ways similar to what you want
> to do.
>
> [details omitted]
This interface looks pretty similar to the interfacein Hugs The module is
hugs98/libraries/Hugs/Internals.hs:
http://cvs.haskell.org/cgi-bin/cvsweb.cgi/hugs98/libraries/Hugs/Internals.hs?rev=1.2&content-type=text/x-cvsweb-markup&sortby=log
It seems to me that, with a little tweaking of the Buddha and Hugs.Internals
interfaces, we could provide the same interface on both Hugs and GHC. (It's
also pretty simple to implement so it could potentially be added to NHC98 as
well.)
The major difference between the interfaces is that Buddha's thunk
representation is a recursive datatype so the reify operation relies on
lazyness to handle cycles and large substructures. Instead, Hugs.Internals
thunk representation is a non-recursive datatype and 'classifyCell' (the
equivalent of Buddha's reify) is an operation to classify a single thunk as
an Int, Float, application, etc. and the program must explicitly traverse
function arguments to examine them. This is a pretty small difference - the
Buddha-style of interface is easily implemented in terms of the
Hugs.Internals interface.
Another potential difference is in the semantics. The semantics of the
Hugs.Internals operations is difficult because they allow you to examine
unevaluated thunks in the heap. For example, you might look at a thunk for:
1+2
and see any of the following:
addInt (fromInteger <dictionary> 1) (fromInteger <dictionary> 2)
addInt 1 (fromInteger <dictionary> 2)
addInt 1 2
3
in fact, a legal (but bizarre) Haskell compiler could also return
4-1
1 + (1+1)
id 3
or anything else whose value is 3.
To reflect this in the semantics, we use the same trick that we use in the
semantics of exception handling and use non-determinism. That is, a thunk is
viewed as a set of all possible representations of the same _value_ and
classifyCell merely selects one of those representations (which is why
classifyCell is in the IO monad).
Sadly, the resulting semantics isn't compositional since the semantics of an
application
e1 e2
contains both 'good' meanings based on the semantics of e1 and e2
[ x y | x <- [[e1]], y <- [[e2]] ]
but also many expressions that have nothing to do with the semantics of e1 or
of e2 but which just happen to evaluate to the same value (like the legal but
bizarre reifications of 1+2 given above).
Not having a compositional semantics is frequently regarded as a Bad Thing.
--
Alastair Reid
ps Hugs.GenericPrint in the same directory contains a generic printing
function based on Hugs.Internal that produces output identical to the old
Hugs printing mechanism. There's also code (Hugs.CVHAssert) to write assert
functions where the assertions are about the size of the arguments (based on
an idea by Cordelia Hall) and there's even some code to disassemble bytecode
functions.
pps I doubt that these modules have been tested in several years - minor
bitrot may well have set in.