[Haskell-cafe] Detecting Cycles in Datastructures

Sun Dec 11 09:58:01 EST 2005

Another belated reply to an old thread...

Andrew Pimlott wrote:
 > On Fri, Nov 18, 2005, Paul Hudak wrote:
 > > unwind :: Expr -> Expr
 > > unwind (Add e1 e2) = Add (unwind e1) (unwind e2)
 > > unwind (Rec fe)    = x where x = unwind (fe x)
 > > unwind e           = e
 >
 > Since this discussion started around observing sharing in the
 > implementation, I wanted to see whether, by the time we convert your
 > cunning representation back into an infinite data structure, we have
 > the sharing we hoped to observe in the first place.
 >
 > > fe2 e = Add (Const 1) e    -- recursive
 > > e2 = Rec fe2               -- top-level loop
 > > e2u = unwind e2            -- infinite
 >
 >     main = walk e4u where
 >       walk (Add e1 e2) = walk e2
 >
 > blows up, showing that we do not.  The problem with unwind seems to
 > be that the computation of the fixed point keeps calling unwind,
 > which keeps reconstructing the fixed point: ...

The first solution I gave to the original email from Tom Hawkins
doesn't blow up, but you are right that the expanded version (to deal
with nested loops) does blow up.  Actually, my original definition of
unwind was this:

 > unwind :: Expr -> Expr
 > unwind (Add e1 e2) = Add (unwind e1) (unwind e2)
 > unwind (Rec fe)    = x where x = fe x
 > unwind e           = e

Indeed this works fine on the above example -- i.e. it doesn't blow up
(in Hugs, at least).  The only reason I added the extra call to unwind
is to handle nested loops, but sadly that causes loss of sharing, as
you have pointed out.

 > I then tried
 >
 >     unwind (Rec fe) = unwind (fix fe)
 >     fix f = x where x = f x
 >
 > even though I didn't think it would work: (fix fe) would create a
 > properly sharing structure, but unwind would unshare it:
 >
 > e2u = unwind (x where x = ((\e -> Add (Const 1) e) x))
 >              (          = Add (Const 1) x)
 > = Add (Const 1) (unwind (x where x = Add (Const 1) x))
 > = Add (Const 1) (Add (Const 1) (unwind (x where x = Add (Const 1) x)))
 > = ...
 >
 > This does blow up in ghci, but not in ghc (6.4.1), even without
 > optimization.  I'm not quite sure why, ...

Interesting...  This is arguably the most elegant solution, since it
just replaces the constructor "Rec" with the function "fix".  Perhaps
surprisingly, it doesn't blow up in Hugs either!  If we unwind e2 we
get:

unwind e2
=> unwind (Rec (\e-> Add one e))
=> unwind (fix (\e-> Add one e))
=> unwind (x where x = (\e-> Add one e) x)
=> unwind (x where x = Add one x)

The question is, what happens next?  It seems to me that to prevent
blow-up the interpreter would have to do something like this:

=> x where x = Add one (unwind x)

but I don't know how this happens.  I would have expected the
unwinding that you showed above.  Perhaps someone who understands
either GHC or Hugs can explain this -- I suspect it has something to
do with the interpretation of "where".  It would be great if there
were a simple technique that one could expect every reasonable
implementation to use.

 > ... but anyway I want a version that exhibits sharing even in any
 > reasonable implementation.  Your message gives the technique (in
 > mapE); we only have to apply it to unwind.  But there is a
 > problem--your code has a bug!
 >
 > > mapE :: (Int->Int) -> Expr -> Expr
 > > mapE f e = mapE' f e 0 where
 > >   mapE' f (Const i)   n = Const (f i)
 > >   mapE' f (Add e1 e2) n = Add (mapE' f e1 n) (mapE' f e2 n)
 > >   mapE' f (Rec fe)    n = Rec (absLoop n (mapE' f (fe (Loop n)) (n+1)))
 > >   mapE' f (Loop i)    n = Loop i
 > >
 > > absLoop :: Int -> Expr -> Fix Expr
 > > absLoop n e = \e' ->
 > >    let abs (Loop n') | n==n' = e'
 > >        abs (Add e1 e2)       = Add (abs e1) (abs e2)
 > >        abs e                 = e
 > >    in abs e
 > >
 > > e4 = Rec (\e1-> Add (Const 1)
 > >                     (Rec (\e2-> Add e1 e2))) -- nested loop
 > > e7 = mapE succ e4
 > > e8 = Rec (\e1-> Add (Const 2)
 > >                     (Rec (\e2-> Add e1 e2)))
 > > b4 = e7==e8   -- returns True!
 >
 > Notice that absLoop does not look inside Rec.  But there could be a
 > Loop (with the right n) there!

Thanks for catching this, and your solution seems right to me.  See
further comments below.

 > Really, we want absLoop to eliminate all the Loops it can find.  But
 > it can't, because it only knows the replacement expression for one
 > Loop.  It would be simpler for the Loop just to contain the
 > expression.  To enable that, I added a constructor Stop that is like
 > Loop, except it takes an Expr instead of an Int.  I use this
 > constructor for my sharing unwind as well; Loop is only needed for
 > (==).  (It would probably be even better to add an annotation type
 > argument to Expr; this could enable stricter typechecking that would
 > have caught the bug.)
 >
 > Complete code:
 > > data Expr = Const Int
 > >           | Add Expr Expr
 > >           | Rec (Fix Expr)            -- implicit loop
 > >           | Loop ID                   -- not exported
 > >           | Stop Expr
 > ...

It's interesting to note that with this datatype, the original
example:

 > cyclic = Add (Const 1) cyclic

could be written as:

 > cyclic = Add (Const 1) (Stop cyclic)

instead of:

 > cyclic e = Add (Const 1) (Rec cyclic)

which raises the question of why we're bothering with Rec and
fixpoints to begin with...  Indeed, if we did just this:

 > data Expr = Const Int
 >           | Add Expr Expr
 >           | Tag ID Expr
 >           | Loop ID

Then we could encode loops *explicitly*, and also do equality checks,
proper mapping, etc.  In a sense, what your and my earlier solutions
are doing is just turning a "Rec (\x-> e)" encoding into a "Tag i e /
Loop i" encoding (and back again), where the id "i" serves the purpose
of the variable "x".  I've attached some more code below, in which I
use an environment to "unwind" expressions, making the correspondence
between ids and variables even stronger.

Indeed, the idea of a applying a function to "open up" its body, and
then doing an abstraction to recover the functional form, reminds me
of higher-order abstract syntax, where one of the difficult issues is
inspecting and manipulating terms "under a lambda".  I think that
we're treading on the same ground here.

   -Paul

---------------------------------------------------------------------

 > module CyclicE where

 > data Expr = Const Int
 >           | Add Expr Expr
 >           | Tag ID Expr
 >           | Loop ID
 >
 > type ID = Int

Eq and Show

 > instance Eq Expr where
 >   Const x == Const y        =  x == y
 >   Add e1 e2 == Add e1' e2'  =  e1 == e1' && e2 == e2'
 >   Tag i e == Tag i' e'      =  i == i'
 >   Loop i == Loop i'         =  i == i'
 >   _ == _                    =  False
 >
 > instance Show Expr where
 >   show (Const x)   =  "(Const " ++ show x  ++ ")"
 >   show (Add e1 e2) =  "(Add "   ++ show e1 ++ " " ++ show e2 ++ ")"
 >   show (Tag i e)   =  "(Tag "   ++ show i  ++ " " ++ show e  ++ ")"
 >   show (Loop i)    =  "(Loop "  ++ show i  ++ ")"

Examples:

 > e2,e3,e4 :: Expr
 > e2 = Tag 1 (Add (Const 1) (Loop 1))   -- recursive
 > e3 = Add (Const 0) e2                 -- lower-level loop
 > e4 = Tag 1 (Add (Const 1)
 >                 (Tag 2 (Add (Loop 1) (Loop 2)))) -- nested loop
 >
 > b1,b2,b3 :: Bool
 > b1 = e3==e3           -- True
 > b2 = e3==e2           -- False
 > b3 = e4==e4           -- True

MapE:

 > mapE :: (Int->Int) -> Expr -> Expr
 > mapE f (Const i)   = Const (f i)
 > mapE f (Add e1 e2) = Add (mapE f e1) (mapE f e2)
 > mapE f (Tag i e)   = Tag i (mapE f e)
 > mapE f (Loop i)    = Loop i

 > e5 = mapE succ e2
 > e6 = Tag 1 (Add (Const 2) (Loop 1))
 > b4 = e5==e6           -- True
 >
 > e7 = mapE succ e4
 > e8 = Tag 1 (Add (Const 2)
 >                 (Tag 2 (Add (Loop 1) (Loop 2))))
 > b5 = e7==e8           -- True

Unwind:

 > unwind :: Expr -> Expr
 > unwind e = unw empty e where
 >   unw env (Const i)   = Const i
 >   unw env (Add e1 e2) = Add (unw env e1) (unw env e2)
 >   unw env (Tag i e)   = e' where e' = unw (upd i e' env) e
 >   unw env (Loop i)    = fnd i env

Simple implementation of environments:

 > empty i = error ("unknown ID: " ++ show i)
 > upd i e env = \i' -> if i==i' then e else env i'
 > fnd i env = env i

 > e2u,e3u,e4u :: Expr
 > e2u = unwind e2       -- infinite
 > e3u = unwind e3       -- infinite
 > e4u = unwind e4       -- infinite

 > walk (Add e1 e2) = walk e2
 > w2 = walk e2u         -- no blow-up
 > w3 = walk e3u         -- no blow-up
 > w4 = walk e4u         -- no blow-up