[Haskell] Smash along your boilerplate; how to traverse a non-existent term

oleg at pobox.com oleg at pobox.com
Tue Jun 5 03:09:02 EDT 2007


We describe a generalization of the Smash-your-boilerplate generic
programming approach to extensible traversal strategies. To the
pre-defined gmap, reduction, and two-terms-in-lockstep strategies, the
programmer may at any time add a new class of traversals. The present
message shows one such extension, lazy gmap, and its application to
mapping of terms that do not exist. This lets us implement gminimum
and gmaximum -- which traverse `undefined' and produce the
fully-defined smallest (resp. largest) term of a desired type:

	gminimum () :: (Maybe Int,Either Bool (Maybe Char))
	 ==> (Nothing,Left False)
	gmaximum () :: (Maybe Int,Either Bool (Maybe Char))
	 ==> (Just 2147483647,Right (Just '\1114111'))

The result of traversing  (undefined::(Maybe Int,Either Bool (Maybe Char))) 
is (Nothing,Left False). This meaningful traversal of non-existing terms 
is another benefit of Haskell's non-strictness.

The complete code is available at
	http://darcs.haskell.org/generics/comparison/SmashA/
The code includes sample applications and examples, within the generic
programming comparison framework by Alexey Rodriguez Yakushev, Alex
Gerdes, and Johan Jeuring. The code is tested with GHC 6.4.1 and 6.6.


The Smash approach is based on the typecase-like function stapply
defined at the beginning of Syb4A.hs -- which expresses (or, reifies)
the typeclass selection in Haskell itself:
	stapply :: spec -> a -> d -> w
Given a heterogeneous list |spec| of functions |ai->wi| (where |ai|
and |wi| vary from one function to another in the list) and a datum of
the type |a|, we check to see if there is a function in that list
whose argument type |ai| is equal to |a|. If so, we apply that
function to the datum and return its result of the type |wi|.  If no
such function is found, we return the supplied default value, of the
type |d|. Thus the result type |w| of stapply is a (type-level)
function of the types |spec|, |a|, and |d|. The function stapply is
dual to typeclasses: for example, the following typeclass method

> class C a where fn :: a -> Int
> instance C Bool where fn x = if x then 10 else 20
> instance C Char where fn x = fromEnum x

is dually

> testtyc_fn x = 
>     stapply ((\ (x::Bool) -> if x then 10 else 20) :+:
>              (\ (x::Char) -> fromEnum x) :+:
> 	      HNil) 
> 	      ((error "no match")::Int) x

(One may say that testtyc_fn corresponds to a closed typeclass. That
is true in the above definition. However, HLists may be open too).

Our most general generic function is |gapp|, which applies a generic
function to a term. A generic function is (quite literally) made of
two parts.  First, there is a term traversal strategy, identified by a
_label_ |tlab|. One strategy may be to `reduce' a term using a
supplied reducing function (cf. fold over a tree). Another strategy
may rebuild a term. The second component of a generic function 
is |spec|, the list of `exceptions'. Each component of |spec| is a
function that tells how to transform a term of a specific
type. Exceptions override the generic traversal.

The function |gapp| is defined generically. The code below says:
first, check to see if any of the exceptions apply. If not, do the
generic traversal.

> class (STApply spec a df w, LDat tlab spec a df)
>     => GAPP tlab spec a df w | tlab spec a -> df w where
>     gapp :: tlab -> spec -> a -> w
>
> instance (STApply spec a df w, LDat tlab spec a df)
>     => GAPP tlab spec a df w where
>     gapp tlab spec x = stapply spec x (gin tlab spec x)

The function |gin| does a generic traversal. It (may) invoke |gapp| on
the children of the term, if any (and if the traversal strategy calls
for traversing the children). This class is similar to the class Data
of SYB:

> class LDat tlab spec x w | tlab spec x -> w where
>     gin :: tlab -> spec -> x -> w

However, different traversal strategies of Data are methods of that
class: gfold, gunfold, gmapT, etc. Here, different strategies are
identified by the label |tlab| and so the set of strategies is
_extensible_.  In SYB, for each new data type we need to define an
instance of Data with several methods. Here, for each new data type we
need to define several instances of LDat with one method each.

The extensible strategies were discovered when generalizing the
classes Dat and TDat in the original Smash approach. The file Syb4A.hs
defines the following popular strategies

   - Reconstruct the term: used in generic mapping
> data TL_recon = TL_recon

   - Reduce the term: used in folding over a term
> newtype TL_red w = TL_red ([w]->w)

   - The same as above but the reducer accepts the name of the term's 
     constructor, a string. This is used for generic show
> newtype TL_red_ctr w = TL_red_ctr (String -> [w]->w)

   - Traverse and reduce two terms in lock-step: we traverse the term 
     (Couple x y) where x and y have the same type, and reduce the 
     result using the supplied function 'f'. This is used for generic equality
> data Couple a = Couple a a
> data TL_red_lockstep w = TL_red_lockstep w ([w] -> w)


The TL_recon strategy is the generalization of fmap. Here are a few
instances, for a primitive type and products and recursive sums
(list). The analogy with fmap should be obvious. It should also be
obvious that the instances below can be automatically generated from
the definition of a datatype. That is, LDat instances could be
automatically derived. 

> instance LDat TL_recon spec Int Int where
>     gin _ spec x = x
> instance (GAPP TL_recon spec a dfa wa, GAPP TL_recon spec b dfb wb)
>     => LDat TL_recon spec (a,b) (wa,wb) where
>     gin tlab spec (x,y) = (gapp tlab spec x, gapp tlab spec y)
> instance (GAPP TL_recon spec [a] [w] [w], GAPP TL_recon spec a dfa w)
>     => LDat TL_recon spec [a] [w] where
>     gin tlab spec []     = []
>     gin tlab spec (x:xs) = (gapp tlab spec x):(gapp tlab spec xs)

We can define the generic gmap in one line:
> gmap f = gapp TL_recon (f :+: HNil) 

It takes a function |f::a->b| and a term and returns a term with all
values of the type |a| replaced with the corresponding values of the
type |b|. The type of the result is computed. When the input term is a
list, gmap is the ordinary map.

Given a term
> termc = (["ab"],(Just 'c', Just 'd'))

we can replace all Chars with their Int equivalents
> testc1 = gmap ord termc where ord (c::Char) = fromEnum c
	*SmashA.Syb4A> testc1
	([[97,98]],(Just 99,Just 100))

Alternatively, we can replace only those Chars that occur in Maybe Char
> testc2 = gmap maybeord termc 
>     where maybeord (Just (c::Char)) = Right (fromEnum c)
> 	    maybeord Nothing = Left 0
	*SmashA.Syb4A> testc2
	(["ab"],(Right 99,Right 100))

Unlike the regular fmap, we can fuse two traversals
> testc3 = gapp TL_recon (ord :+: maybeord :+: HNil) termc
>     where maybeord (Just (c::Char)) = Right (fromEnum c)
> 	    maybeord Nothing = Left 0
> 	    ord (c::Char) = fromEnum c

We can even traverse and replace under functions:
> testt3 = gmap (\x -> if x then 'a' else 'b') 
>               ([not], (True,('2',[(&&),(||)])))
whose inferred type is
	*SmashA.Syb4A> :t testt3
	testt3 :: ([Bool -> Char], (Char, (Char, [Bool -> Bool -> Char])))

The source code shows equality on nested data types and various
reductions (summing all Ints in a term; finding the size of the term;
printing the term, treating Strings differently from other lists,
etc).


The programmer may introduce a new traversal strategy. For
illustration, let us define a lazy version of the strategy TL_con. The
motivation is determining the smallest or the largest term of a given
type. The first stab at such a generic minimum may be
	gmin x = gmap (\ (x::Bool) -> False) x

which replaces all Bools with the smallest Bool in a sample term x. For
example, |gmin (True,False)| should give (False,False). The mapping
function is clearly non-strict in its argument, so the same result can
be achieved by applying gmin to the sample term 
(undefined::Bool,undefined::Bool). The sample term merely gives the
structure of the desired type -- the structure to traverse. 

One may observe that we don't even need that! We can apply gmin to the
sample term that is just (undefined::(Bool,Bool)) -- hence we don't
need to supply any real sample term. The key is that 
(fst (undefined::(Bool,Bool))) is a correct expression, which evaluates to
(undefined::Bool). Hence, "undefined" is deconstructible and
traversable, standing for a term of any structure we may want. It can
be traversed and mapped with real results -- providing the mapping and
traversal functions are not strict.

The TL_recon strategy above is strict, however, which is apparent from
the instance of LDat for pairs:
>     gin tlab spec (x,y) = (gapp tlab spec x, gapp tlab spec y)

That is easy to fix: just replace (x,y) with an irrefutable match ~(x,y).
Alas, the problem is more complex in the case of sums (cf the instance
LDat for lists above):
>     gin tlab spec []     = []
>     gin tlab spec (x:xs) = (gapp tlab spec x):(gapp tlab spec xs)
If we replace patterns with irrefutable match ~[] and ~(x:xs), then
the first alternative will always match, and so the second clause will
never be effective. We should recognize that sums give us a choice. We
should propagate this choice back to the users: we should report the
available alternatives, and let the users make the choice. We arrive
thus at the following design, implemented in Syb4ABuild.hs in the DARCS
repository above. We define a new traversal label

> newtype TL_recon_lazy = TL_recon_lazy (forall a. [a] -> a)

where the argument of TL_recon_lazy is a user-supplied choice function.

The instance for pairs is simple as we don't have any choice here:
> instance (GAPP TL_recon_lazy spec a dfa wa, 
> 	  GAPP TL_recon_lazy spec b dfb wb)
>     => LDat TL_recon_lazy spec (a,b) (wa,wb) where
>     gin tlab@(TL_recon_lazy f) spec ~(x,y) =
> 	f [(gapp tlab spec x, gapp tlab spec y)]

We do have the choice for sums, e.g., lists:

> instance (GAPP TL_recon_lazy spec [a] [w] [w], 
> 	    GAPP TL_recon_lazy spec a dfa w)
>     => LDat TL_recon_lazy spec [a] [w] where
>     gin tlab@(TL_recon_lazy f) spec x =
> 	f [[],
> 	   (gapp tlab spec (head x)):(gapp tlab spec (tail x))]

The term to traverse may be either an empty list, or non-empty
list. In the latter case, we speculatively map its head and
tail. We let the user make the choice. Due to non-strictness, we don't
actually traverse anything unless it is required. This lets us define
generic minimum simply as

> __ = error "nonexistent"
> gminimum () = r 
>   where r = gapp (TL_recon_lazy head) lst (__ `asTypeOf` r)
>         lst = (mb (__::Int)) :+: (mb (__::Bool)) :+: (mb (__::Char))
> 	        :+: HNil
>         mb :: Bounded t => t -> t -> t
> 	  mb _ _ = minBound

The definition for gmaximum is similar, modulo replacing minBound with
maxBound and head with last. We thus obtain the results mentioned at
the beginning. Other examples:

> test_gmin3 = gminimum () :: (Maybe Int,[Bool])
	*SmashA.Syb4ABuild> test_gmin3
	(Nothing,[])
> test_gmax1 = take 5 $ gmaximum () :: [Bool]
	*SmashA.Syb4ABuild> test_gmax1
	[True,True,True,True,True]
The presence of "take 5" is telling. The largest term of the type
[Bool] is the infinite list of True -- which gmaximum faithfully computes.

The function gminimum is akin to the de-typechecker since both convert
`undefined' to `defined'. De-typechecker produces only polymorphic
functions, whereas gminimum yields data (including monomorphic
functions).





More information about the Haskell mailing list