[Haskell-cafe] Chuch encoding of data structures in Haskell

Thu May 27 13:10:01 EDT 2010

2010/5/27 Günther Schmidt <gue.schmidt at web.de>:
> I'm exploring the use of church encodings of algebraic data types in
> Haskell.
> Since it's hard to imagine being the first to do so I wonder if folks here
> could point me to some references on the subject.
>
> I'm looking for examples of church encodings in Haskell a little bit beyond
> Church Booleans and Church Numerals.

The fully general description of Church encoding is rather simple, but
I've rarely seen it described explicitly. Consider the type of Church
encodings for Bool, Either, and the 2-tuple (written here as "Pair"
for clarity):

churchedBool :: t -> t -> t
churchedEither :: (a -> t) -> (b -> t) -> t
churchedPair :: (a -> b -> t) -> t

And compare the signatures for the constructors of the non-Church encoded types:

True :: Bool
False :: Bool
Left :: a -> Either a b
Right :: b -> Either a b
Pair :: a -> b -> Pair a b

We can observe two patterns: 1) The Church encodings take as many
arguments as the type has constructors 2) The type of each argument is
the same as the signature of a constructor, except returning an
arbitrary type. As this suggests, "Church decoding" is as simple as
applying the Church encoded type to each of the constructors.

>From the above, a general description of Church encoding can be
deduced: The encoding of a value is a function that replaces each data
constructor with an arbitrary function. The Church encoding
represents, in a way, the most generalized means of using values of
that type--which is why Haskell already includes variations of "Church
encode" functions for a few standard types, like so:

encodeEither x = \f g -> either f g x
encodeMaybe x = \z f -> maybe z f x
encodeTuple x = \f -> uncurry f x
encodeBool x = \t e -> if x then t else e

But what of Church numerals? First, we must consider the
Church-encoding of recursive data types. Given arbitrary nested data
types, there's nothing else that can be done--the outer types know
nothing of the types they contain. But if an inner type is known to be
the same as the outer type, there are two options for the encoding:
Work only with the outermost value, as with non-recursive types, or
work with the recursive value as a whole, by having the outermost
value pass its arguments inward.

Now, consider the signature of a Church numeral:

churchedNumeral :: (t -> t) -> t -> t

Given the above, what can we say about the equivalent "decoded" data
type? It takes two arguments, so we have two constructors. The second
argument is a single value, so the associated constructor is nullary.
The first argument must be associated with a unary constructor, but
look at the parameter it takes: the same type as the result! This is
how we can tell that Church numerals are the encoding of a recursive
type. Since the only way a recursive constructor can do anything with
the values it contains is to pass its arguments inward, the value it
has to work with is the result of doing so. Thus, the other
constructor must take a single argument of its own type. What does
this look like as a standard data type?

data Nat = S Nat | Z

Good old inefficient Peano numbers, of course!

Keeping all that in mind, consider a List type, and both ways of encoding it:

data List a = Nil | Cons a (List a)

churchedListOuter :: t -> (a -> ___ -> t) -> t
churchedListRecursive :: t -> (a -> t -> t) -> t

For the "outermost" method, what type belongs in place of the ___? The
second argument of Cons is itself a List, so naively we would like to
simply put the type of "churchedListOuter" itself in place of ___, but
that won't work. In fact, nothing will work here, because recursion on
an "outermost encoded" list is impossible in a typed λ-calculus
without some means of using general recursion provided as a primitive
operation. Nested tuples can be used, but the length of the list will
be reflected in the type, preventing polymorphism over arbitrary
lists. This is also why Church encoding is most often seen in the
setting of the untyped λ-calculus.

The implicitly recursive encoding, however, presents no such problems.
So, perhaps a function to Church encode lists would be useful? Indeed
it would, and as before, it already exists:

encodeList x = \z f -> foldr f z x

Recall the earlier observation that "decoding" involves applying the
encoded type to its equivalent constructors; the same holds true for
recursive types, as demonstrated by right-folding a list with (:) and
[], or applying a Church numeral to 0 and (+ 1).

Regarding recursive data types as the least fixed point of a
non-recursive data type is thus tied to replacing the "outermost"
encoding with the "recursive" encoding, and the "Church encode"
function for a recursive type is simply a generalized right fold,
partially applied.

Now, the descriptions above are rather informal, and ignore the
possibility of infinite lazy recursive data structures, among other
gaps; but perhaps will help to get you started regardless.

- C.