[Haskell-cafe] Typing efficient folds

Mon Apr 27 03:35:24 EDT 2009

Keith Battocchi wrote:
> data Nest a = Nil | Cons(a, (Nest (Pair a)))
> type Pair a = (a,a)
> 
> pair f (a,b) = (f a, f b)
> 
> efold :: forall n m b. 
>   (forall a. n a) 
>   -> (forall a . (m a, n (Pair a)) -> n a)
>   -> (forall a. Pair (m a) -> m (Pair a))
>   -> (forall l z. (l b -> m (z b)) -> Nest (l b) -> n (z b))
> efold e f g h Nil = e
> efold e f g h (Cons(x, xs)) = f(h x, efold e f g (g . pair h) xs)

If the problem is just understanding the type error, it may help to
manually check the types. You only need to look at the second branch of
the function definition, since this is the case that generates the
error.

>From the pattern-match, you get the type of xs:

xs :: Nest (Pair (l b))  -- (1)

Combining the type of f with the result type of the efold definition,
you get the type of the result of the recursive call:

efold e f g (g . pair h) xs :: n (Pair (z b))  -- (2)

This function makes use of polymorphic recursion. To analyse the type of
the recursive call, you need a fresh batch of type variables, so you can
distinguish the types of the arguments to the recursive call from the
types of the formal parameters of the function definition. Since the
first three arguments to the recursive call are unchanged, you only need
fresh variables for b, l and z. I'll use primed versions of these
variables:

efold e f g :: (l' b' -> m (z' b')) -> Nest (l' b') -> n (z' b')  -- (3)

>From (3), you get an alternative view of the type of xs:

xs :: Nest (l' b')  -- (4)

You can now perform unification on these results. You need to be careful
unifying with Pair types, since (Pair a) is really ((,) a a).

Unifying (1) and (4), you get:

Nest ((,) (l b) (l b)) ~ Nest (l' b')
l' ~ (,) (l b)
b' ~ (l b)                 -- (5)

Unifying (2) and the result type from (3):

n ((,) (z b) (z b)) ~ n (z' b')
z' ~ (,) (z b)
b' ~ (z b)                 -- (6)

>From (5) and (6), you require: l b ~ z b, and therefore l ~ z, but the
"forall l z" in the type signature requires l and z to be independent.

I'm no expert in nested data types, but I doubt there is any additional
generality to be had from trying to distinguish l and z. In that case,
you can write both "l b" and "z b" as "a". You can also remove a
superfluous tuple from the type definition:

data Nest a = Nil | Cons a (Nest (Pair a))
type Pair a = (a,a)

efold :: (forall a. n a)
      -> (forall a. m a -> n (Pair a) -> n a)
      -> (forall a. Pair (m a) -> m (Pair a))
      -> (a -> m a)
      -> (Nest a -> n a)
efold e f g h Nil = e
efold e f g h (Cons x xs) = f (h x) (efold e f g (g . pair h) xs)

For example, to flatten a Nest to a simple list, you can take m to be Id
and n to be [], something like this:

nest = Cons 0 (Cons (1,2) (Cons ((3,4),(5,6)) Nil))

newtype Id a = Id a

-- [0,1,2,3,4,5,6]
flat_nest = efold []
  (\(Id xs) ys -> xs : concatMap (\(p,q) -> [p,q]) ys)
  (\(Id x, Id y) -> Id (x,y))
  Id nest

It's a little unpleasant having to use the Id type like this, but I
can't see a way around it. I'd be interested to see an example with a
non-trivial m.

Regards,
Matthew