TypeFamilies vs. FunctionalDependencies & type-level recursion

dm-list-haskell-prime at scs.stanford.edu
Sat Jun 25 19:08:02 CEST 2011


At Thu, 23 Jun 2011 00:40:38 -0700 (PDT),
oleg at okmij.org wrote:
> 
> 
> > How would you make this safe for dynamic loading?  Would you have to
> > keep track of all the package/module names that have been linked into
> > the program and/or loaded, and not allow duplicates?  Is dynamic
> > unloading of code ever possible?  Or at least is re-loading possible,
> > since that appears to be key for debugging web servers and such?
> 
> I must read up on Andreas Rossberg's work and check AliceML (which is,
> alas, no longer developed) to answer the questions on safe dynamic
> loading. AliceML is probably the most ambitious system to support
> type-safe loading. See, for example
> 
> http://www.mpi-sws.org/~rossberg/papers/Rossberg%20-%20The%20Missing%20Link.pdf
> 
> and other papers on his web page. He now works at Google, btw.

I wish I knew ML better, but from looking at that paper, I can't
figure out the key piece of information, which is how to detect
overlapping type instance declarations.  (Does ML even have something
equivalent?)

In the absence of type families, I'm okay using something like
haskell-plugins that returns Dynamic.  Safe Haskell ought then to be
able to guarantee that TypeReps are all unique, making cast safe in
such a situation.  (Even if two different pieces of code somehow end
up with conflicting functional dependencies, that may muck with the
expected behavior of the program, as I get a different method from the
one I was expecting, but it won't undermine type safety.)  If there
were a way to use Baars & Swierstra instead of Dynamic, that would be
even better.
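
To make that concrete, here is a minimal sketch of the Dynamic route
(this is not the actual haskell-plugins API, just the safe-cast step):
whatever the plugin exports comes back as a Dynamic, and fromDynamic
succeeds only when the TypeReps match:

  import Data.Dynamic (Dynamic, fromDynamic, toDyn)

  -- stands in for a value handed back by the plugin loader
  pluginExport :: Dynamic
  pluginExport = toDyn ((+ 1) :: Int -> Int)

  main :: IO ()
  main = case fromDynamic pluginExport :: Maybe (Int -> Int) of
           Just f  -> print (f 41)  -- TypeReps matched; the cast is safe
           Nothing -> putStrLn "plugin exported the wrong type"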

With type families, I'm at a loss to figure out how this could even
work.  You would need to have a runtime representation of every type
family that's ever been instantiated, and you would need to check
this against all the type families in the dynamic module you are
loading.  That seems very complicated.
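
To make the hazard concrete, the check amounts to detecting conflicts
like the following (a sketch of mine, not anything GHC prints).  GHC
rejects this pair whenever it sees both instances in one program; a
dynamic loader would have to reproduce that check at load time:

  type family F a

  -- compiled into the main program:
  type instance F Int = Bool

  -- compiled separately into a plugin and loaded later:
  type instance F Int = Char

  -- If both instances were ever live at once, we would have the
  -- coercions F Int ~ Bool and F Int ~ Char, hence Bool ~ Char,
  -- and type safety would be gone.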

> MetaOCaml [handles unloading] by not doing anything about it,
> letting the garbage code accumulate. It is hard to know when all the
> code pointers to a DLL are gone. GCs do not usually track code
> pointers. One has to modify the GC, which is quite an undertaking.
> I suppose one may hack around this by registering each (potential)
> entry point as a ForeignPointer, and doing reference counting in a
> finalizer. Since a DLL may produce lots of closures, each code
> pointer in those closures should be counted as an entry point.

It would be very cool to GC the code, but that's more than I'm hoping
for.

> > Finally, how do you express constraints in contexts using type
> > families?  For example, say I wanted to implement folds over tuples.
> > With fundeps (and undecidable instances, etc.), I would use a proxy
> > type of class Function2 to represent the function:
> 
> Your code can be re-written to use type functions almost mechanically
> (the method definitions stay the same -- modulo uncurrying, which I used
> for generality). I admit there is a bit of repetition (repeating the
> instance head). Perhaps a better syntax could be found.

This is great.  And in fact I was almost sold on your idea, until I
realized that GHC does not accept the following program:

  {-# LANGUAGE TypeFamilies #-}

  x :: ()
  x = const () (eq a b)
      where
        a :: S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(Z)))))))))))))))))))))
        a = undefined
        b :: S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(Z)))))))))))))))))))))
        b = undefined
        eq :: a -> b -> NatEq a b
        eq _ _ = undefined
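
(For anyone trying to reproduce this: the example depends on roughly
the following fragment of your TTypeable machinery.  This is my
reconstruction, but any structurally recursive NatEq behaves the same
way.)

  data Z          -- type-level zero
  data S n        -- type-level successor
  data HTrue
  data HFalse

  type family NatEq a b
  type instance NatEq Z     Z     = HTrue
  type instance NatEq Z     (S b) = HFalse
  type instance NatEq (S a) Z     = HFalse
  type instance NatEq (S a) (S b) = NatEq a b

Comparing two 21-deep naturals takes 22 NatEq reduction steps, which
is just over the default context stack of 21.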

Notice that there are no undecidable instances required to compile
this function (or the portions of your TTypeable that this function
depends on).  Yet GHC is still limiting my context stack (to 21 by
default).  Obviously any kind of Unicode encoding of type names is
going to run into the same sort of problem.

So we seem to have come full circle:  Undecidable instances are
problematic because program compilation depends on this arbitrary
context-stack depth parameter.  One way to avoid UndecidableInstances
is to assign a unique type-level Nat to each type.  But now the
comparison of types depends on this same context stack, likely making
the problem worse.

A few thoughts:

First, obviously if we went base 2 instead of unary for the encoding,
types would get a lot shorter.  If we took a SHA-256 hash of the type
name and encoded it, we could then pick some default context depth
(e.g., 21 + 256) that would make this work most of the time.
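
A sketch of what base 2 buys us (the names E, O, and I below are mine,
and HTrue/HFalse are as above): with a type-level bit string, equality
recurses once per bit, so the encodings of two 256-bit hashes compare
in a few hundred reduction steps rather than one step per unary
successor:

  data E      -- end of the bit string
  data O n    -- a 0 bit
  data I n    -- a 1 bit

  -- 21 is 10101 in binary, least significant bit outermost:
  type TwentyOne = I (O (I (O (I E))))

  type family BitsEq a b
  type instance BitsEq E     E     = HTrue
  type instance BitsEq E     (O b) = HFalse
  type instance BitsEq E     (I b) = HFalse
  type instance BitsEq (O a) E     = HFalse
  type instance BitsEq (O a) (O b) = BitsEq a b
  type instance BitsEq (O a) (I b) = HFalse
  type instance BitsEq (I a) E     = HFalse
  type instance BitsEq (I a) (O b) = HFalse
  type instance BitsEq (I a) (I b) = BitsEq a b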

Second, I believe it is kind of unfair of GHC to reject the above
program.  Without UndecidableInstances, GHC ought to be able to figure
out that the type checker will terminate.  Moreover, in the real
world, GHC only has a finite amount of time and memory, so there is no
difference between divergence and really big types.  I can already
crash GHC by feeding it a doubly-exponential-sized type such as:

  a = let f0 x = (x, x)    -- doubles the size of x's type
          f1 x = f0 (f0 x)  -- 4x
          f2 x = f1 (f1 x)  -- 16x
          f3 x = f2 (f2 x)  -- 256x
          f4 x = f3 (f3 x)  -- 65536x
          f5 x = f4 (f4 x)  -- 2^32 x
      in f5 ()              -- a tuple type with ~4 billion ()s

So, given that decidable inputs already exist on which GHC wedges my
machine for a while and then crashes, what is the point of
artificially limiting compilation of programs whose inputs aren't
known to be decidable, when often they are in fact decidable?

Finally, I still think most of the "magic" in everything we've been
talking about boils down to being able to have a type variable that
can take on any type *except* a certain handful.  This would allow us
to avoid overlapping instances of both classes and type families,
would allow us to define TypeEq, etc.  Moreover, it would be a more
direct expression of what programmers intend, and so make code easier
to read.  Such a constraint cannot be part of the context, because
instance selection would need to depend on it, so some new syntax
would be necessary.
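
For concreteness, the TypeEq I have in mind is the HList-style one,
which today forces overlapping (and, to satisfy the fundep coverage
condition, undecidable) instances precisely because there is no way to
say "any b other than a".  One standard encoding (with HTrue and
HFalse as above; the classic HList version routes the second case
through TypeCast rather than an equality constraint):

  class TypeEq a b r | a b -> r
  instance               TypeEq a a HTrue  -- chosen when the types match
  instance r ~ HFalse => TypeEq a b r      -- overlapped catch-all

With a hypothetical inequality constraint on the second instance, the
two instances would no longer overlap, and the programmer's intent
would be visible in the code.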

David


