ANN: unification-fd 0.7.0

Mon Mar 19 08:02:48 CET 2012

--------------------------------------------
-- unification-fd 0.7.0
--------------------------------------------

The unification-fd package offers generic functions for single-sorted 
first-order structural unification (think of programming in Prolog, or 
of the metavariables in type inference)[1][2]. The library *is* 
sufficient for implementing higher-rank type systems a la [Peyton Jones, 
Vytiniotis, Weirich, Shields], but bear in mind that unification 
variables are the metavariables of type inference--- not the type-variables.

An effort has been made to make the package as portable as possible. 
However, because it uses the ST monad and the mtl-2 package it can't be 
H98 nor H2010. However, it only uses the following common extensions 
which should be well supported[3]:

     Rank2Types
     MultiParamTypeClasses
     FunctionalDependencies -- Alas, necessary for type inference
     FlexibleContexts       -- Necessary for practical use of MPTCs
     FlexibleInstances      -- Necessary for practical use of MPTCs
     UndecidableInstances   -- For Show instances due to two-level types

--------------------------------------------
-- Changes (since 0.6.0)
--------------------------------------------

This release is another major API breaking release. Apologies, but 
things are a lot cleaner now and hopefully the API won't break again for 
a while. The biggest change is that the definition of terms has changed 
from the previous:

     data MutTerm v t
         = MutVar  !v
         | MutTerm !(t (MutTerm v t))

To the much nicer:

     data UTerm t v
         = UVar  !v
         | UTerm !(t (UTerm t v))

The old mnemonic of "mutable terms" was inherited from the code's 
previous life implementing a logic programming language; but when I was 
playing around with implementing a type checker I realized that the 
names don't really make sense outside of that original context. So the 
new mnemonic is "unification terms". In addition to being a bit shorter, 
it should help clarify the separation of concerns (e.g., between 
unification variables vs lambda-term variables, type variables, etc.).

The swapping of the type parameters is so that UTerm can have instances 
for Functor, Monad, etc. This change should've been made along with the 
re-kinding of variable types back in version 0.6.0, since the UTerm type 
is the free monad generated by t. I've provided all the category 
theoretic instances I could imagine some plausible reason for wanting. 
Since it's free, there are a bunch more I haven't implemented since they 
don't really make sense for structural terms (e.g., MonadTrans, 
MonadWriter, MonadReader, MonadState, MonadError, MonadCont). If you can 
come up with some compelling reason to want those instances, I can add 
them in the future.

Since the order of type parameters to BindingMonad, UnificationFailure, 
Rank, and RankedBindingMonad was based on analogy to the order for 
terms, I've also swapped the order in all of them for consistency.

I've removed the eqVar method of the Variable class, and instead added 
an Eq superclass constraint. Again, this should've happened with the 
re-kinding of variables back in version 0.6.0. A major benefit of this 
change is that now you can use all those library functions which require 
Eq (e.g., many of the set-theoretic operations on lists, like (\\) and 
elem).

I've added new functions: getFreeVarsAll, applyBindingsAll, freshenAll; 
which are like the versions without "All", except they're lifted to 
operate over Foldable/Traversable collections of terms. This is crucial 
for freshenAll because it allows you to retain sharing of variables 
among the collection of terms. Whereas it's merely an optimization for 
the others (saves time for getFreeVarsAll, saves space for 
applyBindingsAll).

The type of the seenAs function has also changed, to ensure that 
variables can only be seen as structure rather than as any UTerm.

Thanks to Roman Cheplyaka for suggesting many of these changes.

--------------------------------------------
-- Description
--------------------------------------------

The unification API is generic in the type of the structures being 
unified and in the implementation of unification variables, following 
the two-level types pearl of Sheard (2001). This style mixes well with 
Swierstra (2008), though an implementation of the latter is not included 
in this package.

That is, all you have to do is define the functor whose fixed-point is 
the recursive type you're interested in:

     -- The non-recursive structure of terms
     data S a = ...

     -- The recursive term type
     type PureTerm = Fix S

And then provide an instance for Unifiable, where zipMatch performs one 
level of equality testing for terms and returns the one-level spine 
filled with pairs of subterms to be recursively checked (or Nothing if 
this level doesn't match).

     class (Traversable t) => Unifiable t where
         zipMatch :: t a -> t b -> Maybe (t (a,b))

The choice of which variable implementation to use is defined by 
similarly simple classes Variable and BindingMonad. We store the 
variable bindings in a monad, for obvious reasons. In case it's not 
obvious, see Dijkstra et al. (2008) for benchmarks demonstrating the 
cost of naively applying bindings eagerly.

There are currently two implementations of variables provided: one based 
on STRefs, and another based on a state monad carrying an IntMap. The 
former has the benefit of O(1) access time, but the latter is plenty 
fast and has the benefit of supporting backtracking. Backtracking itself 
is provided by the logict package and is described in Kiselyov et al. 
(2005).

In addition to this modularity, unification-fd implements a number of 
optimizations over the algorithm presented in Sheard (2001)--- which is 
also the algorithm presented in Cardelli (1987).

* Their implementation uses path compression, which we retain. Though we 
modify the compression algorithm in order to make sharing observable.

* In addition, we perform aggressive opportunistic observable sharing, a 
potentially novel method of introducing even more sharing than is 
provided by the monadic bindings. Basically, we make it so that we can 
use the observable sharing provided by the modified path compression as 
much as possible (without introducing any new variables).

* And we remove the notoriously expensive occurs-check, replacing it 
with visited-sets (which detect cyclic terms more lazily and without the 
asymptotic overhead of the occurs-check). A variant of unification which 
retains the occurs-check is also provided, in case you really need to 
fail fast.

* Finally, a highly experimental branch of the API performs *weighted* 
path compression, which is asymptotically optimal. Unfortunately, the 
current implementation is quite a bit uglier than the unweighted 
version, and I haven't had a chance to perform benchmarks to see how the 
constant factors compare. Hence moving it to an experimental branch.

These optimizations pass a test suite for detecting obvious errors. If 
you find any bugs, do be sure to let me know. Also, if you happen to 
have a test suite or benchmark suite for unification on hand, I'd love 
to get a copy.

--------------------------------------------
-- Notes and limitations
--------------------------------------------

[1] At present the library does not appear amenable for implementing 
higher-rank unification itself; i.e., for higher-ranked metavariables, 
or higher-ranked logic programming. To be fully general we'd have to 
abstract over which structural positions are co/contravariant, whether 
the unification variables should be predicative or impredicative, as 
well as the isomorphisms of moving quantifiers around. It's on my todo 
list, but it's certainly non-trivial. If you have any suggestions, feel 
free to contact me.

[2] At present it is only suitable for single-sorted (aka untyped) 
unification, a la Prolog. In the future I aim to support multi-sorted 
(aka typed) unification, however doing so is complicated by the fact 
that it can lead to the loss of MGUs; so it will likely be offered as an 
alternative to the single-sorted variant, similar to how the weighted 
path-compression is currently offered as an alternative.

[3] With the exception of fundeps which are notoriously difficult to 
implement. However, they are supported by Hugs and GHC 6.6, so I don't 
feel bad about requiring them. Once the API stabilizes a bit more I plan 
to release a unification-tf package which uses type families instead, 
for those who feel type families are easier to implement or use. There 
have been a couple requests for unification-tf, so I've bumped it up on 
my todo list.

--------------------------------------------
-- References
--------------------------------------------

Luca Cardelli (1987) /Basic polymorphic typechecking/.
     Science of Computer Programming, 8(2):147--172.

Atze Dijkstra, Arie Middelkoop, S. Doaitse Swierstra (2008)
     /Efficient Functional Unification and Substitution/,
     Technical Report UU-CS-2008-027, Utrecht University.
     <http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-027.pdf>

Simon Peyton Jones, Dimitrios Vytiniotis, Stephanie Weirich, Mark
     Shields /Practical type inference for arbitrary-rank types/,
     to appear in the Journal of Functional Programming.
     (Draft of 31 July 2007.)

Oleg Kiselyov, Chung-chieh Shan, Daniel P. Friedman, and
     Amr Sabry (2005) /Backtracking, Interleaving, and/
     /Terminating Monad Transformers/, ICFP.
     <http://www.cs.rutgers.edu/~ccshan/logicprog/LogicT-icfp2005.pdf>

Tim Sheard (2001) /Generic Unification via Two-Level Types/
     /and Paramterized Modules/, Functional Pearl, ICFP.
     <http://web.cecs.pdx.edu/~sheard/papers/generic.ps>

Tim Sheard & Emir Pasalic (2004) /Two-Level Types and/
     /Parameterized Modules/. JFP 14(5): 547--587. This is
     an expanded version of Sheard (2001) with new examples.
     <http://web.cecs.pdx.edu/~sheard/papers/JfpPearl.ps>

Wouter Swierstra (2008) /Data types a la carte/, Functional
     Pearl. JFP 18: 423--436.
     <http://www.cs.ru.nl/~wouters/Publications/DataTypesALaCarte.pdf>

--------------------------------------------
-- Links
--------------------------------------------

Homepage:
     http://code.haskell.org/~wren/

Hackage:
     http://hackage.haskell.org/package/unification-fd

Darcs:
     http://community.haskell.org/~wren/unification-fd

Haddock (Darcs version):

http://community.haskell.org/~wren/unification-fd/dist/doc/html/unification-fd

-- 
Live well,
~wren