[Haskell-cafe] ANN: unification-fd: simple generic unification

Tue Jul 12 12:26:36 CEST 2011

--------------------------------------------
-- unification-fd 0.5.0
--------------------------------------------

The unification-fd package offers generic functions for first-order
structural unification (think Prolog programming or Hindley--Milner type
inference). I've had this laying around for a few years, so I figured I
might as well publish it.

An effort has been made to try to make this package as portable as
possible. However, because it uses the ST monad and the mtl-2 package it
can't be H98 nor H2010. However, it only uses the following common
extensions which should be well supported[1]:

    Rank2Types
    MultiParamTypeClasses
    FunctionalDependencies
    FlexibleContexts
    FlexibleInstances
    UndecidableInstances

[1] With the exception of fundeps which are notoriously difficult to
implement. However, they are supported by Hugs and GHC 6.6, so I don't
feel bad about requiring it. Once the API stabilizes a bit more I plan to
release a unification-tf package which uses type families instead, for
those who feel type families are easier to implement or use.

--------------------------------------------
-- Description
--------------------------------------------

The unification API is generic in the type of the structures being unified
and in the implementation of unification variables, following the
two-level types pearl of Sheard (2001). This style mixes well with
Swierstra (2008), though an implementation of the latter is not included
in this package.

That is, all you have to do is define the functor whose fixed-point is the
recursive type you're interested in:

    -- The non-recursive structure of terms
    data S a = ...

    -- The recursive term type
    type PureTerm = Fix S

And then provide an instance for Unifiable, where zipMatch performs one
level of equality testing for terms and returns the one-level spine filled
with pairs of subterms to be recursively checked (or Nothing if this level
doesn't match).

    class (Traversable t) => Unifiable t where
        zipMatch :: t a -> t b -> Maybe (t (a,b))

The choice of which variable implementation to use is defined by similarly
simple classes Variable and BindingMonad. We store the variable bindings
in a monad, for obvious reasons. In case it's not obvious, see Dijkstra et
al. (2008) for benchmarks demonstrating the cost of naively applying
bindings eagerly.

There are currently two implementations of variables provided: one based
on STRefs, and another based on a state monad carrying an IntMap. The
former has the benefit of O(1) access time, but the latter is plenty fast
and has the benefit of supporting backtracking. Backtracking itself is
provided by the logict package and is described in Kiselyov et al. (2005).

In addition to this modularity, unification-fd implements a number of
optimizations over the algorithm presented in Sheard (2001)--- which is
also the algorithm presented in Cardelli (1987).

* Their implementation uses path compression, which we retain. Though we
modify the compression algorithm in order to make sharing observable.

* In addition, we perform aggressive opportunistic observable sharing, a
potentially novel method of introducing even more sharing than is provided
by the monadic bindings. Basically, we make it so that we can use the
observable sharing provided by the previous optimization as much as
possible (without introducing any new variables).

* And we remove the notoriously expensive occurs-check, replacing it with
visited-sets (which detect cyclic terms more lazily and without the
asymptotic overhead of the occurs-check). A variant of unification which
retains the occurs-check is also provided, in case you really need to fail
fast for some reason.

* Finally, a highly experimental branch of the API performs *weighted*
path compression, which is asymptotically optimal. Unfortunately, the
current implementation is quite a bit uglier than the unweighted version,
and I haven't had a chance to perform benchmarks to see how the constant
factors compare. Hence moving it to an experimental branch.

I haven't had a chance to fully debug these optimizations, though they
pass some of the obvious tests. If you find any bugs, do be sure to let me
know. Also, if you happen to have a test suite or benchmark suite for
unification on hand, I'd love to get a copy.

--------------------------------------------
-- References
--------------------------------------------

Luca Cardelli (1987) /Basic polymorphic typechecking/.
    Science of Computer Programming, 8(2):147--172.

Atze Dijkstra, Arie Middelkoop, S. Doaitse Swierstra (2008)
    /Efficient Functional Unification and Substitution/,
    Technical Report UU-CS-2008-027, Utrecht University.
    <http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-027.pdf>

Oleg Kiselyov, Chung-chieh Shan, Daniel P. Friedman, and
    Amr Sabry (2005) /Backtracking, Interleaving, and/
    /Terminating Monad Transformers/, ICFP.
    <http://www.cs.rutgers.edu/~ccshan/logicprog/LogicT-icfp2005.pdf>

Tim Sheard (2001) /Generic Unification via Two-Level Types/
    /and Paramterized Modules/, Functional Pearl, ICFP.
    <http://web.cecs.pdx.edu/~sheard/papers/generic.ps>

Tim Sheard & Emir Pasalic (2004) /Two-Level Types and/
    /Parameterized Modules/. JFP 14(5): 547--587. This is
    an expanded version of Sheard (2001) with new examples.
    <http://web.cecs.pdx.edu/~sheard/papers/JfpPearl.ps>

Wouter Swierstra (2008) /Data types a la carte/, Functional
    Pearl. JFP 18: 423--436.
    <http://www.cs.ru.nl/~wouters/Publications/DataTypesALaCarte.pdf>

--------------------------------------------
-- Links
--------------------------------------------

Homepage:
    http://code.haskell.org/~wren/

Hackage:
    http://hackage.haskell.org/package/unification-fd

Darcs:
    http://community.haskell.org/~wren/unification-fd

Haddock (Darcs version):
    http://community.haskell.org/~wren/unification-fd/dist/doc/html/unification-fd

-- 
Live well,
~wren