ANN: unification-fd 0.8.0
wren ng thornton
wren at freegeek.org
Wed Jul 11 02:25:33 CEST 2012
--------------------------------------------
-- unification-fd 0.8.0
--------------------------------------------
The unification-fd package offers generic functions for single-sorted
first-order structural unification (think of programming in Prolog, or
of the metavariables in type inference)[1][2]. The library *is*
sufficient for implementing higher-rank type systems a la [Peyton Jones,
Vytiniotis, Weirich, Shields], but bear in mind that unification
variables are the metavariables of type inference--- not the
type-variables. As of this version, the library is also sufficient for
implementing (non-recursive) feature structure unification.
An effort has been made to make the package as portable as possible.
However, because it uses the ST monad and the mtl-2 package it can't be
H98 nor H2010. However, it only uses the following common extensions
which should be well supported[3]:
Rank2Types
MultiParamTypeClasses
FunctionalDependencies -- Alas, necessary for type inference
FlexibleContexts -- Necessary for practical use of MPTCs
FlexibleInstances -- Necessary for practical use of MPTCs
UndecidableInstances -- For Show instances due to two-level types
--------------------------------------------
-- Changes (since 0.7.0)
--------------------------------------------
This release is another API breaking release, though hopefully minor. In
particular, the type of the zipMatch method for Unifiable has changed from:
zipMatch :: t a -> t b -> Maybe (t (a, b))
to:
zipMatch :: t a -> t a -> Maybe (t (Either a (a, a)))
With the new type each of the unification subproblems can be declared as
fully resolved (Left xy) or as still pending solution (Right (x,y))---
whereas the previous type only allowed the latter. This extension is
necessary for implementing unification of feature structures (e.g., t ~
Map k). Given two feature structures which do not fully overlap, we can
now declare that the non-intersecting parts have been unified and only
require further unification for the parts which intersect--- whereas
previously we would've been forced to declare the non-intersecting parts
as requiring unification with themselves, which will always succeed and
introduces extra work.
This feature is one I had available in earlier unpublished versions of
the code, but errantly removed it when simplifying things for publishing
on Hackage. To fix extant Unifiable instances, just wrap each tuple with
Right. For those who care about such things, I have not been able to
discern any noticeable difference in performance between the two
versions. If you can provide benchmarks demonstrating otherwise, I'd be
pleased to look at them.
In addition, the Show instance for Fix has been adjusted to use
showsPrec in lieu of show, correcting some infelicities with the output.
--------------------------------------------
-- Description
--------------------------------------------
The unification API is generic in the type of the structures being
unified and in the implementation of unification variables, following
the two-level types pearl of Sheard (2001). This style mixes well with
Swierstra (2008), though an implementation of the latter is not included
in this package.
That is, all you have to do is define the functor whose fixed-point is
the recursive type you're interested in:
-- The non-recursive structure of terms
data S a = ...
-- The recursive term type
type Term = Fix S
And then provide an instance for Unifiable, where zipMatch performs one
level of equality testing for terms and returns the one-level spine
filled with pairs of subterms to be recursively checked, or Nothing if
this level doesn't match. Each subterm can be separately marked as being
resolved, Left xy, or as requiring further unification, Right(x,y).
class (Traversable t) => Unifiable t where
zipMatch :: t a -> t a -> Maybe (t (Either a (a, a)))
The choice of which variable implementation to use is defined by
similarly simple classes Variable and BindingMonad. We store the
variable bindings in a monad, for obvious reasons. In case it's not
obvious, see Dijkstra et al. (2008) for benchmarks demonstrating the
cost of naively applying bindings eagerly.
There are currently two implementations of variables provided: one based
on STRefs, and another based on a state monad carrying an IntMap. The
former has the benefit of O(1) access time, but the latter is plenty
fast and has the benefit of supporting backtracking. Backtracking itself
is provided by the logict package and is described in Kiselyov et al.
(2005).
In addition to this modularity, unification-fd implements a number of
optimizations over the algorithm presented in Sheard (2001)--- which is
also the algorithm presented in Cardelli (1987).
* Their implementation uses path compression, which we retain. Though we
modify the compression algorithm in order to make sharing observable.
* In addition, we perform aggressive opportunistic observable sharing, a
potentially novel method of introducing even more sharing than is
provided by the monadic bindings. Basically, we make it so that we can
use the observable sharing provided by the modified path compression as
much as possible (without introducing any new variables).
* And we remove the notoriously expensive occurs-check, replacing it
with visited-sets (which detect cyclic terms more lazily and without the
asymptotic overhead of the occurs-check). A variant of unification which
retains the occurs-check is also provided, in case you really need to
fail fast.
* Finally, a highly experimental branch of the API performs *weighted*
path compression, which is asymptotically optimal. Unfortunately, the
current implementation is quite a bit uglier than the unweighted
version, and I haven't had a chance to perform benchmarks to see how the
constant factors compare. Hence moving it to an experimental branch.
These optimizations pass a test suite for detecting obvious errors. If
you find any bugs, do be sure to let me know. Also, if you happen to
have a test suite or benchmark suite for unification on hand, I'd love
to get a copy.
--------------------------------------------
-- Notes and limitations
--------------------------------------------
[1] At present the library does not appear amenable for implementing
higher-rank unification itself; i.e., for higher-ranked metavariables,
or higher-ranked logic programming. To be fully general we'd have to
abstract over which structural positions are co/contravariant, whether
the unification variables should be predicative or impredicative, as
well as the isomorphisms of moving quantifiers around. It's on my todo
list, but it's certainly non-trivial. If you have any suggestions, feel
free to contact me.
[2] At present it is only suitable for single-sorted (aka untyped)
unification, a la Prolog. In the future I aim to support multi-sorted
(aka typed) unification, however doing so is complicated by the fact
that it can lead to the loss of MGUs; so it will likely be offered as an
alternative to the single-sorted variant, similar to how the weighted
path-compression is currently offered as an alternative.
[3] With the exception of fundeps which are notoriously difficult to
implement. However, they are supported by Hugs and GHC 6.6, so I don't
feel bad about requiring them. Once the API stabilizes a bit more I plan
to release a unification-tf package which uses type families instead,
for those who feel type families are easier to implement or use. There
have been a couple requests for unification-tf, so I've bumped it up on
my todo list.
--------------------------------------------
-- References
--------------------------------------------
Luca Cardelli (1987) /Basic polymorphic typechecking/.
Science of Computer Programming, 8(2):147--172.
Atze Dijkstra, Arie Middelkoop, S. Doaitse Swierstra (2008)
/Efficient Functional Unification and Substitution/,
Technical Report UU-CS-2008-027, Utrecht University.
<http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-027.pdf>
Simon Peyton Jones, Dimitrios Vytiniotis, Stephanie Weirich, Mark
Shields /Practical type inference for arbitrary-rank types/,
to appear in the Journal of Functional Programming.
(Draft of 31 July 2007.)
Oleg Kiselyov, Chung-chieh Shan, Daniel P. Friedman, and
Amr Sabry (2005) /Backtracking, Interleaving, and/
/Terminating Monad Transformers/, ICFP.
<http://www.cs.rutgers.edu/~ccshan/logicprog/LogicT-icfp2005.pdf>
Tim Sheard (2001) /Generic Unification via Two-Level Types/
/and Paramterized Modules/, Functional Pearl, ICFP.
<http://web.cecs.pdx.edu/~sheard/papers/generic.ps>
Tim Sheard & Emir Pasalic (2004) /Two-Level Types and/
/Parameterized Modules/. JFP 14(5): 547--587. This is
an expanded version of Sheard (2001) with new examples.
<http://web.cecs.pdx.edu/~sheard/papers/JfpPearl.ps>
Wouter Swierstra (2008) /Data types a la carte/, Functional
Pearl. JFP 18: 423--436.
<http://www.cs.ru.nl/~wouters/Publications/DataTypesALaCarte.pdf>
--------------------------------------------
-- Links
--------------------------------------------
Homepage:
http://code.haskell.org/~wren/
Hackage:
http://hackage.haskell.org/package/unification-fd
Darcs:
http://community.haskell.org/~wren/unification-fd
Haddock (Darcs version):
http://community.haskell.org/~wren/unification-fd/dist/doc/html/unification-fd
--
Live well,
~wren
More information about the Libraries
mailing list