[Haskell-cafe] Re: [Haskell] [Fwd: undecidable & overlapping instances: a bug?]

Martin Sulzmann sulzmann at comp.nus.edu.sg
Mon Oct 22 01:09:23 EDT 2007

Mark P Jones writes:
 > Hi All,
 > Here are my responses to the recent messages, starting with some
 > summary comments:
 > - I agree with Martin that the condition I posted a few days ago is
 >    equivalent to the "*refined* weak coverage condition" in your
 >    paper.  The "refined" tag here is important---I missed it the
 >    first time and thought he was referring to the WCC at the start of
 >    Section 6.1, and I would strongly recommend against using that
 >    (unrefined) version.  I confess that I haven't read the paper
 >    carefully enough to understand why you included that first WCC at
 >    all; I don't see any reason why you would want to use that
 >    particular condition in a practical implementation, and I don't
 >    see any advantages to it from a theoretical perspective either.
 >    (Maybe you just used it as a stepping stone to help motivate the
 >    refined WCC?)

It's a stepping stone but WCC on it's own is useful to recover
termination (see the "zip" example in the paper).

 > - I believe that the refined WCC (or the CC or even the original WCC
 >    if you really want) are sufficient to ensure soundness.  I don't
 >    agree that the (B) restriction to "full" FDs is necessary.
 > - I think it is dangerous to tie up soundness with questions about
 >    termination.  The former is a semantic property while the latter
 >    has to do with computability.  Or, from a different perspective,
 >    soundness is essential, while termination/decidability is "only"
 >    desirable.  I know that others have thought more carefully than
 >    I have about conditions on the form of class and instance
 >    declarations that will guarantee decidability and I don't have
 >    much to say on that topic in what follows.  However, I will
 >    suggest that one should start by ensuring soundness and then,
 >    *separately*, consider termination.  (Of course, it's perfectly
 >    ok---just not essential---if conditions used to ensure soundness
 >    can also be used to guarantee termination.)

Well, decidability and completeness matters if we care about automatic
type inference.

Given some input program we'd like that to have a *decidable* algorithm
which computes a type for every well-typed program (*soundness*) and
yields failure if the program is ill-typed (*completeness*).

In "Simplifying and Improving Qualified Types." you give indeed
a sound, decidable and complete type inference algorithm
*if* the proof theory among constraints in some theory of qualified
types is sound, decidable and complete.

My main concern is to establish sufficient conditions such that the 
proof theory is sound, decidable and complete. That is, there's a
decidable algorithm which says yes if QT |= C1 -> C2 and otherwise no
where QT |= C1 -> C2 means that in the qualified theory QT constraint
C1 entails constraint C2.

I think that our works are fairly complementary. Or are you saying
you've already given such an algorithm (which I indeed must have

I believe that you are offended by my statement that 

 > Confluence is vital to guarantee completeness of type inference.

Let me clarify. Confluence is one of the suffient conditions to
guarantee completeness of CHR-based type inference. Apologies for any
confusion caused by my previous statement.

There are clearly other approaches possible to obtain a sound,
decidable and complete proof theory (and therefore to obtain sound,
decidable and complete type inference). But I yet need to see concrete
results which match the CHR-based approach/results described in:

Understanding Functional Dependencies via Constraint Handling Rules
Martin Sulzmann, Gregory J. Duck, Simon Peyton-Jones and Peter
J. Stuckey
In Journal of Functional Programming, 2007


 > Stop reading now unless you're really interested in the details!
 > I'm going to start by explaining some key (but hopefully
 > unsurprising) ideas before I respond in detail to the example
 > that Martin posted.
 > Interpreting Class, Instance, and Dependency Decls:
 > ---------------------------------------------------
 > I'll start by observing that Haskell's class, instance, and
 > functional dependency declarations all map directly to formulas in
 > standard predicate logic.  The correspondence is straightforward, so
 > I'll just illustrate it with some examples:
 >    Declaration               Formula
 >    -----------               -------
 >    class Eq a => Ord a       forall a. Ord a => Eq a
 >    instance Eq a => Eq [a]   forall a. Eq a => Eq [a]
 >    class C a b | a -> b      forall a, b. C a b /\ C a c => b = c
 > This correspondence between declarations and formulas seems to be
 > very similar to what you did in the paper using CHRs.  I haven't
 > read the paper carefully enough to know what extra mileage and/or
 > problems you get by using CHRs.  To me, predicate logic seems more
 > natural, but I don't think that matters here---I just wanted to
 > acknowledge that essentially the same idea is in your paper.
 > In the following, assuming some given program P, I'll write I(P)
 > for the conjunction of all the instance declaration formulas in
 > P; C(P) for the conjunction of all the class declaration formulas;
 > and F(P) fo the conjunction of all the dependency declaration
 > formulas.
 > Semantics and Soundness:
 > ------------------------
 > We'll adopt a simple and standard semantics in which each
 > n-parameter type class is described by an n-place relation, that is,
 > a set of n-tuples of types.  Of course, in the most common, one
 > parameter case, this is equivalent to viewing the class as a set of
 > types.  Each tuple in this relation is referred to as an "instance"
 > of the class, and the predicate syntax C t1 ... tn is just a
 > notation for asserting that the tuple (t1, ..., tn) is an instance
 > of C.
 > We can assign a meaning to each of the classes in a given program P
 > by using the smallest model of the instance declaration formulas,
 > I(P).  (Existence of such a model is guaranteed; it is the union of
 > an increasing chain of approximations.)
 > What about the superclass properties in C(P)?  We expect to use
 > information from C(P) to simplify contexts during type inference.
 > For example, if we're using the standard Haskell prelude, then we
 > know that we can simplify (Eq a, Ord a) to (Ord a).  But how do we
 > know this is sound?  Haskell answers this by imposing restrictions
 > (third bullet in Section 4.3.2 of the report) on the use of
 > instance declarations to ensure that our (least) model of I(P) is
 > also a model of C(P).  This guarantees, for example, that every
 > type in the semantics of "Ord" will, as required, also be included
 > in the semantics of "Eq".
 > That should all seem pretty standard, but I've included it here
 > to make the point that we need to something very similar when we
 > introduce functional dependencies.  Specifically, we want to be
 > able to use properties from F(P) to simplify/improve contexts
 > during type inference, so we have to ensure that this is sound.
 > If we follow the strategy that was used to ensure soundness of
 > C(P) in Haskell, then we have to come up with a restriction on
 > instance declarations to ensure that our model of I(P) is also
 > a model of F(P).  I claim that any of the three conditions we've
 > talked about previously---using your terminology, the CC, WCC or
 > refined WCC---are sufficient to ensure this soundness property.
 > I haven't completed a formal proof, but I think it should be a
 > pretty straightforward exercise in predicate logic.
 > If a program fails to meet the criteria that ensure soundness of a
 > superclass declaration, the Haskell compiler should reject that
 > code.  Exactly the same thing should happen if the input program
 > doesn't satisfy the functional dependencies.  Further restrictions
 > are needed to ensure that there is a well-defined semantics (i.e.,
 > dictionary/evidence) for each type class instance; this is where
 > overlapping instances complicate the picture, for example.
 > However, for what we're doing here, it's sufficient to focus on
 > the semantics of classes (i.e., on which types are instances of
 > which classes) and to defer the semantics of overloaded operators
 > to a separate discussion.
 > Responding to Martin's Example:
 > -------------------------------
 >  > Here's an example (taken from the paper).
 >  >
 >  >   class F a b c | a->b
 >  >   instance F Int Bool Char
 >  >   instance F a b Bool => F [a] [b] Bool
 > This is an unfortunate example because, in fact, the second instance
 > declaration doesn't contribute anything to the class F!  Using the
 > semantics I outlined above, the class F is just {(Int, Bool, Char)},
 > with only the one instance that comes from the first instance
 > declaration.  This is unfortunate because, in the presence of
 > functional dependencies---or, in general, any form of
 > "improvement"---you have to take account of satisfiability.  In the
 > example here, any predicate of the form F a b Bool is
 > unsatisfiable, and so any time we attempt to transform an inferred
 > context containing a predicate of this form into some "simplified"
 > version, then there is a sense in which all we are doing is proving
 > that False is equivalent to False (or that anything follows from a
 > contradiction).
 > In what follows, let's refer to the program above as Q1.  However,
 > to explore a case where the predicates of interest are satisfiable,
 > let's also consider an extended version, Q2, that adds one extra
 > instance declaration:
 >      instance F Char Int Bool
 > Now the semantics of Q2 looks something like:
 >   { (Int, Bool, Char),
 >     (Char, Int, Bool),
 >     ([Char], [Int], Bool),
 >     ([[Char]], [[Int]], Bool),
 >     ([[[Char]]], [[[Int]]], Bool),
 >     ([[[[Char]]]], [[[[Int]]]], Bool), ... }
 >  > The FD is not full because the class parameter c is not involved in
 >  > the FD. We will show now that the CHR solver is not confluent.
 > Although it's just one example, it is worth pointing out that both
 > Q1 and Q2 lead to a semantics for F that is consistent/sound with
 > respect to the declared functional dependency.  So far as our
 > semantics is concerned, the fact that the dependency is not full
 > is not significant.
 > I don't dispute your claim that fullness is necessary to establish
 > confluence with CHRs.  But that is a consequence of the formal
 > proof/rewrite system that you're using, and not of the underlying
 > semantics.
 >  > Here is the translation to CHRs (see the paper for details)
 >  >
 >  >   rule F a b1 c, F a b2 d  ==> b1=b2      -- (FD)
 >  >   rule F Int Bool Char    <==> True       -- (Inst1)
 >  >   rule F Int a b           ==> a=Bool     -- (Imp1)
 >  >   rule F [a] [b] Bool     <==> F a b Bool -- (Inst2)
 >  >   rule F [a] c d           ==> c=[b]      -- (Imp2)
 > I'm a little surprised that you need Imp1 and Imp2 here.  I didn't
 > include analogs of these when I gave a predicate logic rendering
 > for dependency declarations above because they can be obtained as
 > derived properties.  Imp1, for example:
 >      Suppose F Int a b.
 >      From Inst1, we know that F Int Bool Char.
 >      Hence by FD, we know that a = Bool.
 > My best guess is that you included the Imp1 and Imp2 rules because,
 > although the "<==>" symbol suggests the possibility of bidirectional
 > rewriting, you actually want to view the above as oriented rewrite
 > rules?  From my perspective, however, the FD rule is the definition
 > of the functional dependency on F, and everything that you want to
 > know about this dependency can be derived from it ...
 >  > The above CHRs are not confluent. For example,
 >  > there are two possible CHR derivations for the initial
 >  > constraint store F [a] [b] Bool, F [a] b2 d
 >  >
 >  >     F [a] [b] Bool, F [a] b2 d
 >  > -->_FD (means apply the FD rule)
 >  >     F [a] [b] Bool, F [a] [b] d , b2=[b]
 >  > --> Inst2
 >  >     F a b Bool, F [a] [b] d , b_2=[b]         (*)
 >  >
 >  > Here's the second CHR derivation
 >  >
 >  >     F [a] [b] Bool, F [a] b2 d
 >  > -->_Inst2
 >  >     F a b Bool, F [a] b2 d
 >  > -->_Imp2
 >  >     F a b Bool, F [a] [c] d, b2=[c]           (**)
 >  >
 >  > (*) and (**) are final stores (ie no further CHR are applicable).
 >  > Unfortunately, they are not logically equivalent (one store says
 >  > b2=[b] whereas the other store says b2=[c]).
 > I think it's a mistake to expect these formulas to be logically
 > equivalent, at least in the strong syntactic sense that you seem
 > to be assuming here.  As I've mentioned previously, if we're going
 > to use improvement, then we have to take account of satisfiability.
 > In the context of Q1, this is an example in which each of the two
 > derivations above are essentially equivalent to False --> False.
 > But, in the general case, if we comparing (*) with (**), then the
 > main difference seems to be that the latter involves the use of
 > a "new" variable, c, in place of the variable b that appears in
 > the latter.  However, (*) includes F [a] [b] d and (**) includes
 > F [a] [c] d, and so we can deduce from the functional dependency
 > (i.e., your FD rule) that [b]=[c], and hence b=c.  If we're
 > allowed to inline that equality (i.e., use it to replace all
 > occurences of c with b), then (**) becomes identical to (*).
 >  > To conclude, fullness is a necessary condition to establish confluence
 >  > of the CHR solver.
 > I think that the standard notion of confluence is the wrong
 > property to be looking at here.  Or, to put it another way, I don't
 > think it is reasonable to expect confluence in the presence of
 > improvement.  That said, it might still be appropriate and useful
 > to formulate a notion of "confluence modulo satisfiability"
 > in which the equalities involved in the definition of confluence
 > are weakened to "equalities modulo satisfiability" ...
 >  > Confluence is vital to guarantee completeness of type inference.
 > I disagree.  I established the completeness of type inference in
 > the presence of improvement (without requiring any kind of
 > confluence-like property) as part of my work in 1994 on
 > "Simplifying and Improving Qualified Types."  (See Section 4 of
 > the technical report version, for example, which includes
 > details---like the proofs---that I had to omit from the later
 > FPCA '95 version for lack of space.)  Functional dependencies
 > are a special case of improvement, so I'd dispute the claim in
 > your paper that you are the first to formalize and prove
 > soundness, completeness and decidability.  (However, I'm
 > sympathetic to the fact that you missed this, because my results
 > on improvement were published six years before my formulation of
 > functional dependencies; if you were only looking at later
 > publications, you wouldn't have found anything.)
 > It's worth pointing out that, to achieve my completeness result,
 > I had to do exactly what I am suggesting that you should do now:
 > modify your formalism to account for satisfiability.  In my work,
 > for example, that meant introducing a "satisfiability ordering"
 > on type schemes and replacing the conventional notion of principal
 > types with a concept of "principal satisfiable types".
 > Let me take this opportunity to reflect on my understanding of
 > your main contributions and the relationship to my work.  In my
 > general, formal system, simplification and improvement rules can
 > be used at any point during type inference, but there are no
 > language features---such as explicit type signatures or instance
 > declarations---that *require* the use of such rules.  As a result,
 > decidability, in the general case, is easy to establish, but it
 > is arguably not very useful.
 > In my paper, I suggested that, in practice, for any specific
 > application of improvement, a language designer would likely want
 > to adopt a more algorithmic approach that invokes suitably defined
 > "improvement" and "simplification" functions at specific points
 > during type inference (coupled, for example, with
 > generalization).  With this approach, soundness, completeness, and
 > decidability of type inference become dependent on the soundness
 > and termination properties of the improvement and simplification
 > functions.  This, I think, is the most significant technical issue
 > that you are addressing in your paper.  In concrete terms, I think
 > this comes down to a question of whether the entailment relation
 > C1 ||- C2 between contexts C1 and C2 is decidable.  I hadn't heard
 > of the "Paterson Conditions" before I saw the reference on the
 > Trac page (sorry Ross!), but it sounds like they are sufficient to
 > guarantee good theoretical properties but also clear and flexible
 > enough to be useful in practice.
 >  > I don't think that fullness is an onerous condition.
 > Well that's probably true, but requiring fullness means that
 > there's one extra concept that programmers have to grapple with,
 > and one extra hurdle that they have to overcome in those
 > situations where a non-full FD seems to provide a more natural
 > solution.
 > There are at least a couple of ways to avoid the non-full FD in
 > your example program.  For example, we could replace every use
 > of (F a b c) with a pair of constraints (F1 a b, F2 a c) where
 > the classes F1 and F2 are as follows:
 >    class F1 a b | a -> b
 >    class F2 a c
 > This is an example of what's called normalization in relational
 > databases.  Because the type class relations that we use in
 > practice are fairly simple (just a few columns) and because
 > normalized relations have some structural benefits, I'd suspect
 > that the use of normalized type class relations, like the F1+F2
 > pair above, is usually a hallmark of better design than the
 > non-normalized version (like the original version with just F).
 > Nevertheless, unless there are strong technical reasons to require
 > one version over the other, I'd prefer to leave choices like this
 > open so that individual programmers can make the most appropriate
 > call for a given application.
 > ....
 > I'm sorry I haven't given you this kind of feedback previously.
 > Simon had invited me to take a look at the paper on at least one
 > previous occassion, but it had not found its way to the front of my
 > reading queue.  Even now, I've only skimmed the paper (e.g., I
 > started with the Trac web page, figured out that I needed to know
 > what a "full" FD might be, and then used a text search of the pdf
 > to locate the definition.)  But I can see that there is a lot more
 > in the paper beyond the aspects we're discussing here, particularly
 > with respect to criteria for establishing termination, so I
 > apologize again if I've overlooked other relevant sections.
 > I hope my comments are still useful!
 > All the best,
 > Mark

More information about the Haskell-Cafe mailing list