Union Types for Haskell!?

Fergus Henderson fjh@cs.mu.oz.au
Fri, 24 Nov 2000 22:14:13 +1100


On 24-Nov-2000, Bernd Holzmüller <holzmueller@ics-ag.de> wrote:
> There is one thing I was really missing in all these projects: the
> existence of union types (i.e. the union of value sets of two existing
> data types)

Mercury has a similar type system to Haskell.
This question came up a lot during the early days of Mercury,
since many Prolog programmers were used to a programming style
that made significant use of undiscriminated union types.

What operations would you allow on union types?

There's a number of possible choices.
Here's some of them:
	(1) The only operation allowed on a union type
	    is calling a type class method of a type class
	    for which all the types in the union are instances.
	(2) Allow pattern matching on values of particular types,
	    using `case'.
	(3) Allow both (1) and (2)

If we allow (2), then some programming mistakes that previously
were type errors would instead become just inexhaustive pattern
matches for which you get a run-time error.

For example:

	foo :: Maybe a -> b -> Maybe b
	foo x y = case list of
			[] -> Nothing
			(x:xs) -> Just x
		where list = case x of
			Nothing -> 0	-- oops, meant [] instead of 0
			Just foo -> [y]
				
If union types were allowed, with (2) above, this example would be
legal, with `list' having the union type { [a], Integer }.
But `foo Nothing 42' would evaluate to bottom, since the `case list'
expression has no case with a pattern of type Integer.

So allowing union types in this way would reduce static type safety,
at least unless we also forbid inexhaustive pattern matches.

If instead you only allow (1), then probably union types would
not be suitable for expressing the kinds of things that you want
to express with them.

> Is there any reason for this restriction
> in the Haskell type system? Does this lead to losing the principal type
> property?

If you allow (2) above, there may be serious problems for
principal types.  For example, consider

	f x = case x of
		Nothing -> False
		Just _ -> True

What's the most general type for `f'?
The type `f :: Maybe a -> Bool' is less general than
e.g. `f :: Union { Maybe a, ... } -> Bool',
but you certainly don't want to infer the latter type.

So (2) is definitely out.

I guess we could use a separate `typecase' rather than `case', and
require that `typecase' always be exhaustive.  That might solve
some of the problems.

But even then, there may be some tricky problems remaining.  Union
types introduce subtyping, and type inference with subtypes and
polymorphism is tricky -- in fact undecidable, if I recall correctly.
However, there have been some pragmatic attempts to solve this problem
in practice, even though the general case may be undecidable.  I think
there was a paper at the Industrial Applications of Prolog conference
in 1996 on a system called PAN (for "Prolog Analyzer") that had a type
checker that supported undiscriminated unions and/or subtypes.  There
were some cases that it didn't handle, but the authors claimed that
this wasn't a problem since those cases didn't arise in practice.

> I find the existence of union types very attractive. Apart from enhanced
> flexibility in modelling, type error messages would possibly be more
> traceable, because different branches in if- or case-expressions would
> have the *same* relevance, rather than the first branch being
> type-checked becoming normative for all other branches.

A type checker could easily generate better error messages in that sense
simply by checking each case seperately first, and then merging the results,
complaining if the types inferred in any two branches were not unifiable.

But if you allow union types, the type checker couldn't report an error
at the point where the two branches have types that are not unifiable;
instead, it would have to infer a union type there, and the point
where the types become inconsistent is only when the union type is
later used in a context that requires one particular type.  Reporting
the error at that point of use is likely to make it harder to find the
problem, since it is further away from the place where the error occurred. 
So I think allowing union types would most likely lead to *worse*
error messages.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.