[Haskell-cafe] Re: Proposal: Sum type branches as extended types
(as Type!Constructor)
wren ng thornton
wren at freegeek.org
Fri Jun 4 21:13:05 EDT 2010
Gabriel Riba wrote:
> New proposal draft:
>
> Proposal: Type supplement for constructor specific uses of sum types
>
> Purpose: Avoid error clauses (runtime errors), exception control or Maybe
> types in partially defined (constructor specific) functions on sum types.
>
> As an example, with
>
> data List a = Nil | Cons a (List a)
>
> * Actual system, with runtime errors (as in GHC Data.List head) or
> exception throwing or optional Maybe results.
>
> hd :: List a -> a
> hd (Cons x _) -> x
> hd Nil -> error "error: hd: empty list" -- error or exception throwing
>
> * Proposed system extending types with a suffix @ Constructor or @
> {Constructor1, Constructor2, ..}
>
> hd :: List @ Cons a -> a
> hd (Cons x _) = x
>
> The caller must do pattern matching before applying the constructor-specific
> function.
Since the goal is to propagate case analysis information, the syntax
should reflect that. That is, there should be support for nested
patterns, e.g.,
cdar :: [a]@{_:_:_} -> a
cdar (_:x:_) = x
cadr :: [[a]]@{(_:_):_} -> a
cadr ((_:xs):_) = xs
headFromJust :: [Maybe a]@{Just _ : _} -> a
...
The type at cons syntax is a nice shorthand, but we need to express the
arguments to the data constructors in the extended syntax in order to
support nested patterns.
For delimiting multiple alternatives, it's not clear that comma is the
best delimiter to use, especially since it could be the data constructor
for tuples. Perhaps using ; or | would be better. Unless there's a
syntactic reason for preferring braces over parentheses, perhaps we
should just use parentheses for symmetry with as-patterns.
Finally, there should also be support for negative patterns, i.e.,
propagation of the failure to match a pattern. One place this is useful
is for distinguishing 0 from other numbers, which allows removing the
error branches from functions like division. Sometimes we can't
enumerate all the positive patterns we want to allow, but it's easy to
express what should be disallowed.
To match case analysis we should allow for a conjunction of negative
patterns followed by a positive pattern. Or, if we want to incorporate
multiple positive patterns, then a conjunction of negative patterns
followed by a disjunction of positive patterns. (Disjunctive case
matching has been an independent proposal in the past, and there's
nothing prohibiting supporting it.)
Thus, if we use | to delimit disjunctions, & to delimit conjunctions,
and \\ to separate the disjuncts from the conjuncts, given the following
case analysis:
case x : T of
p1 y1... -> e1
p2 y2... -> e2
_ -> eF
The variable x has type T outside of the case expression. Within the
branch e1 it is given the refinement type T@{p1 _...} where variables
bound by the pattern are replaced with wildcards. In branch e2 it is
given the refinement type T@{p2 _... \\ p1 _...}. This can be simplified
to T@{p2 _...} if the head constructors of p2 and p1 are distinct. And
in the eF branch x would be given the refinement type T@{_ \\ p1 _... &
p2_...}.
If this semantics is too hard to implement, we could instead require the
use of as-patterns for introducing the refinements. The variable
introduced by @ in the as-pattern would be given the refinement type,
but the scrutinee would continue to have the unrefined type. This latter
semantics is common in dependently typed languages, but it's verbose and
ugly so it'd be nice to avoid it if we can.
Other notes:
Case matching on non-variable expressions would gain no extra support,
since we have no variable to associate the refinement information with
(unless as-patterns are used).
A refinement type T@{p1} can always be weakened to T@{p1 | p2}.
Similarly, a refinement type can always be weakened by erasing it.
For type inference, I'd suggest that functions which do not have rigid
signatures are treated the way they currently are; that is, all
refinement information is weakened away unless it is explicitly
requested or returned by a function's type signature. This could be
improved upon later, but seems like the most reasonable place to start.
One complication of trying to infer refinement types is that if we are
too liberal then we won't catch bugs arising from non-exhaustive pattern
matching.
Syntax-wise, there's no particular reason for distinguishing difference
from conjunctions under difference. That is, the type T@{... \\ p1 & p2}
could just as well be written T@{... \\ p1 \\ p2}. And there's no need
for conjunctions under disjunction because we can unify the patterns to
get their intersection. Thus, it might be best to just have disjunction
and difference for simplicity.
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list