[Haskell-cafe] Re: Proposal: Sum type branches as extended types (as Type!Constructor)

Fri Jun 4 21:13:05 EDT 2010

Gabriel Riba wrote:
> New proposal draft:
> 
> Proposal: Type supplement for constructor specific uses of sum types
> 
> Purpose: Avoid error clauses (runtime errors), exception control or Maybe
>  types in partially defined (constructor specific) functions on sum types.
> 
> As an example, with
> 
>    data List a = Nil | Cons a (List a)  
> 
> * Actual system, with runtime errors (as in GHC Data.List head) or 
> exception throwing or optional Maybe results.
> 
>    hd :: List a -> a
>    hd (Cons x _) -> x
>    hd Nil -> error "error: hd: empty list" -- error or exception throwing
> 
> * Proposed system extending types with a suffix @ Constructor or @
> {Constructor1, Constructor2, ..}
> 
>    hd :: List @ Cons a -> a
>    hd (Cons x _) = x
> 
> The caller must do pattern matching before applying the constructor-specific
> function.

Since the goal is to propagate case analysis information, the syntax 
should reflect that. That is, there should be support for nested 
patterns, e.g.,

     cdar :: [a]@{_:_:_} -> a
     cdar (_:x:_) = x

     cadr :: [[a]]@{(_:_):_} -> a
     cadr ((_:xs):_) = xs

     headFromJust :: [Maybe a]@{Just _ : _} -> a
     ...

The type at cons syntax is a nice shorthand, but we need to express the 
arguments to the data constructors in the extended syntax in order to 
support nested patterns.

For delimiting multiple alternatives, it's not clear that comma is the 
best delimiter to use, especially since it could be the data constructor 
for tuples. Perhaps using ; or | would be better. Unless there's a 
syntactic reason for preferring braces over parentheses, perhaps we 
should just use parentheses for symmetry with as-patterns.

Finally, there should also be support for negative patterns, i.e., 
propagation of the failure to match a pattern. One place this is useful 
is for distinguishing 0 from other numbers, which allows removing the 
error branches from functions like division. Sometimes we can't 
enumerate all the positive patterns we want to allow, but it's easy to 
express what should be disallowed.

To match case analysis we should allow for a conjunction of negative 
patterns followed by a positive pattern. Or, if we want to incorporate 
multiple positive patterns, then a conjunction of negative patterns 
followed by a disjunction of positive patterns. (Disjunctive case 
matching has been an independent proposal in the past, and there's 
nothing prohibiting supporting it.)

Thus, if we use | to delimit disjunctions, & to delimit conjunctions, 
and \\ to separate the disjuncts from the conjuncts, given the following 
case analysis:

     case x : T of
     p1 y1... -> e1
     p2 y2... -> e2
     _        -> eF

The variable x has type T outside of the case expression. Within the 
branch e1 it is given the refinement type T@{p1 _...} where variables 
bound by the pattern are replaced with wildcards. In branch e2 it is 
given the refinement type T@{p2 _... \\ p1 _...}. This can be simplified 
to T@{p2 _...} if the head constructors of p2 and p1 are distinct. And 
in the eF branch x would be given the refinement type T@{_ \\ p1 _... & 
p2_...}.

If this semantics is too hard to implement, we could instead require the 
use of as-patterns for introducing the refinements. The variable 
introduced by @ in the as-pattern would be given the refinement type, 
but the scrutinee would continue to have the unrefined type. This latter 
semantics is common in dependently typed languages, but it's verbose and 
ugly so it'd be nice to avoid it if we can.

Other notes:

Case matching on non-variable expressions would gain no extra support, 
since we have no variable to associate the refinement information with 
(unless as-patterns are used).

A refinement type T@{p1} can always be weakened to T@{p1 | p2}. 
Similarly, a refinement type can always be weakened by erasing it.

For type inference, I'd suggest that functions which do not have rigid 
signatures are treated the way they currently are; that is, all 
refinement information is weakened away unless it is explicitly 
requested or returned by a function's type signature. This could be 
improved upon later, but seems like the most reasonable place to start. 
One complication of trying to infer refinement types is that if we are 
too liberal then we won't catch bugs arising from non-exhaustive pattern 
matching.

Syntax-wise, there's no particular reason for distinguishing difference 
from conjunctions under difference. That is, the type T@{... \\ p1 & p2} 
could just as well be written T@{... \\ p1 \\ p2}. And there's no need 
for conjunctions under disjunction because we can unify the patterns to 
get their intersection. Thus, it might be best to just have disjunction 
and difference for simplicity.

-- 
Live well,
~wren