[Haskell-cafe] 'data' syntax - a suggestion

Thu Sep 27 20:27:52 EDT 2007

On 28 Sep 2007, at 10:01 am, Thomas Conway wrote:
> data Tree key val
>     = Leaf key val
>     | Node BST key val BST
>     where
>     type BST = Tree key val
>
>
> data RelaxedTree key val
>     = Leaf Bal [(key,val)]
>     | Node Bal [(key,RelaxedTree key val)]
>     where
>     data Bal = Balanced | Unbalanced

My proposal was deliberately rather limited.
My feeling was that if there is a constructor
(like Balanced, Unbalanced)
then I want it to belong to a module-scope type name.

What I'm looking for is something that provides
(1) an easily understood way of abbreviating repeated types in a
     data, type, or newtype declaration
(2) and using them *uniformly* throughout such a declaration
     (which is why GADTs don't help)
(3) to reduce the incidence of errors
(4) and clarify the programmer's intent in much the same way as
     field names do (but as a complementary,  not a rival technique)
(5) and above all, to simplify maintenance.

The thing that got me thinking about this is my continuing attempt
to write a compiler (to C) for a legacy language in Haskell.
I start out with a simple AST data type, adequate for testing
the grammar.  And then I start adding semantic information to the
nodes, and suddenly I find myself adding extra fields all over the
place.

Now there's a paper that was mentioned about a month ago in this
mailing list which basically dealt with that by splitting each type
into two:  roughly speaking a bit that expresses the recursion and
a bit that expresses the choice structure.  My feeling about that
was that while it is a much more powerful and general technique, it
isn't as easy to get your head around as a single level solution.

Here's a trivial example.

Parser-only version:

	newtype Var
	    =   Var String

	data Expr
	   = Variable Var
	   | Constant Int
	   | Unary String Expr
	   | Binary String Expr

Revised version:

	data Var env
	   = Var env String

	data Expr env
	   = Variable (Var env)
	   | Constant Int
	   | Unary String (Expr env)
	   | Binary String (Expr env) (Expr env)

Now let's do Expr using my proposal:

	data Expr
	   = Variable var
	   | Constant Int
	   | Unary String expr
	   | Binary String expr expr
	   where type var = Var
		 type expr = Expr

(obtained from the first parser-only version by lower-casing the type  
names)
becoming

*	data Expr env
	   = Variable var
	   | Constant Int
	   | Unary String expr
	   | Binary String expr expr
*	   where type var = Var env
*		 type expr = Expr env

To my mind it's clearer to see 'expr' repeated than '(Expr env)'  
repeated,
as you don't have to keep checking that the argument is the same.

I'm not wedded to this scheme.  It's the simplest thing I can think of
that will do the job.  But the Haskell spirit is, if I may say so,
seems to be to look for the simplest thing that can do the job at hand
and a whole lot more in a principled way.

What I'm looking for in a better counter-proposal is something that
makes it this easy or easier to revise and extend a type.  Perhaps
a variation on GADTs would be the way to go.  I don't know.