[Hugs-users] generic operations on Trex records [was: [haskell-beginners] Lifting over record syntax]

Sun Nov 4 11:55:22 UTC 2018

On Thu, 1 Nov 2018 at 12:31 AM, Anthony Clayden  wrote:

> Label names in Trex are literals; there's no such thing as a label
> variable. (Which is why it's a tad annoying that they start lower case.)
> Furthermore the same label name must appear in both terms and types -- in
> fact labels occupy a namespace separate vs terms or types. So Trex is a
> long way from from generic record handling like:
>
> recAppend :: ( rho'\__x ) => Rec rho' -> Rec ( __x :: a) -> Rec ( __x :: a
> | rho')
> recAppend rho ( __x = x ) = ( __x = x | rho )
>
> in which I've used double-underscore prefix to signify a label variable.
> This is intended to extend a record `rho` with a singleton record. If we
> try appending a record with more than one field, beware that field order is
> arbitrary, so this
>
> recAppend rho ( __x = x, __y = y) = ...
>
> has no principal type (a familiar difficulty). The programmer doesn't care
> which way round labels `__x, __y` bind, providing they're distinct, but the
> typing does care.
>

I've just come across Gaster's 1997 paper 'Polymorphic Extensible Records
for Haskell', which gives a bit more detail on the internals of the
implementation. No surprise: records are stored in a continuous block of
memory, with fields left-to-right by alphabetic order of label. Then the
lacks constraint for each label name is merely an index into the vector,
and gets type-erased. So we could de-arbitrate field selection by taking
them left-to-right. (And introduce the ordering on labels into the formal
typing.)

I see in the Gaster+Jones 1996 paper, there is a possible generic approach:
section 6.2 'First-class labels'. "This increases the expressiveness of our
system quite dramatically, ..." There's to be a distinct kind for labels,
and a type constructor Label :: label -> *. Presumably in 2018 we'd use
kind Symbol for labels. So label `name` is a type variable(?) set to `Label
"name"`. The paper doesn't seem to suggest any of the approach has been
developed, nor can I see evidence of it inside the Hugs code. Specifically
in the code, the parser puts all labels straight into a dedicated symbol
table, where they're treated as constants. (They can only appear in very
clearly identifiable syntactic contexts.) I'm not as sanguine as section
6.2: how to treat label variables as opposed to constants?

* How to treat pattern-matching on two distinct label variables? (example
above)
* How to treat pattern-matching on one label variable appearing in distinct
patterns? Such as

recProject (__x = x | rho1) (__x = _ | rho2) = (__x = x | (recProject rho1
rho2) )

This is currently a valid term, BTW, but is looking for label constant
`__x` in both records.

This is trying to say: if the two records have some label in common (I
don't care which), then take that label and its field; recurse on the
leftovers of the two records. So this will project the LH record on labels
in common with RH record. It's a great design pattern. But worse than the
`recAppend` above, the compiler now has to pick arbitrarily a label that
appears in both arguments; not arbitrarily left-to-right alphabetic
ordering in just one record. (And if that's not possible, don't select that
instance; there's presumably a base case instance for no labels in common.)

It's all fine and dandy using label variables in the formal semantics. It's
quite different supporting them in surface syntax and typing.

> I'd like to write term `( rho1 | rho2 )` to concatenate two records.
> That's currently unrecognised syntax, so I think could be added as such.
> What would be its type/is it principal?
>
> ( rho1 | rho2 ) :: (rho1' \\ rho2') => Rec( rho1' | rho2' )    --
> inventing more syntax
>
> in which constraint `(rho1' \\ rho2')` requires the two rows' labels be
> mutually disjoint -- read "lacks all". Ur/web has something like this.
>

Also Harper&Pierce 1990, 1991 referenced in the G+J paper. We can define
"lacks all" in terms of lacks, by extending the G+J Figure 1 definition of
`\` thusly

* base case: two EmptyRecs lack all labels in common
* recursive case: given two recs that lack all plus some label that both
lack, adding that label at some type to one record, they still both lack
all.

Symbolically:

P ||- {| |} \\ {| |} ;

P ||- r1 \\ r2, r1\l, r2\l
---------------------
P ||- {| l :: tau | r1 |} \\ r2

Note I'm not envisaging `|` as a genuine operator: this is still hard-wired
> syntax; pipe is a reserved symbol in H98 anyway.
>
>
Specifying its formal semantics does need distinguishing one of the
arguments being empty vs not (I'll give the term-level definition as if it
were possible, these need to be in distinct Instances):

(EmptyRec | rho2) = rho2
( (__x = x | rho1') | rho2) = (__x = x | (rho1' | rho2) )

Then to achieve full expressivity for a record system, I need just one more
operation: label subtraction or "project away" also known in relational
algebra as "remove" or "project ALL BUT", sometimes symbolised as pi-hat or
pi-overbar. Semantics: given two records with possibly some labels in
common, return the LH argument, with all in-common labels/fields removed.
Note this cunningly avoids requiring the in-common labels to have same
field type, because it ignores the RH argument's fields. The result "lacks
all" attributes of the RH arg.

I think "project away"s semantics can be defined without requiring
pattern-matching on the same label in two different records (and using `⌆`
to symbolise the operation -- closest I can get to pi-hat):

EmptyRec ⌆ rho2 = EmptyRec
(__x = x | rho1') ⌆ rho2 = (__x = x | (rho1' ⌆ rho2)), if rho2\__x
(__x = x | rho1') ⌆ rho2 =  (rho1' ⌆ rho2),  otherwise

Then other generic operations on whole-records can be defined in terms of
the above two, treating records as sets of label-field pairs, and applying
set operations:

rho1 `recIntersect` rho2 = (rho1 ⌆ (rho1 ⌆ rho2))     -- aka projection
rho1 `recUnion` rho2 = Just ((rho1 ⌆ rho2) | (rho1 ⌆ (rho1 ⌆ rho2)) | (rho2
⌆ rho1))
-- providing (rho1 ⌆ (rho1 ⌆ rho2)) == (rho2 ⌆ (rho2 ⌆ rho1))
-- otherwise Nothing

`recUnion` is more familiar as record-level relational Natural Join.
Returns a Maybe record, because the arguments might not be equal on the
field values of the labels in common.

So we have set difference, intersection, union. That must be expressively
complete.

AntC
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/hugs-users/attachments/20181105/20ebb1a9/attachment.html>