[Hugs-users] generic operations on Trex records [was: [haskell-beginners] Lifting over record syntax]

Anthony Clayden anthony_clayden at clear.net.nz
Sun Nov 4 11:55:22 UTC 2018

On Thu, 1 Nov 2018 at 12:31 AM, Anthony Clayden  wrote:

> Label names in Trex are literals; there's no such thing as a label
> variable. (Which is why it's a tad annoying that they start lower case.)
> Furthermore the same label name must appear in both terms and types -- in
> fact labels occupy a namespace separate vs terms or types. So Trex is a
> long way from from generic record handling like:
> recAppend :: ( rho'\__x ) => Rec rho' -> Rec ( __x :: a) -> Rec ( __x :: a
> | rho')
> recAppend rho ( __x = x ) = ( __x = x | rho )
> in which I've used double-underscore prefix to signify a label variable.
> This is intended to extend a record `rho` with a singleton record. If we
> try appending a record with more than one field, beware that field order is
> arbitrary, so this
> recAppend rho ( __x = x, __y = y) = ...
> has no principal type (a familiar difficulty). The programmer doesn't care
> which way round labels `__x, __y` bind, providing they're distinct, but the
> typing does care.

I've just come across Gaster's 1997 paper 'Polymorphic Extensible Records
for Haskell', which gives a bit more detail on the internals of the
implementation. No surprise: records are stored in a continuous block of
memory, with fields left-to-right by alphabetic order of label. Then the
lacks constraint for each label name is merely an index into the vector,
and gets type-erased. So we could de-arbitrate field selection by taking
them left-to-right. (And introduce the ordering on labels into the formal

I see in the Gaster+Jones 1996 paper, there is a possible generic approach:
section 6.2 'First-class labels'. "This increases the expressiveness of our
system quite dramatically, ..." There's to be a distinct kind for labels,
and a type constructor Label :: label -> *. Presumably in 2018 we'd use
kind Symbol for labels. So label `name` is a type variable(?) set to `Label
"name"`. The paper doesn't seem to suggest any of the approach has been
developed, nor can I see evidence of it inside the Hugs code. Specifically
in the code, the parser puts all labels straight into a dedicated symbol
table, where they're treated as constants. (They can only appear in very
clearly identifiable syntactic contexts.) I'm not as sanguine as section
6.2: how to treat label variables as opposed to constants?

* How to treat pattern-matching on two distinct label variables? (example
* How to treat pattern-matching on one label variable appearing in distinct
patterns? Such as

recProject (__x = x | rho1) (__x = _ | rho2) = (__x = x | (recProject rho1
rho2) )

This is currently a valid term, BTW, but is looking for label constant
`__x` in both records.

This is trying to say: if the two records have some label in common (I
don't care which), then take that label and its field; recurse on the
leftovers of the two records. So this will project the LH record on labels
in common with RH record. It's a great design pattern. But worse than the
`recAppend` above, the compiler now has to pick arbitrarily a label that
appears in both arguments; not arbitrarily left-to-right alphabetic
ordering in just one record. (And if that's not possible, don't select that
instance; there's presumably a base case instance for no labels in common.)

It's all fine and dandy using label variables in the formal semantics. It's
quite different supporting them in surface syntax and typing.

> I'd like to write term `( rho1 | rho2 )` to concatenate two records.
> That's currently unrecognised syntax, so I think could be added as such.
> What would be its type/is it principal?
> ( rho1 | rho2 ) :: (rho1' \\ rho2') => Rec( rho1' | rho2' )    --
> inventing more syntax
> in which constraint `(rho1' \\ rho2')` requires the two rows' labels be
> mutually disjoint -- read "lacks all". Ur/web has something like this.

Also Harper&Pierce 1990, 1991 referenced in the G+J paper. We can define
"lacks all" in terms of lacks, by extending the G+J Figure 1 definition of
`\` thusly

* base case: two EmptyRecs lack all labels in common
* recursive case: given two recs that lack all plus some label that both
lack, adding that label at some type to one record, they still both lack


P ||- {| |} \\ {| |} ;

P ||- r1 \\ r2, r1\l, r2\l
P ||- {| l :: tau | r1 |} \\ r2

Note I'm not envisaging `|` as a genuine operator: this is still hard-wired
> syntax; pipe is a reserved symbol in H98 anyway.
Specifying its formal semantics does need distinguishing one of the
arguments being empty vs not (I'll give the term-level definition as if it
were possible, these need to be in distinct Instances):

(EmptyRec | rho2) = rho2
( (__x = x | rho1') | rho2) = (__x = x | (rho1' | rho2) )

Then to achieve full expressivity for a record system, I need just one more
operation: label subtraction or "project away" also known in relational
algebra as "remove" or "project ALL BUT", sometimes symbolised as pi-hat or
pi-overbar. Semantics: given two records with possibly some labels in
common, return the LH argument, with all in-common labels/fields removed.
Note this cunningly avoids requiring the in-common labels to have same
field type, because it ignores the RH argument's fields. The result "lacks
all" attributes of the RH arg.

I think "project away"s semantics can be defined without requiring
pattern-matching on the same label in two different records (and using `⌆`
to symbolise the operation -- closest I can get to pi-hat):

EmptyRec ⌆ rho2 = EmptyRec
(__x = x | rho1') ⌆ rho2 = (__x = x | (rho1' ⌆ rho2)), if rho2\__x
(__x = x | rho1') ⌆ rho2 =  (rho1' ⌆ rho2),  otherwise

Then other generic operations on whole-records can be defined in terms of
the above two, treating records as sets of label-field pairs, and applying
set operations:

rho1 `recIntersect` rho2 = (rho1 ⌆ (rho1 ⌆ rho2))     -- aka projection
rho1 `recUnion` rho2 = Just ((rho1 ⌆ rho2) | (rho1 ⌆ (rho1 ⌆ rho2)) | (rho2
⌆ rho1))
-- providing (rho1 ⌆ (rho1 ⌆ rho2)) == (rho2 ⌆ (rho2 ⌆ rho1))
-- otherwise Nothing

`recUnion` is more familiar as record-level relational Natural Join.
Returns a Maybe record, because the arguments might not be equal on the
field values of the labels in common.

So we have set difference, intersection, union. That must be expressively

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/hugs-users/attachments/20181105/20ebb1a9/attachment.html>

More information about the Hugs-Users mailing list