From anthony_clayden at clear.net.nz Sun Nov 4 11:55:22 2018 From: anthony_clayden at clear.net.nz (Anthony Clayden) Date: Mon, 5 Nov 2018 00:55:22 +1300 Subject: [Hugs-users] generic operations on Trex records [was: [haskell-beginners] Lifting over record syntax] In-Reply-To: References: Message-ID: On Thu, 1 Nov 2018 at 12:31 AM, Anthony Clayden wrote: > Label names in Trex are literals; there's no such thing as a label > variable. (Which is why it's a tad annoying that they start lower case.) > Furthermore the same label name must appear in both terms and types -- in > fact labels occupy a namespace separate vs terms or types. So Trex is a > long way from from generic record handling like: > > recAppend :: ( rho'\__x ) => Rec rho' -> Rec ( __x :: a) -> Rec ( __x :: a > | rho') > recAppend rho ( __x = x ) = ( __x = x | rho ) > > in which I've used double-underscore prefix to signify a label variable. > This is intended to extend a record `rho` with a singleton record. If we > try appending a record with more than one field, beware that field order is > arbitrary, so this > > recAppend rho ( __x = x, __y = y) = ... > > has no principal type (a familiar difficulty). The programmer doesn't care > which way round labels `__x, __y` bind, providing they're distinct, but the > typing does care. > I've just come across Gaster's 1997 paper 'Polymorphic Extensible Records for Haskell', which gives a bit more detail on the internals of the implementation. No surprise: records are stored in a continuous block of memory, with fields left-to-right by alphabetic order of label. Then the lacks constraint for each label name is merely an index into the vector, and gets type-erased. So we could de-arbitrate field selection by taking them left-to-right. (And introduce the ordering on labels into the formal typing.) I see in the Gaster+Jones 1996 paper, there is a possible generic approach: section 6.2 'First-class labels'. "This increases the expressiveness of our system quite dramatically, ..." There's to be a distinct kind for labels, and a type constructor Label :: label -> *. Presumably in 2018 we'd use kind Symbol for labels. So label `name` is a type variable(?) set to `Label "name"`. The paper doesn't seem to suggest any of the approach has been developed, nor can I see evidence of it inside the Hugs code. Specifically in the code, the parser puts all labels straight into a dedicated symbol table, where they're treated as constants. (They can only appear in very clearly identifiable syntactic contexts.) I'm not as sanguine as section 6.2: how to treat label variables as opposed to constants? * How to treat pattern-matching on two distinct label variables? (example above) * How to treat pattern-matching on one label variable appearing in distinct patterns? Such as recProject (__x = x | rho1) (__x = _ | rho2) = (__x = x | (recProject rho1 rho2) ) This is currently a valid term, BTW, but is looking for label constant `__x` in both records. This is trying to say: if the two records have some label in common (I don't care which), then take that label and its field; recurse on the leftovers of the two records. So this will project the LH record on labels in common with RH record. It's a great design pattern. But worse than the `recAppend` above, the compiler now has to pick arbitrarily a label that appears in both arguments; not arbitrarily left-to-right alphabetic ordering in just one record. (And if that's not possible, don't select that instance; there's presumably a base case instance for no labels in common.) It's all fine and dandy using label variables in the formal semantics. It's quite different supporting them in surface syntax and typing. > I'd like to write term `( rho1 | rho2 )` to concatenate two records. > That's currently unrecognised syntax, so I think could be added as such. > What would be its type/is it principal? > > ( rho1 | rho2 ) :: (rho1' \\ rho2') => Rec( rho1' | rho2' ) -- > inventing more syntax > > in which constraint `(rho1' \\ rho2')` requires the two rows' labels be > mutually disjoint -- read "lacks all". Ur/web has something like this. > Also Harper&Pierce 1990, 1991 referenced in the G+J paper. We can define "lacks all" in terms of lacks, by extending the G+J Figure 1 definition of `\` thusly * base case: two EmptyRecs lack all labels in common * recursive case: given two recs that lack all plus some label that both lack, adding that label at some type to one record, they still both lack all. Symbolically: P ||- {| |} \\ {| |} ; P ||- r1 \\ r2, r1\l, r2\l --------------------- P ||- {| l :: tau | r1 |} \\ r2 Note I'm not envisaging `|` as a genuine operator: this is still hard-wired > syntax; pipe is a reserved symbol in H98 anyway. > > Specifying its formal semantics does need distinguishing one of the arguments being empty vs not (I'll give the term-level definition as if it were possible, these need to be in distinct Instances): (EmptyRec | rho2) = rho2 ( (__x = x | rho1') | rho2) = (__x = x | (rho1' | rho2) ) Then to achieve full expressivity for a record system, I need just one more operation: label subtraction or "project away" also known in relational algebra as "remove" or "project ALL BUT", sometimes symbolised as pi-hat or pi-overbar. Semantics: given two records with possibly some labels in common, return the LH argument, with all in-common labels/fields removed. Note this cunningly avoids requiring the in-common labels to have same field type, because it ignores the RH argument's fields. The result "lacks all" attributes of the RH arg. I think "project away"s semantics can be defined without requiring pattern-matching on the same label in two different records (and using `⌆` to symbolise the operation -- closest I can get to pi-hat): EmptyRec ⌆ rho2 = EmptyRec (__x = x | rho1') ⌆ rho2 = (__x = x | (rho1' ⌆ rho2)), if rho2\__x (__x = x | rho1') ⌆ rho2 = (rho1' ⌆ rho2), otherwise Then other generic operations on whole-records can be defined in terms of the above two, treating records as sets of label-field pairs, and applying set operations: rho1 `recIntersect` rho2 = (rho1 ⌆ (rho1 ⌆ rho2)) -- aka projection rho1 `recUnion` rho2 = Just ((rho1 ⌆ rho2) | (rho1 ⌆ (rho1 ⌆ rho2)) | (rho2 ⌆ rho1)) -- providing (rho1 ⌆ (rho1 ⌆ rho2)) == (rho2 ⌆ (rho2 ⌆ rho1)) -- otherwise Nothing `recUnion` is more familiar as record-level relational Natural Join. Returns a Maybe record, because the arguments might not be equal on the field values of the labels in common. So we have set difference, intersection, union. That must be expressively complete. AntC -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony_clayden at clear.net.nz Fri Nov 9 02:42:40 2018 From: anthony_clayden at clear.net.nz (Anthony Clayden) Date: Fri, 9 Nov 2018 15:42:40 +1300 Subject: [Hugs-users] Trex label names: leaky abstraction Message-ID: Gaster & Jones 1996 says "We consider two rows to be equivalent if they include the same fields regardless of the order in which they are listed." [section 3.2] But this is tricky: Figure 6 shows there is an ordering on labels, so that in effect records are compiled to tuples with left-to-right ordering of label names alphabetically. The User Manual says "The order of fields in a record pattern *is* significant because it determines the order---from left to right---in which they are matched." In particular, a pattern match on a record with a field value `undefined` might succeed if some other field label is mentioned first in the pattern; but throw an error if it's the undefined field's label mentioned first. It's not difficult to expose this behaviour. Consider > import Hugs.Trex > > x5y = (x = 5, y = 'y'); x7y = (x = 7, y = undefined) > z5y = (z = 5, y = 'y'); z7y = (z = 7, y = undefined) > > x5y == x7y -- returns False > z5y == z7y -- throws error Prelude.undefined The tuples with label `z` are compiled to put that field second, regardless of the order of appearing in the expression. Then (==) applies by comparing field values in left-to-right alphabetic ordering. Field `x` is alphabetically before `y`; and the `x` values differ; so the comparison returns False without comparing the `y` values. Then in comparing the field `z` records, the `y` fields are compared first (ignoring that the `z` values differ), exposing that one `y` field is `undefined`. This is similar to comparing tuples > (5, 'y') == (7, undefined) -- returns False > ('y', 5) == (undefined, 7) -- throws error There seems to be some infelicity when building records: > f ... rho ... = (y = 'y' | rho) This throws strange errors depending whether `rho` includes labels that are alphabetically before `y`. Bug report incoming ... AntC -------------- next part -------------- An HTML attachment was scrubbed... URL: