Records in Haskell - namespacing for fields

John Lask jvlask at hotmail.com
Wed Feb 1 00:24:54 CET 2012


On 1/02/2012 12:26 AM, AntC wrote:
> Simon Peyton-Jones<simonpj<at>  microsoft.com>  writes:
>
>>
>> Beyond that, it would really help namespacing in general to appropriately
> extend the module system to allow multiple modules to be declared within a
> single file -- or, better yet, "submodules". [snip] the added expressive power
> should make namespacing issues
>>   much more tractable. [snip] this isn't about implementing records as such --
>   rather, it's about generally extending the expressive power of the language
> so that record systems--among other things--are easier to write.
>>
>> I’m agnostic about nested modules.  In principle they would be a good
> thing.  However, for the application to records in particular, ...
>
> No! no! no! For records _don't_ put records in nested/sub-modules, and _don't_
> require them in separate modules (as currently). Here's how ...
>


namespace management and record systems are intimately related, but as 
you assert distinct issues.

taking your example of Customer_id, I expressly want to be able to 
define in the same syntactic module (file). Two records with a field 
with exactly the same name at two different types, say Customer_id :: 
Int and Customer_id :: String. As I understand it, your proposal
would not enable this.

In reality these two different uses of the name Customer_id are 
unrelated and distinct. Some would argue that therefore they should 
rightly have distinct names, however that is moot, for the purpose of 
this discussion lets take this as the objective.

There are two roads to travel: Customer_id is one semantic entity (in 
the sense of an overloaded field, disambiguated by the type system) or 
Customer_id represents two distinct semantic entities disambiguated 
syntactically. I for one favor the second approach as it matches my 
intent, can I say, more faithfully.

> [There may be other reasons for nested/sub-modules, but records ain't it.]

as above, however, I believe the best approach is to explore both paths 
and perhaps extend Haskell in both directions.

As in your example the particular construction of which suits, the 
approach offered in your email better because it matches your intent 
"more faithfully". It is this ability to match construction with intent 
that is critical, which alludes to the notion of "the expressivity" of a 
language.

> The reason was hinted at way back in Chris Done's attachment to the original
> Records wiki http://hackage.haskell.org/trac/ghc/wiki/Records "types in a non-
> trivial project".
>
> Let's say I have a database application with a field (meaning type)
> customer_id. Then it appears in records for name and address, pricing, order
> entry, etc. This is not a name 'clash', it's 'intended sharing'. (It really
> galls me to even put it that way for explanatory purposes. Really it's the
> **same** customer_id.)
>
> In data model design you'd typically go about identifying all the fields
> (types aka attributes) and putting them in a data dictionary. Then you'd
> construct your records from them.
>
> You might (possibly) put the data dictionary in a distinct module, for easy
> maintenance. But you'd certainly want all the customer-related records in the
> same module.
>
> So a data decl:
>       data Customer_NameAddress = Cust_NA { customer_id :: Int, ... }
>
> is _not_ declaring customer_id, it's _using_ an already-declared field.
> (Actually, if you've got any sense, you'll declare:
>       newtype Customer_id = Customer_id Int
>       data ... = { customer_id :: Customer_id, ... }
> and that takes us to Type-indexed records and then a short hop to anonymous
> tuples and polymorphic records and tuple extension/concat/merge and ... one
> day!)
>
> The other purpose of the data dictionary is to declare what DBMS's call
> the 'domain' of the field (Int in the case of customer_id). The terminology's
> going to get a bit confusing here: Haskell's field name (selector functions)
> apply to the record as the function's domain, and Int as the result (range).
>
> For Haskell's field selectors we might also want to constrain the records they
> can be used in. (For example they must be 'Persist'able so that we can write
> them to an external database.)
>
> So, to the proposal (I'm assuming http://www.haskell.org/pipermail/glasgow-
> haskell-users/2011-December/021298.html can be made workable. SPJ has been
> kind enough to give it a once-over http://www.haskell.org/pipermail/glasgow-
> haskell-users/2012-January/021744.html, but of course all faults are the
> author's alone.) Specifically, there's to be a class Has with methods get and
> set. This is grossly simplified, see the posts):
>
> 0.        class Has r fld t  where        -- record r has field fld at type t
>                  get :: r ->  fld ->  t
>                  set :: fld ->  t ->  r ->  r'  -- update t into r
>                        -- the r' is to cater for type-changing updates
>                        -- it's actually a type function over r fld t
>
> And then:
> 1. We need -XDisambiguateRecordFields,
>     so that we can talk about specific record types unambiguously(!)
> 2. We must avoid auto-generating the field selector function from data decls.
>     So that we can free up the namespace and share the field name with other
> records.
>     (IMHO this should always have been part of -XDisambiguate...
>      I'm not the first to say that.)
> 3. We need a 'peg' at the type/kind level to drive instance resolution.
>     (This is the `fld' of the Has instance.
>      SPJ's SORF uses a String Kind. I'll use phantom type Proxy_field.)
> 4. There's to be a new form of declaration:
>         field customer_id                         -- think data dictionary
>     This is syntactic sugar for:
>         type Proxy_customer_id                    -- phantom 'peg'
>         customer_id :: (Has r Proxy_customer_id t) =>  r ->  t
>         customer_id r = get r (undefined :: Proxy_customer_id)
>     So we now have a field selector function, similar to H98's but polymorphic.
>
> 5. Here's a data decl and its generated Has instance:
>         data Customer_NameAddress = Cust_NA { customer_id :: Int, ... }
>         instance (t ~ Int) =>  Has Customer_NameAddress Proxy_customer_id t ...
>     in which get/set are defined using the data constructor unambiguously.
>     The (t ~ ...) constraint is to force instance match, followed by type)
>     refinement as a "functional-dependency-like mechanism" [SPJ].
> 6. Another data decl using field customer_id generates another Has instance.
>     So customer_id is polymorphic and 'just a function'.
>     So at use sites, we use regular instance resolution, no dodgy semantics.
>
> 7. In other modules/namespaces, we can hide the existence and/or
>     representation of fields, using standard export/import controls.
>     We don't have to try to hide the instances (which Haskell don't do).
>
>
> 4b. At step 4. I wasn't telling the whole story. For this application,
>      we want customer_id to be always an Int.
>      We also (let's say) want the record to be Persistable.
>      So the full syntax for the new declaration is:
>         field customer_id :: (Persist r) =>  r ->  Int
>      (The :: type spec is optional.) The decl generates
>         customer_id :: (Persist r, Has r Proxy_customer_id t, t ~ Int )
>                        =>  r ->  t
>      [I'm not sure on that r ->  t,
>       perhaps r ->  Int is better to help type inference??]
>
> Back to modules and namespaces ... Some other module on a distant planet
> declares a field customer_id. It's using String as the domain (type). (Or
> perhaps it's using H98-style field naming.) Someone in the asteroid belt
> builds an application which imports both modules.
>
> **Now** we have a name clash: these are different customer_id's. But this is
> just regular, familiar name clash. We're sweet: the field selector is 'just a
> function' we can use it qualified. (Also the phantom Proxy_customer_id must be
> qualified -- not that the end user/programmer needs to know that, not that it
> matters because it's only a phantom.)
>
>
> Leftovers/problems for the implementer (thanks, Simons ;-):
>
> Keyword `field' is probably used all over the place in extant code. (Also
> `label', and anything else we'd probably like to use.) Somebody please choose
> another keyword. Let's not put disproportionate effort into lexical syntax.
>
>
> Somewhat off-topic for namespaces, an allowable (optional) syntax for field
> access is to be dot notation:
>         show r.customer_id   -- desugars to: show (customer_id r)
> How do we disambiguate this when there's differing fields in scope? -- easy:
>         r.My.customer_id          -- vs. r.Their.customer_id
> The syntax rule is that an upper-case to the left of the dot means qualified
> name, and binds rightwards more tightly. You can use parentheses to override
> binding.
>
>
> Perhaps the asteroid-dweller wants to create their own record including a
> customer_id:
>        data Customer_Planet = Cust_Planet { customer_id :: Int, ... }
> Now we're in trouble: the current syntax for data decl's doesn't contemplate
> qualified field names. Help!! Perhaps we only allow record decls using field
> names imported unqualified?
>
>
> A possible refinement: it's daft in the data decl having to spec :: Int all
> the time, when the field declaration has already said that. Can we default it?
> Tricky: { customer_id, name, address :: String } looks like it's declaring
> three fields type String.
>
>
> Syntax for field update:
>        f1 Cust_NA{ customer_id, .. } = Cust_NA{ customer_id = 27, ..}
> (that is, with explicit data constructors) is regular DisambiguateRecordFields
> with wildcards and punning. Works as currently (GHC 7.2.). (And in fact the
> instance definition for set uses this syntax.)
>
> Contrast
>        f2 r = r{ customer_id = 27 }
> (that is, no data constructor prefixing the record update, `r' could be an
> expression)
> desugars to a polymorphic call to `set', with normal type inference and
> instance resolution:
>        f2 r = set (undefined :: Proxy_customer_id) 27 r
> infer f2 :: r { customer_id :: Num a =>  a } =>  r ->  r'
> (The Num a should get refined to Int.)
>
>
> Should get/set have a proxy argument?
> Like SPJ, I've no strong preference: whatever works.
> My definition uses proxies (phantoms), because I was working in GHC 7.2.1
> without 7.4.1's new PolyKinds. I quickly experimented with avoiding a proxy,
> but type inference wouldn't play. The end-programmer won't be defining
> instances, so this could be 'implementation dependent'.
>
>
> Syntactic sugar for `Has', and eliding multiple constraints for the same
> record.
> SPJ is absolutely spot-on. Looks very intuitive. I slipped in an example just
> above.
>
>
>
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users




More information about the Glasgow-haskell-users mailing list