Records in Haskell - namespacing for fields

Tue Jan 31 14:56:26 CET 2012

Simon Peyton-Jones <simonpj <at> microsoft.com> writes:

> 
> Beyond that, it would really help namespacing in general to appropriately 
extend the module system to allow multiple modules to be declared within a 
single file -- or, better yet, "submodules". [snip] the added expressive power 
should make namespacing issues
>  much more tractable. [snip] this isn't about implementing records as such --
 rather, it's about generally extending the expressive power of the language 
so that record systems--among other things--are easier to write.
>  
> I’m agnostic about nested modules.  In principle they would be a good 
thing.  However, for the application to records in particular, ...

No! no! no! For records _don't_ put records in nested/sub-modules, and _don't_ 
require them in separate modules (as currently). Here's how ...

[There may be other reasons for nested/sub-modules, but records ain't it.]

The reason was hinted at way back in Chris Done's attachment to the original 
Records wiki http://hackage.haskell.org/trac/ghc/wiki/Records "types in a non-
trivial project".

Let's say I have a database application with a field (meaning type) 
customer_id. Then it appears in records for name and address, pricing, order 
entry, etc. This is not a name 'clash', it's 'intended sharing'. (It really 
galls me to even put it that way for explanatory purposes. Really it's the 
**same** customer_id.)

In data model design you'd typically go about identifying all the fields 
(types aka attributes) and putting them in a data dictionary. Then you'd 
construct your records from them.

You might (possibly) put the data dictionary in a distinct module, for easy 
maintenance. But you'd certainly want all the customer-related records in the 
same module.

So a data decl:
     data Customer_NameAddress = Cust_NA { customer_id :: Int, ... }

is _not_ declaring customer_id, it's _using_ an already-declared field.
(Actually, if you've got any sense, you'll declare:
     newtype Customer_id = Customer_id Int
     data ... = { customer_id :: Customer_id, ... }
and that takes us to Type-indexed records and then a short hop to anonymous 
tuples and polymorphic records and tuple extension/concat/merge and ... one 
day!)

The other purpose of the data dictionary is to declare what DBMS's call 
the 'domain' of the field (Int in the case of customer_id). The terminology's 
going to get a bit confusing here: Haskell's field name (selector functions) 
apply to the record as the function's domain, and Int as the result (range).

For Haskell's field selectors we might also want to constrain the records they 
can be used in. (For example they must be 'Persist'able so that we can write 
them to an external database.)

So, to the proposal (I'm assuming http://www.haskell.org/pipermail/glasgow-
haskell-users/2011-December/021298.html can be made workable. SPJ has been 
kind enough to give it a once-over http://www.haskell.org/pipermail/glasgow-
haskell-users/2012-January/021744.html, but of course all faults are the 
author's alone.) Specifically, there's to be a class Has with methods get and 
set. This is grossly simplified, see the posts):

0.        class Has r fld t  where        -- record r has field fld at type t
                get :: r -> fld -> t
                set :: fld -> t -> r -> r'  -- update t into r
                      -- the r' is to cater for type-changing updates
                      -- it's actually a type function over r fld t

And then:
1. We need -XDisambiguateRecordFields,
   so that we can talk about specific record types unambiguously(!)
2. We must avoid auto-generating the field selector function from data decls.
   So that we can free up the namespace and share the field name with other 
records.
   (IMHO this should always have been part of -XDisambiguate...
    I'm not the first to say that.)
3. We need a 'peg' at the type/kind level to drive instance resolution.
   (This is the `fld' of the Has instance.
    SPJ's SORF uses a String Kind. I'll use phantom type Proxy_field.)
4. There's to be a new form of declaration:
       field customer_id                         -- think data dictionary
   This is syntactic sugar for:
       type Proxy_customer_id                    -- phantom 'peg'
       customer_id :: (Has r Proxy_customer_id t) => r -> t
       customer_id r = get r (undefined :: Proxy_customer_id)
   So we now have a field selector function, similar to H98's but polymorphic.

5. Here's a data decl and its generated Has instance:
       data Customer_NameAddress = Cust_NA { customer_id :: Int, ... }
       instance (t ~ Int) => Has Customer_NameAddress Proxy_customer_id t ...
   in which get/set are defined using the data constructor unambiguously.
   The (t ~ ...) constraint is to force instance match, followed by type)
   refinement as a "functional-dependency-like mechanism" [SPJ].
6. Another data decl using field customer_id generates another Has instance.
   So customer_id is polymorphic and 'just a function'.
   So at use sites, we use regular instance resolution, no dodgy semantics.

7. In other modules/namespaces, we can hide the existence and/or 
   representation of fields, using standard export/import controls.
   We don't have to try to hide the instances (which Haskell don't do). 

4b. At step 4. I wasn't telling the whole story. For this application,
    we want customer_id to be always an Int.
    We also (let's say) want the record to be Persistable.
    So the full syntax for the new declaration is:
       field customer_id :: (Persist r) => r -> Int
    (The :: type spec is optional.) The decl generates
       customer_id :: (Persist r, Has r Proxy_customer_id t, t ~ Int )
                      => r -> t
    [I'm not sure on that r -> t,
     perhaps r -> Int is better to help type inference??]

Back to modules and namespaces ... Some other module on a distant planet 
declares a field customer_id. It's using String as the domain (type). (Or 
perhaps it's using H98-style field naming.) Someone in the asteroid belt 
builds an application which imports both modules.

**Now** we have a name clash: these are different customer_id's. But this is 
just regular, familiar name clash. We're sweet: the field selector is 'just a 
function' we can use it qualified. (Also the phantom Proxy_customer_id must be 
qualified -- not that the end user/programmer needs to know that, not that it 
matters because it's only a phantom.)

Leftovers/problems for the implementer (thanks, Simons ;-):

Keyword `field' is probably used all over the place in extant code. (Also 
`label', and anything else we'd probably like to use.) Somebody please choose 
another keyword. Let's not put disproportionate effort into lexical syntax.

Somewhat off-topic for namespaces, an allowable (optional) syntax for field 
access is to be dot notation:
       show r.customer_id   -- desugars to: show (customer_id r)
How do we disambiguate this when there's differing fields in scope? -- easy:
       r.My.customer_id          -- vs. r.Their.customer_id
The syntax rule is that an upper-case to the left of the dot means qualified 
name, and binds rightwards more tightly. You can use parentheses to override 
binding.

Perhaps the asteroid-dweller wants to create their own record including a 
customer_id:
      data Customer_Planet = Cust_Planet { customer_id :: Int, ... }
Now we're in trouble: the current syntax for data decl's doesn't contemplate 
qualified field names. Help!! Perhaps we only allow record decls using field 
names imported unqualified?

A possible refinement: it's daft in the data decl having to spec :: Int all 
the time, when the field declaration has already said that. Can we default it? 
Tricky: { customer_id, name, address :: String } looks like it's declaring 
three fields type String.

Syntax for field update:
      f1 Cust_NA{ customer_id, .. } = Cust_NA{ customer_id = 27, ..}
(that is, with explicit data constructors) is regular DisambiguateRecordFields 
with wildcards and punning. Works as currently (GHC 7.2.). (And in fact the 
instance definition for set uses this syntax.)

Contrast
      f2 r = r{ customer_id = 27 }
(that is, no data constructor prefixing the record update, `r' could be an 
expression)
desugars to a polymorphic call to `set', with normal type inference and 
instance resolution:
      f2 r = set (undefined :: Proxy_customer_id) 27 r
infer f2 :: r { customer_id :: Num a => a } => r -> r'
(The Num a should get refined to Int.)

Should get/set have a proxy argument?
Like SPJ, I've no strong preference: whatever works.
My definition uses proxies (phantoms), because I was working in GHC 7.2.1 
without 7.4.1's new PolyKinds. I quickly experimented with avoiding a proxy, 
but type inference wouldn't play. The end-programmer won't be defining 
instances, so this could be 'implementation dependent'.

Syntactic sugar for `Has', and eliding multiple constraints for the same 
record.
SPJ is absolutely spot-on. Looks very intuitive. I slipped in an example just 
above.