Why I Want Global Field Names (By Default)
Gershom Bazerman
gershomb at gmail.com
Mon Mar 5 06:03:52 CET 2012
The discussion on records has in some ways narrowed (which is good), but
within that narrowed scope of disagreement become very contentious on
global vs. local default scope for field names. Those in favor of
information-hiding as a key feature have been pretty vocal so far, and
while others have argued, I don't think the opposite approach has been
strongly motivated. I want to put forward at least one such strong
motivation for global field names (aka SORF-style) rather than locally
declared field names as a default.
Suppose we have some sort of persistence layer implemented with records
mapped to, e.g., rows within database tables. In the underlying data,
many columns in different tables could share the same name. Now we may
want to implement the generation of these record data declarations with
some sort of automatic code generation. Alternately, we may want to
write these declarations by hand, and automatically populate and migrate
the database. It doesn't really matter. For different columns with the
same name, we necessarily want to refer to these with the same selector
-- i.e. we're not interested in building a one-deep hierarchy where
certain "address" selectors refer to the "address" within some columns,
and other "address" selectors refer to others. Arguably, we might want
such a thing, but with sufficient tables with sufficient columns, the
complexity to the programmer in determining the right mappings could
simply be not worth it.
So, suppose we have a locally declared fields solution (such as DORF).
Now, where do these fields live? Arguably, we want a module per record.
Otherwise, we're forced to pull in the entire namespace of our database
at once, when we only want a fraction in any given client module.
Clearly, the field selectors shouldn't live in the module for each
individual record -- this means that they're dispersed all over the
place, and MyDb.Corporation must import MyDb.Person to get the "address"
label or vice-versa, which introduces strange and spurious dependencies.
The obvious solution is to create a MyDb.Labels module which holds all
the shared field declarations. But now, we want to introduce a new
table. This table uses some labels shared with other tables, but uses
some labels which are new. To add this table, we then have to update our
MyDb.Labels module. This then forces a recompilation of the Labels
module. This in turn forces a recompilation of every module which
depends on Labels -- at a minimum, every module representing any
database table. This in turn again forces a recompilation of any module
using any database table. For a project which actually uses the
persistence layer widely and freely, this effectively means a
recompilation of the entire project! For a sufficiently large project
(and I have worked with such) this can be extremely time consuming.
So, with any system limited to local field declarations, and with a
minimal and reasonable set of design choices to allow field sharing
where desired, we discover that it must necessarily have frequent full
rebuilds for operations which *should* require compiling a couple of
files at most.
The above argument holds under a number of modifications. For a
database/table layer, we can substitute a layer such as happstack uses.
Or, we can substitute a query API to some service which returns results
in JSON or XML (or even simply an XML parser layer).
How might we amend such a system to avoid this terribly pessimal, bad,
no-good behavior? Well, imagine we had a Labels module that declared
every possible field in advance. Now, no matter what we wanted, it would
already exist, the module would not need to be modified, and so a full
recompile would not be triggered. Such a Labels module is clearly
impossible to write in finite disk space, and compile in finite time.
However, we can provide a simulation of precisely this functionality in
the following way (using, e.g., hlists or the like) :
data Heof; data La; instance Letter La; data Lb; instance Letter
Lb; ...
instance Label Heof
instance (Letter a, Label b) => Label (a :* b)
class Label f => Has r f t where...
However, writing (undefined :: La :* Lb :* Lc :* Heof) for a label "abc"
is a bit of a pain. Sugar can eliminate this a bit. With a bit more
sugar, and new toys in GHC, we can simply write "abc" at the type level
rather than the value level! Clearly though, "abc" is a very different
sort of type than Bool or Int. It would be nice to statically state that
certain places which can now take any type, can only take types like
"abc". We want to restrict our types by their "type." This "type" given
to types is called a _Kind_. We can then say that things like "abc" at
the type level are of kind String. So, if we want to say that our Has
class can only take fields like "abc" we can say the following:
class Has (r :: *) (f :: String) (t :: *) where
get :: r -> t
Which is precisely the SORF proposal!
From our above description, we can see a number of straightforward ways
to recover optional representation hiding. The most obvious thing is
simply to wrap records in newtypes.
data PrivateRecord = PR {x :: Int}
newtype PRInterface= PRI {unPRI :: PR}
Now, if the module exports PRInterface and not PrivateRecord (and not
unPRI), users can pass around PR values (wrapped in a newtype), but
since those PR values don't have Has instances, then they can't get at
the x directly and certainly can't modify it. (Standalone + newtype
deriving lets you cheat here, but arguably you just want to make
breaking abstraction explicit and painful -- impossible is surprisingly
hard in Haskell anyway).
However, we can do one better, and recover DORF and SORF behavior at once!
class Label a
class Label f => Has (r :: *) (f :: *) (t :: *) where
get :: r -> t
instance Label (a :: String)
Now we have global field names "baked in" and by default (and indeed
proper global names, as I've argued *must be* baked in to be usable).
However, we also have the opportunity of declaring private field names:
data MyPrivateLabel
instance Label MyPrivateLabel
Now, if you don't export MyPrivateLabel (or any type which contains
MyPrivateLabel) then your abstraction is again safe!
So this is a slight generalization of SORF, that solves (I think) the
information hiding issue, giving easy use of either global or locally
scoped labels. Sugar can be implemented as in Ian's proposal, or
otherwise. If everyone finds this agreeable (and I can imagine no
reason they wouldn't!) then that leaves us with only a very few semantic
issues left. In fact, I think the *only* remaining issue is updates,
where I think it would be useful to redirect some collective thought.
Having solved all that, we can get back to what's really important --
debating syntax :-)
Hope this is clarifying,
Gershom
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20120305/630e266c/attachment.htm>
More information about the Glasgow-haskell-users
mailing list