[Haskell-cafe] Diving into the records swamp (possible GSoC project)

Sat Apr 27 12:23:46 CEST 2013

> Johan Tibell <johan.tibell <at> gmail.com> writes:
> 
> Instead of endorsing one of the listed proposals directly, I will
> emphasize the problem, so we don't lose sight of it. The problem people
> run into *in practice* and complain about in blog posts, on Google+, or
> privately when we chat about Haskell over beer, is that they would like to
> write a record definition like this one:
> 
>     data Employee = Employee { id :: Int, name :: String }
> 
>     printId :: Employee -> IO ()
>     printId emp = print $ id emp
> 
> but since that doesn't work well in Haskell today due to name
> collisions, ...

[I've a bit more to say on that record definition below.]

Thank you Johan, I agree we should keep clear sight of the problem. So 
let's be a bit more precise: it's not exactly the record declaration that 
causes the name collisions, it's the field selector function that gets 
created automatically. (Note that we can use -XDisambiguateRecordFields to 
access fields to, errm, disambiguate.)
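
To make that concrete, here is a minimal sketch (an illustration only, assuming a GHC that supports the DisambiguateRecordFields extension). The declaration itself is accepted; it is the generated selector `id` that collides with the Prelude's `id` wherever it is used as a function. The helper names employeeKey and mkEmployee are hypothetical, made up for this example.

```haskell
{-# LANGUAGE DisambiguateRecordFields #-}
module Main where

-- The declaration compiles: it defines a selector Main.id next to Prelude.id.
data Employee = Employee { id :: Int, name :: String }

-- Writing `id emp` here would be rejected as an ambiguous occurrence.
-- Inside a pattern for the Employee constructor, the extension lets GHC
-- work out that `id` must mean the field of Employee.
employeeKey :: Employee -> Int
employeeKey (Employee { id = i }) = i

-- The same disambiguation applies in record construction.
mkEmployee :: Int -> String -> Employee
mkEmployee i n = Employee { id = i, name = n }

main :: IO ()
main = print (employeeKey (mkEmployee 7 "Ada"))
```

So the extension helps with construction and pattern matching, but it does nothing for a plain application of the selector, which is the case in Johan's printId.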

So I did put in a separate proposal [3] (and ticket) on that very narrow 
issue. (Simon M pointed out that I probably didn't name it very well!)

Even if we do nothing to advance the "records swamp", PLEASE can we 
provide a compiler option to suppress that function.

I envisage it might facilitate a 'cottage industry' of Template Haskell 
solutions (generating Has instances), which would be a cheap and cheerful 
way to experiment in the design space.
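
As a rough idea of what such generated code might look like, here is a hand-written sketch. Everything in it is an assumption for illustration: the class name Has, its method get, the proxy type EmployeeId and the record shapes are hypothetical, and are not the API of any of the listed proposals. Since there is no option today to suppress selectors, the records are declared without field labels.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}
module Main where

-- Hypothetical proxy type standing for the field name "employeeId".
data EmployeeId = EmployeeId

-- Hypothetical class: record type r has a field fld of type t.
-- The functional dependency says the record and field fix the field type.
class Has r fld t | r fld -> t where
  get :: fld -> r -> t

-- Two records sharing the same logical field (declared positionally,
-- so no monomorphic selector functions are generated).
data EmployeePay       = EmployeePay Int Double
data EmployeeTimeSheet = EmployeeTimeSheet Int [Double]

-- Instances of the kind a Template Haskell splice might generate.
instance Has EmployeePay EmployeeId Int where
  get _ (EmployeePay i _) = i

instance Has EmployeeTimeSheet EmployeeId Int where
  get _ (EmployeeTimeSheet i _) = i

-- One overloaded selector that works on any record with the field.
employeeId :: Has r EmployeeId t => r -> t
employeeId = get EmployeeId

main :: IO ()
main = do
  print (employeeId (EmployeePay 7 1234.5))
  print (employeeId (EmployeeTimeSheet 7 [8, 7.5]))
```

The point of the sketch is only that employeeId can be a single overloaded function once the compiler stops claiming the name for a monomorphic selector.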

[3] http://hackage.haskell.org/trac/ghc/wiki/Records/DeclaredOverloadedRecordFields/NoMonoRecordFields
(There are bound to be some fishhooks, especially around export/import of 
names from a module with no selector functions to one that's expecting 
them.)

[cont from above]
> ... the best practice today is to instead write something like:
> 
>     data Employee = Employee { employeeId :: Int, employeeName :: String }
> 
>     printId :: Employee -> IO ()
>     printId emp = print $ employeeId emp
> 
> The downsides of the latter have been discussed elsewhere, but briefly
> they are:
> 
>  * Overly verbose when there's no ambiguity.
>  * Ad-hoc prefix is hard to predict (i.e. sometimes abbreviations of the
>    data type name are used).

I don't entirely agree with your analysis.
 * fields named `id' or `name' are very likely to clash,
   so that's a bad design (_too_ generic).
 * If you've normalised your data model [**],
   you are very likely to want exactly the same field
   in several records
   (for example employeeId in EmployeeNameAddress,
    and in EmployeePay and in EmployeeTimeSheet.)

[And this use case is what TP/DORF is primarily aimed at.]
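
A small sketch of how that plays out with today's prefixing practice. The prefixes (ena, ep), the extra fields and the function payFor are hypothetical, chosen only for illustration. The one logical field employeeId ends up with a different name in each record, even though a join wants to treat it as the same thing.

```haskell
module Main where

-- The same logical field needs a distinct, ad-hoc prefix in each record.
data EmployeeNameAddress = EmployeeNameAddress
  { enaEmployeeId :: Int
  , enaName       :: String
  , enaAddress    :: String
  }

data EmployeePay = EmployeePay
  { epEmployeeId :: Int
  , epSalary     :: Double
  }

-- Joining on the shared key has to mention both spellings of the field.
payFor :: EmployeeNameAddress -> [EmployeePay] -> [EmployeePay]
payFor e = filter (\p -> epEmployeeId p == enaEmployeeId e)

main :: IO ()
main = do
  let ada  = EmployeeNameAddress 7 "Ada" "1 Main St"
      pays = [EmployeePay 7 100, EmployeePay 8 200]
  print (map epSalary (payFor ada pays))
```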

[**] Do I need to explain what data model normalisation is? I fear that
so-called XML 'databases' mean academics don't get taught normalisation any
more(?)

AntC