[Haskell-cafe] Advantages of using qualified names and a uniform naming convention

Sun Sep 3 22:31:55 EDT 2006

Hi -
There's lots of great Haskell libraries available, but little 
standardization regarding naming conventions or code organization. In this 
post I try to illustrate some dimensions of the question of how to form 
names for things and offer my opinion on specific examples knowing that this 
of course only represents my own personal view but may nevertheless be of 
objective interest as an *example* of a view and a possible starting point 
for discussion/ unification of naming conventions in Haskell code if such a 
thing is even desired/desireable at all.

It would be helpful to build up a more rigorous algorithm for determining 
names, but I think the first step is to try and isolate aspects of the 
problem thus the following is an initial attempt in this direction...

Firstly, I found the following advice by Henning Thielemann very useful in 
my own code [1]:

   In the style of Modula-3 I define one data type or one type
   class per module. The module is named after the implemented
   type or class. Then a type is named T, and a type class C.
   I use them qualified, e.g. Music.T or Collection.C.
   Similarly, if a type has only one constructor then I call it
   Cons and use it qualified MidiFile.Cons [I don't agree with
   this last suggestion].
   This style also answers the annoying question whether the
   module name should be in singular or plural form:
   Always choose singular form!

where the only thing I change is that the name of the value constructor for 
a type with only one value constructor should be the same as the name of the 
type constructor eg newtype T a = T (a->Int) which seems to be the normal 
convention anyway and seems better imho than introducing a different 
identifier for the value constructor when the namespaces for values and 
types/classes are already distinct.

The advantage of such a strict rule is that code becomes deliciously uniform 
and neat, even if you just follow the part of it that deals with types and 
modules and use some different convention for classes. It's also 
particularly nice if you're used to object oriented languages, since it 
preserves the intuition that you can deal with one thing at a time just as 
you would put each object into its own separate file or C++ unit.

Of course, not all modules can have just one type but I find that most only 
need to export one type and in any case the (main) exported type can be 
called "T". The exception is when the types are mutually recursive, so that 
different types which should really have their own module have to be put 
into the same module due to limitations of the compiler being used (eg GHC 
does not yet support mutually recursive modules involving types declared 
using newtype deriving and also requires hs-boot files which is imho a 
horrible mess as bad as the need to write separate header files in C and 
keep decls in sync therefore I avoid them at all costs).

Consider the following:

    import Data.IORef

    main = do
                    x <- newIORef (0::Int)
                    writeIORef x 3

compared to:

    import qualified Prime.Data.IORef as Ref

    main = do
                    x <- Ref.new (0::Int)
                    Ref.write x 3

To my mind the latter is infinitely cleaner looking, because the names of 
the functions just specify their purpose and nothing else, and it is 
immediately clear which module they have come from, and there is also the 
advantage that a shorter prefix could be used if required eg "R". In 
contrast, the first code example is polluted with mutiple repetitions of 
"IORef" - we already know we're in the IO monad so why keep stressing the 
point that we're using IORefs as opposed to STRefs etc? - and there is not 
even any certainty that the functions so named even come from that module 
(unless we're already familiar with the contents of the other modules of 
course).

Not only does the latter code look startlingly beautiful, but there is a 
significant advantage when using an editor which is sophisticated enough to 
make use of it: after typing the dot in "Ref.", it should be possible for an 
editor to then display a pop-up list of the visible contents of the module 
(various possible patent issues aside :-( ) ie the programmer just needs to 
know the module alias name rather than the names of each individual 
function/value/type/class when coding.

Furthermore (see there's almost no end to the advantages of this convention! 
;-)) if the implementation of IORef's was improved later, the whole module 
can be instantly ported to use the improved implementation just by changing 
the one import line rather than a painful search and replace of "newIORef" 
by "newBetterIORef" etc.

I think the current presence of names like "newIORef" in the base library is 
perhaps a result of historical development - qualified names or aliases may 
not yet have been invented so there was probably a need to follow a 
convention of appending the type to the purpose when creating a 
function/value name, but now that we have the ability to use qualified names 
it would seem to be a lot better if everything could be changed to use them 
ie Data.Set should export T not Set so you would use

    foo :: Set.T -> Int
    foo = Set.size

instead of import Data.Set as Set and trusting to luck that there is no 
other data type in scope with the name Set, or having to use Set.Set which 
looks wierd.

It probably goes without saying that with the use of qualified imports, 
symbols are absolutely gross and should be avoided at all costs. They're 
totally unreadable, not just because they're a squiggly unpronouncable mess, 
but because you need to simulate an operator parser in your head to discover 
what's being applied to what. Someone could even define <+> to bind tighter 
than <*>, so it's not even safe to rely on normal conventions, and different 
modules in a program could use the same symbol in totally different ways 
with different precedences, leading to a real headache and unnecessary bugs 
when jumping between code in an editor. For readable code, plenty of 
descriptive words and parentheses are surely preferable. The only exception 
I'd make is the use of >>=, >> (which is so fundamental the alternative do 
notation is built into the language), ($), ($!), (.), and common arithmetic 
ops.

Returning to Data.Set, let's now consider some of the names that are used:

    null :: Set a -> Bool

    empty :: Set a

(null) tests to see whether or not a set is empty, and (empty) is the empty 
set. The relationship between the words "null" and "empty" can only be found 
by looking in an English dictionary. I propose that linguistic relationships 
should never feature in names used in programming. Instead the relationships 
should always be expressed as far as possible in the forms of the 
identifiers as sequences of characters. Having a clear unified framework to 
create names would also help not only the library author but also library 
users. Thus I propose that (null) should actually have been called 
"isEmpty", so that the relationship with the use of (empty) to denote the 
empty set is immediately apparent, and the use of the word "is" would 
immediately tell you that the function is a predicate.

It might even be advantageous to reserve more characters for use in 
identifiers (since the infamous ASCII symbols are so abhorrent anyway ;-) ) 
so we could have a similar rule to Scheme, that predicates would end with a 
question mark thus relieving us of the need to decide between "is" and "has" 
(to try and eliminate as much of the messiness and indecision caused by 
natural language as possible), though of course this would be a more long 
term idea eg:

     -- So related things appear alphabetically together...
    empty? :: Set a -> Bool
    empty :: Set a

It could be argued that it would be more in keeping with left-to-right 
thinking to put the '?' first but then we'd lose the "related things should 
be together in any alphabetical list of functions/values", though such a 
compromise is already necessary when using "isEmpty" rather than "emptyIs" 
which would perhaps just sound too unnatural ;-)

Moving on to Data.List, we find a confusion of different spatial, temporal, 
and historical viewpoints jostling valiantly for supremacy in the 
programmer's mind:

    head tail        -- funny cartoon-like image of a list
    last init          -- temporal
    foldl foldr      -- spatial (left to right)

Despite the fact that (last, init) is somehow the dual of (head, tail), we 
have to switch from a temporal conception of a list to a historical 
"cartoon" conception to move between them. Then we come to foldl, foldr 
where we think in terms of left and right.

Also, with foldl, foldr, it is just lucky that we already know there is a 
word called "fold" so that we know the trailing "r" or "l" is intended to be 
a suffix - it's not nearly so clear in cases like "reducer" and "reducel" 
where "reducer" is also a single word in English.

Therefore I propose that for a list there should only be one underlying 
concept, that of a spatial sequence going from left to right, and that all 
functions should be named in accordance with this alone ie:

    atL atLs        -- similar to (x:xs)
    atR atRs
    foldL foldR

thus there would only be one concept (the spatial sequence) to grasp which 
the programmer could then map internally onto another concept such as time 
if required, without being forced by the names in Data.List to jump 
confusingly between them.

A discussion of naming in Data.List would certainly be incomplete without a 
mention of (nub). It must surely rank as one of the most peculiar 
identifiers in the whole history of programming. The documentation 
thankfully explains that it means "essence", but then goes on to say that it 
is in fact just a function for removing duplicates from a list. The mind can 
only convulse in the most tortuous configurations to try and reverse 
engineer this strange correspondence. Would the simple name 
(removeDuplicates) or even (asSet) not be much easier all round?

Related to lists, we of course have sequences and there, instead of cons and 
snoc, we could just have pushL and pushR which are nice logical names that 
don't conjure up images of candles and devils crawling out of mirrors etc 
like the use of reversed spelling in "snoc".

In case the above seems overly critical of other people's hard work and good 
intent, I'd like to confess that I myself am often tempted off the path when 
it comes to the creation of identifiers involving adjectives eg:

    type BlueCar = ...
    type RedCar = ...

The problem here is that these names, presumably both to do with "Car", are 
not going to appear next to each other in any alphabetical listing (if there 
are other names too), whereas:

    type CarBlue = ...
    type CarRed = ...

will. Thus the position of the adjective in natural language (in this case 
English) has to be ignored if you want a programming environment to display 
related things together.

Finally, as a piece de resistance :-), consider the following two 
definitions of (>>=) for the continuation monad:

    newtype Cont r a = Cont { runCont :: (a -> r) -> r }

    m >>= k  =
        Cont $ \c -> runCont m $ \a -> runCont (k a) c -- (1)

    Cont a_r_r >>= a_C_b_r_r =
         Cont $ \ b_r ->
              a_r_r (\a -> runCont (a_C_b_r_r   a)   b_r) -- (2)

In (1), we have to know what the associativity of ($) is (and this is a real 
fire bed of controversy!) and the variable names, apart from (a), don't give 
us the faintest clue what's going on, and in fact are doubly confusing 
because (m) is used to represent a value of type (m a) so there is a kind of 
kind error in the value name.

In contrast, (2) uses the simple convention that the type is reflected more 
or less directly in the name so that a_r_r means a value of type (a->r)->r 
(there is unfortunately no way to indicate bracketing in the name). Thus by 
reading the definition we can understand what's going on just by cancelling 
out components of the name thus: a_C_b_r_r a gives us C_b_r_r and runCont 
(C_b_r_r) gives us b_r_r and (b_r_r  b_r) gives us (r) so it's obvious that 
we're supplying an argument of type a->r to a_r_r as required.

A great thing about Haskell is that it's quite simple to start renaming 
things already without having to change existing modules. In my own code 
I've simply just created new modules for IORef and Unique that re-export the 
original module contents but with the unsuffixed names, then I import from a 
hierarchy optimistically called "Prime" instead of the normal hierarchy :-)

Having said all this, it's nevertheless an open question to me, regarding 
the meeting between human psychology and the desire for consistent logical 
naming conventions, if perhaps the rapid jumping between different 
viewpoints encountered in the functions in Data.List etc has a stimulating 
effect on the imagination in terms of encouraging mobility of thought and a 
danger with fully rationalised names could be that they may make one dry and 
brittle - though at least there would be no danger of a headless fire 
breathing dragon emerging from a mirror because you've just typed "cons" 
backwards.... :-)

Regards, Brian.

[1] http://haskell.org/hawiki/UsingQualifiedNames (bottom of the page)
-- 
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.

http://www.metamilk.com