[Haskell-cafe] Advantages of using qualified names and a uniform
naming convention
Brian Hulley
brianh at metamilk.com
Sun Sep 3 22:31:55 EDT 2006
Hi -
There's lots of great Haskell libraries available, but little
standardization regarding naming conventions or code organization. In this
post I try to illustrate some dimensions of the question of how to form
names for things and offer my opinion on specific examples knowing that this
of course only represents my own personal view but may nevertheless be of
objective interest as an *example* of a view and a possible starting point
for discussion/ unification of naming conventions in Haskell code if such a
thing is even desired/desireable at all.
It would be helpful to build up a more rigorous algorithm for determining
names, but I think the first step is to try and isolate aspects of the
problem thus the following is an initial attempt in this direction...
Firstly, I found the following advice by Henning Thielemann very useful in
my own code [1]:
In the style of Modula-3 I define one data type or one type
class per module. The module is named after the implemented
type or class. Then a type is named T, and a type class C.
I use them qualified, e.g. Music.T or Collection.C.
Similarly, if a type has only one constructor then I call it
Cons and use it qualified MidiFile.Cons [I don't agree with
this last suggestion].
This style also answers the annoying question whether the
module name should be in singular or plural form:
Always choose singular form!
where the only thing I change is that the name of the value constructor for
a type with only one value constructor should be the same as the name of the
type constructor eg newtype T a = T (a->Int) which seems to be the normal
convention anyway and seems better imho than introducing a different
identifier for the value constructor when the namespaces for values and
types/classes are already distinct.
The advantage of such a strict rule is that code becomes deliciously uniform
and neat, even if you just follow the part of it that deals with types and
modules and use some different convention for classes. It's also
particularly nice if you're used to object oriented languages, since it
preserves the intuition that you can deal with one thing at a time just as
you would put each object into its own separate file or C++ unit.
Of course, not all modules can have just one type but I find that most only
need to export one type and in any case the (main) exported type can be
called "T". The exception is when the types are mutually recursive, so that
different types which should really have their own module have to be put
into the same module due to limitations of the compiler being used (eg GHC
does not yet support mutually recursive modules involving types declared
using newtype deriving and also requires hs-boot files which is imho a
horrible mess as bad as the need to write separate header files in C and
keep decls in sync therefore I avoid them at all costs).
Consider the following:
import Data.IORef
main = do
x <- newIORef (0::Int)
writeIORef x 3
compared to:
import qualified Prime.Data.IORef as Ref
main = do
x <- Ref.new (0::Int)
Ref.write x 3
To my mind the latter is infinitely cleaner looking, because the names of
the functions just specify their purpose and nothing else, and it is
immediately clear which module they have come from, and there is also the
advantage that a shorter prefix could be used if required eg "R". In
contrast, the first code example is polluted with mutiple repetitions of
"IORef" - we already know we're in the IO monad so why keep stressing the
point that we're using IORefs as opposed to STRefs etc? - and there is not
even any certainty that the functions so named even come from that module
(unless we're already familiar with the contents of the other modules of
course).
Not only does the latter code look startlingly beautiful, but there is a
significant advantage when using an editor which is sophisticated enough to
make use of it: after typing the dot in "Ref.", it should be possible for an
editor to then display a pop-up list of the visible contents of the module
(various possible patent issues aside :-( ) ie the programmer just needs to
know the module alias name rather than the names of each individual
function/value/type/class when coding.
Furthermore (see there's almost no end to the advantages of this convention!
;-)) if the implementation of IORef's was improved later, the whole module
can be instantly ported to use the improved implementation just by changing
the one import line rather than a painful search and replace of "newIORef"
by "newBetterIORef" etc.
I think the current presence of names like "newIORef" in the base library is
perhaps a result of historical development - qualified names or aliases may
not yet have been invented so there was probably a need to follow a
convention of appending the type to the purpose when creating a
function/value name, but now that we have the ability to use qualified names
it would seem to be a lot better if everything could be changed to use them
ie Data.Set should export T not Set so you would use
foo :: Set.T -> Int
foo = Set.size
instead of import Data.Set as Set and trusting to luck that there is no
other data type in scope with the name Set, or having to use Set.Set which
looks wierd.
It probably goes without saying that with the use of qualified imports,
symbols are absolutely gross and should be avoided at all costs. They're
totally unreadable, not just because they're a squiggly unpronouncable mess,
but because you need to simulate an operator parser in your head to discover
what's being applied to what. Someone could even define <+> to bind tighter
than <*>, so it's not even safe to rely on normal conventions, and different
modules in a program could use the same symbol in totally different ways
with different precedences, leading to a real headache and unnecessary bugs
when jumping between code in an editor. For readable code, plenty of
descriptive words and parentheses are surely preferable. The only exception
I'd make is the use of >>=, >> (which is so fundamental the alternative do
notation is built into the language), ($), ($!), (.), and common arithmetic
ops.
Returning to Data.Set, let's now consider some of the names that are used:
null :: Set a -> Bool
empty :: Set a
(null) tests to see whether or not a set is empty, and (empty) is the empty
set. The relationship between the words "null" and "empty" can only be found
by looking in an English dictionary. I propose that linguistic relationships
should never feature in names used in programming. Instead the relationships
should always be expressed as far as possible in the forms of the
identifiers as sequences of characters. Having a clear unified framework to
create names would also help not only the library author but also library
users. Thus I propose that (null) should actually have been called
"isEmpty", so that the relationship with the use of (empty) to denote the
empty set is immediately apparent, and the use of the word "is" would
immediately tell you that the function is a predicate.
It might even be advantageous to reserve more characters for use in
identifiers (since the infamous ASCII symbols are so abhorrent anyway ;-) )
so we could have a similar rule to Scheme, that predicates would end with a
question mark thus relieving us of the need to decide between "is" and "has"
(to try and eliminate as much of the messiness and indecision caused by
natural language as possible), though of course this would be a more long
term idea eg:
-- So related things appear alphabetically together...
empty? :: Set a -> Bool
empty :: Set a
It could be argued that it would be more in keeping with left-to-right
thinking to put the '?' first but then we'd lose the "related things should
be together in any alphabetical list of functions/values", though such a
compromise is already necessary when using "isEmpty" rather than "emptyIs"
which would perhaps just sound too unnatural ;-)
Moving on to Data.List, we find a confusion of different spatial, temporal,
and historical viewpoints jostling valiantly for supremacy in the
programmer's mind:
head tail -- funny cartoon-like image of a list
last init -- temporal
foldl foldr -- spatial (left to right)
Despite the fact that (last, init) is somehow the dual of (head, tail), we
have to switch from a temporal conception of a list to a historical
"cartoon" conception to move between them. Then we come to foldl, foldr
where we think in terms of left and right.
Also, with foldl, foldr, it is just lucky that we already know there is a
word called "fold" so that we know the trailing "r" or "l" is intended to be
a suffix - it's not nearly so clear in cases like "reducer" and "reducel"
where "reducer" is also a single word in English.
Therefore I propose that for a list there should only be one underlying
concept, that of a spatial sequence going from left to right, and that all
functions should be named in accordance with this alone ie:
atL atLs -- similar to (x:xs)
atR atRs
foldL foldR
thus there would only be one concept (the spatial sequence) to grasp which
the programmer could then map internally onto another concept such as time
if required, without being forced by the names in Data.List to jump
confusingly between them.
A discussion of naming in Data.List would certainly be incomplete without a
mention of (nub). It must surely rank as one of the most peculiar
identifiers in the whole history of programming. The documentation
thankfully explains that it means "essence", but then goes on to say that it
is in fact just a function for removing duplicates from a list. The mind can
only convulse in the most tortuous configurations to try and reverse
engineer this strange correspondence. Would the simple name
(removeDuplicates) or even (asSet) not be much easier all round?
Related to lists, we of course have sequences and there, instead of cons and
snoc, we could just have pushL and pushR which are nice logical names that
don't conjure up images of candles and devils crawling out of mirrors etc
like the use of reversed spelling in "snoc".
In case the above seems overly critical of other people's hard work and good
intent, I'd like to confess that I myself am often tempted off the path when
it comes to the creation of identifiers involving adjectives eg:
type BlueCar = ...
type RedCar = ...
The problem here is that these names, presumably both to do with "Car", are
not going to appear next to each other in any alphabetical listing (if there
are other names too), whereas:
type CarBlue = ...
type CarRed = ...
will. Thus the position of the adjective in natural language (in this case
English) has to be ignored if you want a programming environment to display
related things together.
Finally, as a piece de resistance :-), consider the following two
definitions of (>>=) for the continuation monad:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
m >>= k =
Cont $ \c -> runCont m $ \a -> runCont (k a) c -- (1)
Cont a_r_r >>= a_C_b_r_r =
Cont $ \ b_r ->
a_r_r (\a -> runCont (a_C_b_r_r a) b_r) -- (2)
In (1), we have to know what the associativity of ($) is (and this is a real
fire bed of controversy!) and the variable names, apart from (a), don't give
us the faintest clue what's going on, and in fact are doubly confusing
because (m) is used to represent a value of type (m a) so there is a kind of
kind error in the value name.
In contrast, (2) uses the simple convention that the type is reflected more
or less directly in the name so that a_r_r means a value of type (a->r)->r
(there is unfortunately no way to indicate bracketing in the name). Thus by
reading the definition we can understand what's going on just by cancelling
out components of the name thus: a_C_b_r_r a gives us C_b_r_r and runCont
(C_b_r_r) gives us b_r_r and (b_r_r b_r) gives us (r) so it's obvious that
we're supplying an argument of type a->r to a_r_r as required.
A great thing about Haskell is that it's quite simple to start renaming
things already without having to change existing modules. In my own code
I've simply just created new modules for IORef and Unique that re-export the
original module contents but with the unsuffixed names, then I import from a
hierarchy optimistically called "Prime" instead of the normal hierarchy :-)
Having said all this, it's nevertheless an open question to me, regarding
the meeting between human psychology and the desire for consistent logical
naming conventions, if perhaps the rapid jumping between different
viewpoints encountered in the functions in Data.List etc has a stimulating
effect on the imagination in terms of encouraging mobility of thought and a
danger with fully rationalised names could be that they may make one dry and
brittle - though at least there would be no danger of a headless fire
breathing dragon emerging from a mirror because you've just typed "cons"
backwards.... :-)
Regards, Brian.
[1] http://haskell.org/hawiki/UsingQualifiedNames (bottom of the page)
--
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.
http://www.metamilk.com
More information about the Haskell-Cafe
mailing list