Names for small functions: just say no... Re: Data.List.join

Wed Nov 15 13:52:38 EST 2006

I simply don't have the stamina to follow up to all the
objections to my messages. I'm posting this here in the
thread because it's a convenient point, not because Robert's
message troubles me particularly.

Evidently I'm not getting my point across, so I'll give one
more try and then call it a day (for ten years or so, then I
can look again and see if the wind has changed).

1. I don't want to remove any extant library functions --
   these have already been subsumed into the collective
   consciousness.

2. I'm not opposed to intercalate in particular, it's just
   an instance of what I perceive to be a growing problem.

3. I'm not going to be strongly opposed to any particular
   suggested small function -- each usually has the merit of
   reducing the number of tokens in the code, which with all
   else being equal would be a good thing. All else is not
   equal, though. I was going to post something more in the
   thread on adding `on`, in that ultimately “<comparison>
   `on` fst” is more readable than “equating”, “comparing”
   and friends, simply because of the quantity of names. But
   even of those I cannot get really excited in my
   denouncement.
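
A minimal sketch of the comparison in point 3, assuming
today's base library, where `on` lives in Data.Function and
`comparing` in Data.Ord. `equating` is not in base, so it is
defined locally here as a hypothetical helper; `pairs` and
the other top-level names are likewise made up for
illustration.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- Hypothetical helper (not in base): equality on a projection.
equating :: Eq b => (a -> b) -> a -> a -> Bool
equating f = (==) `on` f

-- Made-up sample data.
pairs :: [(Int, String)]
pairs = [(2, "b"), (1, "a"), (2, "c")]

-- One combinator applied to an ordinary comparison,
-- next to the dedicated name for the same thing.
sortedOn, sortedNamed :: [(Int, String)]
sortedOn    = sortBy (compare `on` fst) pairs
sortedNamed = sortBy (comparing fst) pairs

-- The same combinator applied to ordinary equality,
-- next to the second dedicated name.
groupedOn, groupedNamed :: [[(Int, String)]]
groupedOn    = groupBy ((==) `on` fst) sortedOn
groupedNamed = groupBy (equating fst) sortedNamed
```

Both spellings compute the same results; the `on` versions
need one new name where the others need two.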

I don't know for certain about anyone else, but I for one
have a limited capacity for learning arbitrary names.  I'm
pretty sure there's a limit for most people, but for some
the limit is so large that learning all the names in all the
Haskell libraries that there will be in the future won't be
a problem. Such folk aren't going to be inconvenienced in
the long run, but for feeble minded people such as myself,
the application of a brake to the proliferation of names
would be an important gain.

To draw a parallel, for most of my life I've been
intermittently trying to learn Chinese characters. There are
so many, and although there's a degree of compositionality
from radicals to whole characters, a great deal of the
relationship between any complex character and its meaning
is arbitrary. Consequently I've yet to learn enough that I
can read a single sentence in Chinese. Six years ago, I had
a Russian lodger and thought it would be fun to learn a bit
of Russian.  There are only a few more characters in Russian
than English, so I learnt them all in a couple of days (not
the order, though: that's arbitrary). After that, I found
that Russian words are often composed of prefixes that
modify the meaning of smaller words[1]. Because of this, even
without trying at all hard I can already read quite a bit of
Russian.

The difference is that with Chinese characters there's a
whole lot of arbitrary symbols to learn before you can get
anywhere, while with Russian there seem to be fewer
arbitrary symbols at each level. (This isn't just me, by the
way: Chinese characters are a significant obstacle to
literacy in China -- it's even possible that was originally
deliberate -- so they use pinyin.) What I'm advocating is
that when deciding whether to put something into a library
we make sure that it's worth the extra effort to the reader,
so that reading Haskell doesn't become as hard as reading
Chinese -- only as hard as Russian ;-).

To twist Gauss's snippy remark on Waring (apropos Wilson's
theorem) into a rule, what I want is no new notations
without new notions. If you can
name something without controversy so that the ordinary
English (or mathematical) reading both tells the reader
immediately what the function does and is obvious to the
writer looking for it (the latter is the lesser), then it
can go in a library. Otherwise we have to consider: is there
a more powerful function that does the job? is it possible
to define the same function at a higher type? is the name
going to be more useful at a different meaning? is the named
version /really/ more readable than the expression it stands
for? And it's not always going to be possible to answer
those questions without considerable work.
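
As one illustration of the "higher type" question, here is a
sketch that assumes the list-joining function under
discussion is written as concat . intersperse sep. The names
`joinLists` and `joinMonoid` are hypothetical, chosen only
for this example; neither is a library function.

```haskell
import Data.List (intersperse)

-- List-only version: join lists with a separator list.
joinLists :: [a] -> [[a]] -> [a]
joinLists sep = concat . intersperse sep

-- The same shape at a more general type: any Monoid will do,
-- and lists are just one instance.
joinMonoid :: Monoid m => m -> [m] -> m
joinMonoid sep = mconcat . intersperse sep
```

Whether the more general version deserves the name is
exactly the kind of question that takes work to settle.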

Particularly the last case: it's quite hard, especially for
a beginning (or intermediate) programmer, to escape the idea
that by naming a short bit of code one has made it simpler.
Part of the trouble is that, having had to think about the
thing long enough to want to name it, the name becomes (for
that programmer) a shorthand for the entire process of
discovering the combination. Ten years later, the discovery
doesn't seem so radical, reading the short bit of code is
straightforward and remembering what the name means has
become difficult.

On 2006-11-10 at 12:51EST Robert Dockins wrote:
> The Haskell standard libraries are not, and have never
> been, a minimal basis lacking redundancy.

I'm not sure how you got the idea that that's what I want.
There are several reasons that can make naming something
short worthwhile. Take “sum”. Its code is short, but there
are decisions to be made (foldl or foldr &c) that mean that
the name abstracts something useful away from the code.  The
names I've been objecting to stand for code that is pretty
much the only way of writing the thing in terms of
predefined functions.
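
To make the “sum” case concrete, here are three plausible
bodies for it. The names `sumR`, `sumL` and `sumL'` are
hypothetical, and this is not a claim about how any
particular library defines sum.

```haskell
import Data.List (foldl')

-- Three candidate definitions. They agree on finite lists of
-- ordinary numbers but differ in evaluation order and in how
-- much stack or heap they use on long lists.
sumR, sumL, sumL' :: Num a => [a] -> a
sumR  = foldr  (+) 0   -- right-nested: 1 + (2 + (3 + 0))
sumL  = foldl  (+) 0   -- left-nested, accumulator built lazily
sumL' = foldl' (+) 0   -- left-nested, accumulator forced each step
```

The reader of “sum” is spared that choice, which is what
makes the name worth having.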

>  Are you also opposed to the existence of
> 
> mapM f = sequence . map f

That's an interesting case.  I'm not, because of point (1)
above, but I reserve the right to whine about it.  Having a
naming convention is better than having completely arbitrary
names, but the fact that a convention was needed should
raise suspicion.  Given that we had concatMap already,
sequenceMap would have been a more easily interpreted name
-- and then some people would wonder, why bother with the
name?  It took me a while to realise that mapM isn't some
form of [f]map, and longer still to notice that fmap for
Monads is inexplicably called liftM.
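
A small sketch of the two observations above, using Maybe as
the monad. `mapMLong` and `halve` are hypothetical names for
this example; mapM, sequence, liftM and fmap are the
standard functions.

```haskell
import Control.Monad (liftM)

-- The quoted definition, written out under another name.
mapMLong :: Monad m => (a -> m b) -> [a] -> m [b]
mapMLong f = sequence . map f

-- Made-up monadic function: fails on odd numbers.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- The library name and the spelled-out composition.
viaName, viaLong :: [Int] -> Maybe [Int]
viaName = mapM halve
viaLong = mapMLong halve

-- liftM does for a monad what fmap does for a functor.
incrName, incrFmap :: Maybe Int -> Maybe Int
incrName = liftM (+ 1)
incrFmap = fmap (+ 1)
```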

> and other such useful goodies? I use mapM because it
> eliminates parens

You save one pair. 

> and reduces line length

by a small constant

> for a programming pattern I use a
> lot.  This increases readability of my code: win.

Yes, but there's a reduction in readability too, though it's
harder to notice once you've learned the name mapM, and so
long as the number of names like that in libraries is small
the loss of readability in this way is negligible. But if we
go on adding such names, it will become a real problem.

> Codifying this pattern by placing it in the standard
> libraries means that most people use the same name for the
> same concept, increasing overall readability: win.

On the other hand, if there were no names for it and people
wrote the same thing in each case, it would be just as
readable. (Do I need to repeat that point (1) makes this
specific case moot?)

We need names for medium to large concepts, not for really
small ones.

> Programmers find this concept useful and they name it.
> This has been done independently by multiple people
> pursuing diverse ends.  The fact that this function isn't
> in the standard lib means that programmers name it
> different things and, in the long run, this harms
> readability across Haskell code.

Well, for the specific case of intercalate, the data Joseph
presented doesn't support that -- there were lots of
instances of the code written out, but only a few where this
was the body of a definition, and some of the definitions
were for a specific separator.
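
For reference, a sketch of the three forms in question,
assuming the written-out code is concat . intersperse sep
and using the intercalate that today's Data.List exports.
`joinLong` and `commaSep` are hypothetical names standing in
for the kinds of definition described above.

```haskell
import Data.List (intercalate, intersperse)

-- The code written out, wrapped in a general definition.
joinLong :: [a] -> [[a]] -> [a]
joinLong sep = concat . intersperse sep

-- A definition for one specific separator.
commaSep :: [String] -> String
commaSep = concat . intersperse ", "

-- The library name applied to the same separator.
viaLibrary :: [String] -> String
viaLibrary = intercalate ", "
```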

Finding a concept useful and naming it is something that
programmers do, but it's far from clear to me that it's
always what they should do. I can't present a real life
case, but it seems to me that there will be times when it
obscures what's going on.  For example suppose “weeble
. sort” occurs often and is given the name “foo” and that
“reverse . whiffle” has similarly been named “bar”.  Now, a
programmer who writes “foo . bar” might be quite happy that
this does the right thing and fail to notice that “weeble
. sort . reverse . whiffle” simplifies to “weeble . sort
. whiffle” (given appropriate conditions). This is
particularly likely when a library function has been found
without looking at the code.
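
The example can be made runnable by picking stand-ins for
the invented functions. Here `weeble` and `whiffle` are
arbitrary assumptions (take 3 and filter even); the
“appropriate condition” is that sort cannot tell a list from
its reverse, which holds when equal elements are
indistinguishable, as with Int.

```haskell
import Data.List (sort)

-- Arbitrary stand-ins for the invented functions.
weeble :: [Int] -> [Int]
weeble = take 3

whiffle :: [Int] -> [Int]
whiffle = filter even

-- The two named combinations.
foo, bar :: [Int] -> [Int]
foo = weeble . sort
bar = reverse . whiffle

-- What gets written with the names, and what it simplifies
-- to once the definitions are visible: the reverse is
-- wasted work because sort discards the incoming order.
composed, simplified :: [Int] -> [Int]
composed   = foo . bar
simplified = weeble . sort . whiffle
```

With the names in place, nothing in “foo . bar” hints that a
reverse is hiding in the middle.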

Nor do I think that programmers naming things differently is
/necessarily/ a loss, either. Sometimes the name says
something about the intentional meaning that use of a name
from a library would not. Finally, we have to question how
far one has to search to find the definition of a name.
There are (a) names that one knows already (but there is a
limit to how many of these there can be), (b) names defined
in the module one is currently reading, and (c) unfamiliar
names defined elsewhere that one has to look up.  That is, I
think, a list in order of difficulty, and the more names
there are in libraries, the more times what was hoped to be
an (a) becomes a (c).

  Jón

[1] eg опасность means danger, безопасность is
without+danger = safety, небезопасность is
not-without-danger = insecurity. For some reason I find this
amusing.

-- 
Jón Fairbairn                              Jon.Fairbairn at cl.cam.ac.uk