[Haskell-cafe] Re: Allowing hyphens in identifiers

Wed Dec 16 01:02:10 EST 2009

On Tuesday 15 December 2009 03:04:43, Richard O'Keefe wrote:
> On Dec 14, 2009, at 5:11 PM, Daniel Fischer wrote:
> > 1. I wasn't playing in the under_score vs. camelCase game, just
> > proposing a possible
> > reason why the camelCase may have been chosen for Haskell's standard
> > libraries.
>
> But the insanely abbreviated example did not provide such a reason.

Of course not. But if you expand it (and it's not difficult, even when insanely 
abbreviated), the resulting sentence gives a possible reason:
"Maybe it's because the underscore style is considered far uglier and less readable by 
others."
If the early Haskellers felt that way, isn't it perfectly natural that they chose the 
camelCase style?
Of course, they may have had entirely different reasons, or no concrete reason at all and 
it just happened.

> You still haven't explained what the reason is supposed to be:  it
> can't be that baStudlyCase salvages the readability of abbreviation
> because it doesn't.

No, nothing salvages insane abbreviation.

> Indeed, it makes it worse, because you can't
> always tell where one abbreviation ends and another begins.
>
> In teaching an information retrieval paper, one of my favourite examples
> is unionised.  Does it mean
> 	(union+ise)+ed	"having had the workers organised into a union"
> or	un+(ion+ised)   "not having had its molecules turned into ions".
> When I mean the latter, I always write un-ionised.

Yes, some words are ambiguous. Although I too find un-ionised superior, unIonised also 
works - okay in a serif font, just barely in sans-serif.

>
> Now consider an actual Java class name,
> where I genuinely didn't know what the answer was.
> 	INSURL
> baStudlyCaps style for Java doesn't allow underscores in class names.
> (This is actual Sun code.)  Is this something to do with insurance?
> Is this something to do with URLs for the US Immigration and
> Nationalization Service?  Are Inertial Navigation Systems involved?
> Is the mention of 'URL' anything to do with URLs, or should this be
> parsed something like (I) (NSU) (RL)?  With underscores, the actual
> parsing, INS_URL, would be unambiguous.
>
> Or take NVList, another real name.  Is it an NV_List (where I don't
> know what NV is), an N_V_List (where I don't know what N and V are),
> or an N_VList (where I do know what a vlist is).  In fact it's a
> Name_Value_List.  I _might_ have had a clue with N_V_List...

Do you agree that both are horrible names regardless of whether one uses camel case or 
underscores? And that INSURL is even worse than NVList? Although, considering that it's in 
"package com.sun.corba.se.impl.naming.namingutil"
(not that I find that package name particularly well chosen), the URL part is pretty 
evident. Nevertheless, ugh, and indeed, INS_URL would be better. Still better would be a 
name at least partially resolving INS (it's not the International Necronautical Society, I 
hope, that would be beastly to expand).
Considering the other, NamedValueList or Named_Value_List would be far better than any 
sprinkling of underscores over NVList.

>
> My point here is that if you separate words with spaces, dots,
> hyphens, underscores, backslashes, or almost anything, you are going
> to have _much_ less trouble with abbreviations than if you just jam
> them together baStudlyCaps-style.

Agreed, shitty names of that kind are less bad with underscores.

>
> As for my "parody" of baStudlyCaps, thatIsExactlyHowItLooksToMe.

It's not how it looks to me.

>
> > I think you could find that written in many texts on aesthetic
> > relativism.
>
> They are empirically wrong.
>

Interesting. I thought the question whether there is an objective quality of beauty was 
metaphysical and thus it's impossible to be empirically right or wrong in that respect.
So, pray tell, what is the physical substrate of beauty, how can one measure it?

> > Both are judgments based on their respective preferences and nothing
> > else
>
> I disagree.  Sometimes, people can articulate _why_ they like or dislike
> things.  For example, I like anything spacious and bright.  This
> explains
> very well why I prefer landscapes (spacious) to portraits (not
> spacious).

Yes, it does. But it doesn't mean that landscapes are objectively beautiful and portraits 
are objectively ugly. There are people who prefer portraits to landscapes. They are not 
wrong, they just have different preferences.

> When it comes to depictions of plants, animals, people, and so on, I
> prefer healthy to unhealthy, friendly appearance to hostile/dangerous.

Yup. Although my favourite is Toulouse-Lautrec and I like Brueghel (P. the elder) and 
Bosch too, in general I prefer nice and friendly.

> Given that, you could probably predict my response to most paintings
> fairly well.  If I and anyone I personally knew disagreed about which of
> two paintings was "better",

Are we at the core of a misunderstanding here?
I'm not talking about "better", only about "beautiful".

> I would expect to find that we quickly reached
> agreement about what features were _present_ to what _degree_ and about
> the technical standard of the work (on a rather coarse scale, but
> enough).
> The differences could be explained by the relative _weights_ we gave to
> the various features.  Just as I have learned how to prepare tea and
> cook
> onions so that my wife will enjoy them, although I dislike the one and
> detest the other, so I would expect to be able to learn how to predict
> someone's aesthetic taste fairly well.

How can you not love tea?

>
> Maybe we do agree.  It wasn't clear whether by "preferences" you meant
> "weights" or "outcomes".  The thing is, if "preferences" means
> "outcomes",
> there's no reason to expect that people will ever agree, whereas if it
> means "weights", then it should be possible to find or construct
> examples
> differing in a single feature where two people with different weights
> will agree on which is better.

By (aesthetic) preferences I mean "what you like", "what you enjoy looking at or listening 
to": whether you find forests, green prairies or the high mountains beautiful.

>
> In the same way, when it comes to coding style, it may well be that
> we are responding to the same objective properties of styles, but
> weighting them differently.  It appears, for example, that we both
> perceive abbreviation, and we both give it a negative weight.

Not unconditionally, of course. parseURI is a good name; I think you will agree that it's 
better than parseUniformResourceIdentifier or parse_uniform_resource_identifier. And of 
course, parse_URI is also better than both in my opinion.
enum for enumeration/enumerate - more good than bad, at least in compound names
num, val - why not

Abbreviation is a little like goto: handle with care.

> It is therefore to be expected that given two versions of a program in which
> the *only* thing changed is the degree and/or nature of abbreviation,
> we'll agree which is better and which is worse.

Probably.

>
> For me to accept "personal preference" as a final explanation of
> something would be to accept an end to rational investigation.
>

Yes, in a way. You can of course investigate how people came to their preferences.
When it comes to whether somebody prefers Shostakovich or Mozart, Joy Division or Shakira, 
duck or lobster, you can investigate why, but that's it: in the end, people have their 
preferences. When they enjoy reading Jane Austen but dislike reading Thomas Mann, it's not 
because Austen was a good writer and Mann was a bad writer (both were really good), it's 
just because Austen is more to their taste.

> >> If it were just a matter of experience, then this experience should
> >> surely have taught me to love baStudlyCaps.
> >
> > No. It should have taught you to *read* camelCase - unless your
> > aversion is so strong that
> > you actively refuse to learn reading it.
>
> Where did you ever get the idea that I can't *read* baStudlyCaps?

Nowhere. I was convinced from the beginning that you can read camel case - with greater 
ease than you're willing to admit.

> Just because I can read it doesn't mean that I can't read something
> else *better*.

Sure.

> Life is hard enough without accepting unnecessary
> difficulties, even if they are moderately small ones.
>
> >>> Sourcecode is so different from ordinary text (a line of sourcecode
> >>> rarely
> >>> contains more than four or five words),
>
> I gave the wrong response to that yesterday.  Later in the common room
> I realised what the perfect answer is:
>
> 	newspapers are ordinary text,
> 	newspaper columns are typically four or five words across.

Then your newspapers have narrower columns than the ones I read. Four or five words 
happen, but only if at least one of them is a monster like Anlagenmechanikermeister (yes, 
that's a real word, I found it when counting today). Typically I counted six to eight; 
ten or twelve occur too. About 40 characters, more when many i's, t's, l's and f's (and 
few m's, w's) appear.

>
> The number of words per line is therefore not a useful way to
> distinguish source code from ordinary text.
>

How about: Source code is (usually) few words embedded in lots of whitespace. Mostly short 
lines and short paragraphs. Ordinary text is often a massive block of ink with a little 
whitespace embedded.
Yes, the latter is more a characterisation of newspaper columns; books have more 
whitespace and are easier on the eye. But books have long lines with many words on them.
Source code normally comes in much smaller chunks than ordinary text.

> >> baStudlyCaps doesn't read any better with short lines.
> >
> > I have no trouble reading either version. And that although this is
> > not what camelCase is
> > intended for (as far as I know, the purpose of it is to mark word
> > boundaries within *one
> > token* [identifier]).
>
> You missed the point of the example, which was that those words were
> joined (either by underscores or baStudlyJunctions) which formed
> sensible units.  The junctions were not arbitrary.

Sensible units (although the collation of desire and increase irritates me), but distinct 
tokens nevertheless. takeWhile is one single token.

>
> [1]
>
> > So? Whitespace helps tokenising and thus increases readability (for
> > me, at least).
>
> [2]
>
> > What's the relation to the question whether camel case and
> > underscore are readable or not?
>
> In quotation [1], you concede the argument against baStudlyCaps.

No. We might look at source code in a different way. When I see sendSolutionsUsing, I see 
one token of the language. When I see

   this sendSolutionsUsing: Empty and: Empty to: that

, I see seven or ten tokens (depending on the status of : in Smalltalk; it probably should be

    this sendSolutionsUsing : Empty and : Empty to : that

because a small thing like a colon is very effectively hidden by long words directly 
adjacent - and: and to: would be okay, but I like to be consistent locally).
When I see

    [this sendSolutionsUsing:Empty and:Empty to:that]

, I see that somebody deliberately made tokenising harder.

> If white space helps finding units of meaning and thus increases

Whitespace helps to separate tokens. Units of meaning come after that (syntax first, then 
semantics). They come finer grained (the words making up a compound identifier) and 
coarser grained (several tokens making up a subexpression, like (sortBy cmp) in 
sortBy cmp xs) than tokens.

> readability for you, then white-or-functionally-white space
> should help finding units of meaning in program text, and
> baStudlyCaps should be less readable than separated_words.
>
> The only way to have your cake and eat it is to deny that the
> words making up a compound identifier _are_ units of meaning
> that should be perceived as such, or at least the only way that
> I can see.  This seems an odd position to hold.
>

They are units of meaning, but subordinate to the units of syntax, the tokens.
For me, that is, you obviously read code differently.

> >> "Persuade a man against his will, he's of the same opinion still."
> >> How _much_ evidence?
> >
> > Replicated studies with enough participants from enough different
> > environments/cultures
> > showing that more than 99% of the participants find it clearly more
> > readable.
>
> OK, there is no point in my continuing this.
> Such a level of study is not practically attainable.
>

Right. And I've been generous, because strictly speaking, a claim that something is 
objectively true goes down the drain with one counterexample. So if every single human, 
with *one* exception, found underscores more readable than camel case, it would not be 
objectively the case that underscores are more readable.

> > That's due to the *objectively*; for such a strong claim, you need
> > unusually strong evidence.
>
> This is NOT one of those extraordinary claims that require extraordinary
> evidence.

What? With the word 'objectively' in it, it is an extraordinary claim requiring 
extraordinary evidence.
Replace the word 'objectively' with 'generally', or 'for most', and it becomes an ordinary 
claim which can be established by ordinary evidence.

> It's an entirely humdrum claim that what makes ordinary text
> more readable makes something strongly resembling ordinary text more
> readable, and as such, perfectly ordinary experimental evidence should
> do.
>

However, I don't think source code strongly resembles ordinary text.
Therefore I would like to see some evidence before applying findings about ordinary text 
to source code. Perfectly ordinary evidence is enough, because what matters is 
'generally', not 'objectively'.

> > I take the widespread presence of both as an indication that the
> > majority isn't very
> > large, so you'd have a little work to do to convince me.
>
> You are making the assumption that the word separation style of
> programmers reflects their OWN initial preference.

No. I am making the assumption that the word separation style of a language reflects the 
preferences of the language designers and/or the authors of the standard libraries.
Further, I don't believe that preference of word separation style is strongly correlated 
with the capability of designing a programming language or implementing a standard library 
for a blossoming new language.
Thus I tentatively believe that the preferences of language designers do not strongly 
deviate from the preferences of general programmers.
Hence I would expect that a large majority of one preference produces a majority of 
languages with the preferred style.

> I am aware of no
> reason to believe that.  People writing Pascal (which didn't _have_
> an underscore because there wasn't one in the 6-bit character set it
> was designed for) or Smalltalk (which didn't _have_ an underscore
> because there wasn't one in the 7-bit character set it was designed
> for) simply didn't have a choice.

Didn't know that. So camel case had a head start.

> Java's designers seem to have
> fairly mindlessly copied Smalltalk, and Java's users _have_ to use

However, if they had a strong preference for underscores, they would probably not have 
copied Smalltalk, would they?

> Java's vast range of predefined baStudlyCaps identifiers, so in
> effect have no choice.  (Unless like me they have a preprocessor.)
>
> I dare to say that we agree that Java has many advantages over some
> of its rivals, so that using an uncomfortable word separation style
> may be compensated for by something else (such as NetBeans or Eclipse,
> maybe).

Yes, we agree on that. However, for many people, Java's style is not uncomfortable.
They may be neutral on the matter, prefer camel case or have a slight preference for 
underscores.
I prefer my braces on the same line as if/while/for and I prefer four-space indents (no 
tabs), but I have no problem with

if (condition)
{
  action();
}
else
{
  other_action();
}

>
> I'm quite sure that we agree that Haskell has *huge* advantages.

No doubt about that.

> The unfortunate word separation style offsets that enough that it's
> worth programming around (using a preprocessor, for example), but
> not enough to make me stop using it.
>
> >> You're asking me to sacrifice readability everywhere else
> >> for the sake of one line in every 2850?  (Not that I do
> >> find that line more readable in basStudlyCaps.)
> >
> > Not at all. What gave you that idea?
>
> The form of your argument.
>
???

> > You prefer to read and write code in underscore style. Others prefer
> > camel case.
> > Without an easy way to convert, at least one group won't be happy.
>
> But there *IS* an easy way to convert.
>
Yes, all the better.
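For readers who haven't seen such a converter, the core of the underscore-to-camel-case direction fits in a few lines of Haskell. This is a minimal sketch, not hspp's actual code: the name underscoreToCamel is hypothetical, and it assumes it is applied only to identifiers, never inside comments or string literals (which a real preprocessor has to skip, as hspp does).

```haskell
import Data.Char (toUpper)

-- Hypothetical helper, not hspp's code: turn an underscore_separated
-- identifier into camelCase. Leading underscores are kept, because a
-- name like _unused means something to GHC (no unused-binding warning).
-- After that, every underscore followed by a non-underscore character is
-- dropped and that character is upper-cased; other underscores stay.
underscoreToCamel :: String -> String
underscoreToCamel s = lead ++ go rest
  where
    (lead, rest) = span (== '_') s
    go ('_' : c : cs) | c /= '_' = toUpper c : go cs
    go (c : cs) = c : go cs
    go [] = []
```

With this, to_upper_case becomes toUpperCase, _unused_var becomes _unusedVar, and parse_URI becomes parseURI.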

> > If I can help improving it and making it more usable, I'd be happy
> > to (there are a couple of points where the transformation is not
> > trivial, {-# OPTIONS_GHC #-}, foreign import).
>
> Changes are not made inside {-...-} or "...", only to Haskell
> identifiers.

I hadn't looked at it before.
By the way, I love

hspp_nested_comment :: Int -> String -> String
 -- Int should be safe provided comments are not nested 2147483648
 -- levels deep.

:D :D :D

>
> There's one bug I'm aware of:  --<symbol> is treated as a comment.

Easy to fix.

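hspp xs@('-':'-':_) =
    -- split off the whole run of dashes
    case span (== '-') xs of
        -- dashes followed by a symbol character are an operator
        -- such as -->, not a comment: copy the operator unchanged
        (dashes,more@(c:_))
            | isSymbol c -> case span isSymbol more of
                                (sym,rest) -> dashes ++ sym ++ hspp rest
            -- anything else after the dashes: a genuine line comment
            | otherwise  -> dashes ++ hspp_eoline_comment more
        -- nothing after the dashes: end of input
        _ -> xs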
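The rule behind that fix comes from the Haskell 2010 Report: a run of two or more dashes starts a line comment only if it is not part of a legal lexeme, so --> and --| are operators while -- | is a comment. A self-contained version of the test might look like this (startsLineComment and isAscSymbol are hypothetical names; only ASCII symbol characters are handled, although the Report also admits Unicode symbols). One thing to check in the fix above: if its isSymbol is Data.Char.isSymbol, that predicate does not match Haskell's symbol characters. It rejects punctuation such as '!', '*', '.' and '@', so --* would still be treated as a comment.

```haskell
-- ASCII symbol characters from the Haskell 2010 Report (ascSymbol).
-- The Report also allows Unicode symbols and punctuation; this sketch
-- deliberately ignores them.
isAscSymbol :: Char -> Bool
isAscSymbol c = c `elem` "!#$%&*+./<=>?@\\^|-~:"

-- Hypothetical helper: does the input start with a line comment?
-- Two or more dashes begin a comment unless the dash run is followed by
-- another symbol character, in which case the whole run is an operator.
startsLineComment :: String -> Bool
startsLineComment s = length dashes >= 2 && not (symbolNext rest)
  where
    (dashes, rest) = span (== '-') s
    symbolNext (c : _) = isAscSymbol c
    symbolNext []      = False
```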

Now, would you be interested in a transformation the other way round, so that you can read 
other people's code in your preferred style?
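The reverse direction has to guess where words begin, and that guess is exactly where the ambiguity discussed above bites. A minimal sketch, assuming a common heuristic (a run of capitals is kept together as an acronym, and the last capital of a run that is followed by a lowercase letter starts the next word); camelWords and camelToUnderscore are hypothetical names, not part of hspp:

```haskell
import Data.Char (isLower, isUpper, toLower)
import Data.List (intercalate)

-- Heuristic word split for camelCase identifiers: a capital starts a new
-- word, a run of capitals stays together (parseURI -> parse, URI), and the
-- last capital of a run that is followed by a lowercase letter begins the
-- next word (NVList -> NV, List).
camelWords :: String -> [String]
camelWords [] = []
camelWords s@(c : _)
  | isUpper c =
      let (caps, rest) = span isUpper s
      in case rest of
           (r : _) | isLower r, length caps > 1 ->
             init caps : camelWords (last caps : rest)
           _ ->
             let (tailChars, rest') = span (not . isUpper) rest
             in (caps ++ tailChars) : camelWords rest'
  | otherwise =
      let (w, rest) = span (not . isUpper) s
      in w : camelWords rest

-- Join the words with underscores. Capitalised words are lower-cased,
-- but a word whose second character is also a capital is treated as an
-- acronym and left alone, so parseURI becomes parse_URI.
camelToUnderscore :: String -> String
camelToUnderscore = intercalate "_" . map lowerWord . camelWords
  where
    lowerWord w@(_ : d : _) | isUpper d = w
    lowerWord (c : cs) = toLower c : cs
    lowerWord [] = []
```

With this heuristic parseURI comes out as parse_URI and takeWhile as take_while, but INSURL stays INSURL and NVList becomes NV_list: boundaries the author never wrote down cannot be recovered.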