[Haskell-cafe] Re: ANNOUNCE: Utrecht Haskell Compiler (UHC) -- first release

Tue Apr 21 03:39:15 EDT 2009

On Mon, Apr 20, 2009 at 10:21 PM, Richard O'Keefe <ok at cs.otago.ac.nz> wrote:
>
> On 21 Apr 2009, at 5:10 pm, Jason Dagit wrote:
>>
>> Plus, there was a movement to ban them:
>
> And somehow this means people don't?

...see the humor.
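For readers who have not met them: an n+k pattern such as (n+1) matches an integer argument of at least 1 and binds n to that value minus 1. They were dropped from the language in Haskell 2010, and GHC accepts them only with the NPlusKPatterns extension. Below is a small illustrative comparison. The function names factNK and factGuard are made up for this example.

```haskell
{-# LANGUAGE NPlusKPatterns #-}

-- Haskell 98 style (hypothetical example): the pattern (n + 1) matches any
-- Integer >= 1 and binds n to that value minus 1.  Negative arguments match
-- no clause and fail at runtime.
factNK :: Integer -> Integer
factNK 0       = 1
factNK (n + 1) = (n + 1) * factNK n

-- Haskell 2010 equivalent without n+k patterns: an explicit guard and an
-- explicit subtraction.  Negative arguments also fail, via 'error'.
factGuard :: Integer -> Integer
factGuard 0 = 1
factGuard m
  | m >= 1    = m * factGuard (m - 1)
  | otherwise = error "factGuard: negative argument"
```

The second form is what code written without n+k patterns typically looks like; it is the kind of rewrite a count of uses on Hackage would be weighing against.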

>>
>> BUT, here is the real point of my reply:
>>
>> To end this debate about whether people really use them: we have this
>> huge collection of source code called Hackage.  I bet that if someone
>> with haskell-src-exts experience sat down, they could go through every
>> package in an automated way and count the number of uses of n+k
>> patterns in source code that appears in the wild.
>
> I'm sorry, that wouldn't even come *close* to answering the question.
> It's a good way to demonstrate that people *are* using some feature
> (like hierarchical package names), but an incredibly bad way to show
> that they aren't.

Not really.  Obviously some programs use the feature, but let us
restrict ourselves to interesting programs that have been shared with
the world and have some prospect of being maintained.  From these
programs we can take a sample.  While I'm not a statistics expert, my
understanding is that the main problem with using Hackage packages as
the sample is selection bias.  I bet the selection bias isn't even that
bad for this statistical test, given how diverse programming styles
are.  Maybe someone with a stronger stats background could comment.
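Here is one way the automated count could be structured, as a minimal sketch. It uses a generic traversal built on Data.Data from base, which counts every node that matches a predicate. The Pat and Clause types, and the names countNodes, isNPlusK and example, are hypothetical. They form a toy stand-in for a real parsed syntax tree. The syntax tree from haskell-src-exts also derives Data and, as far as I know, represents n+k patterns with a PNPlusK constructor. A counter like countNodes could therefore in principle be pointed at a parsed module instead. This sketch leaves out parsing with the NPlusKPatterns extension enabled and walking all of Hackage.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data (Data, Typeable, cast, gmapQ)

-- Hypothetical toy AST standing in for a real Haskell syntax tree.
data Pat
  = PVar String
  | PLit Integer
  | PNPlusK String Integer   -- the pattern (name + k)
  | PCon String [Pat]
  deriving (Show, Data)

-- One equation of a function: its name and its argument patterns.
data Clause = Clause String [Pat]
  deriving (Show, Data)

-- Count every node of type b anywhere inside x that satisfies p.
-- 'cast' checks whether the current node has type b; 'gmapQ' recurses
-- into all immediate children, whatever their types.
countNodes :: (Data a, Typeable b) => (b -> Bool) -> a -> Int
countNodes p x = here + sum (gmapQ (countNodes p) x)
  where
    here = case cast x of
      Just node | p node -> 1
      _                  -> 0

isNPlusK :: Pat -> Bool
isNPlusK PNPlusK{} = True
isNPlusK _         = False

-- Hypothetical input, roughly:
--   fact 0 = ...; fact (n+1) = ...; pred' (Just (m+1)) = ...
example :: [Clause]
example =
  [ Clause "fact"  [PLit 0]
  , Clause "fact"  [PNPlusK "n" 1]
  , Clause "pred'" [PCon "Just" [PNPlusK "m" 1]]
  ]
```

In GHCi, `countNodes isNPlusK example` evaluates to 2. One match sits at the top level of a clause and the other is nested inside a constructor pattern. A generic traversal avoids writing a case for every constructor of a large real AST.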

> If every Haskell user contributed to Hackage, and if
> every contributor to Hackage contributed all the code they wrote,
> then it would make sense.

I think that would give us an exhaustive collection of Haskell code,
but I assert we don't need that.  Biologists don't need a DNA sample
from every organism to draw conclusions about the genetics of a
species.  Scientists work with incomplete data and draw sound
conclusions in spite of that.  The tools they use to do so are known
as statistics.
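The standard calculation behind this kind of sampling argument looks like the following. It assumes N packages drawn independently and at random, of which x use n+k patterns, with p the true fraction in the population sampled. Random sampling is exactly the assumption that selection bias undermines. When the scan finds no uses at all, the largest p still consistent with that result at 95% confidence gives the usual "rule of three":

```latex
\hat{p} = \frac{x}{N}
\qquad
\Pr(x = 0 \mid p) = (1 - p)^{N}
\qquad
(1 - p_{\text{up}})^{N} = 0.05
\;\Longrightarrow\;
p_{\text{up}} = 1 - 0.05^{1/N} \approx \frac{-\ln 0.05}{N} \approx \frac{3}{N}
```

For a hypothetical scan of N = 1000 packages with no hits, this bounds the fraction at roughly 0.3%. The bound only speaks to the population that was actually sampled, not to code that never reached Hackage.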

> In the Erlang mailing list, I frequently use the technique of
> trawling through publicly available Erlang sources to demonstrate
> that features people claim are rare are not.  But I'd never be
> silly enough to claim on the basis of such a scan that some feature
> _wasn't_ being used extensively in other sources.

Okay, then prove that n+k patterns are not rare in the publicly
available sources.  That's the challenge I was trying to make in my
first email.  My apologies for not stating it more directly.

Jason