[Haskell-cafe] Great language shootout: reloaded

Fri Nov 10 20:14:12 EST 2006

Sebastian Sylvan wrote:
> On 11/10/06, Henk-Jan van Tuyl <hjgtuyl at chello.nl> wrote:
>>
>> On Fri, 10 Nov 2006 01:44:15 +0100, Donald Bruce Stewart
>> <dons at cse.unsw.edu.au> wrote:
>>
>> > So back in January we had lots of fun tuning up Haskell code for the
>> > Great Language Shootout[1]. We did quite well at the time, at one point
>> > ranking overall first[2]. [...]
>>
>> Haskell suddenly dropped several places in the overall score when the
>> size measurement changed from line count to number of bytes after
>> gzipping. Maybe it's worth studying why this is; Haskell programs are
>> often much more compact than programs in other languages, but after
>> gzipping, other languages do much better. One reason I can think of is
>> that for very short programs, the import statements weigh heavily.
>
> I think the main factor is that languages with large syntactic
> redundancy get that compressed away. I.e., if you write:
>
> MyVeryLongAndConvolutedClassName myVeryLargeAndConvolutedObject = new
> MyVeryLongAndConvolutedClassName( someOtherLongVariableName );
>
> Or something like that, which makes the code clumsy and difficult to
> read but won't affect the gzipped byte count very much.
> Their current way of measuring is pretty much pointless, since the
> main thing the gzipping does is remove the impact of clunky syntax.
> Measuring lines of code is certainly not perfect, but IMO it's a lot
> more useful as a metric than gzipped bytes.
>
It may not be useful on its own, but it is not entirely meaningless. By
using a lossless compression algorithm, you might infer some meaning
about the code. Where it fails, though, is this: if the algorithm were
ideal (preferring low space at the expense of time), then the resulting
bytes for equivalent programs should be exactly the same. If they are
not, then the samples did not do exactly the same thing in the first
place and so are not comparable! So, assuming gzip is ideal, a program
should be considered the winner for having a larger compressed output!

It is not that the method is pointless; the problem is the extrapolation
and interpretation of the results. You could argue that the gzipped
output is just the same program written in a new programming language.
Of course, it is not very readable (at least not to me, since I do not
have gunzip installed in my brain, though I do have a Haskell
interpreter of some sort). Achieving minimum expressiveness at the
source code level is entirely subjective and depends on the observer's
interpretation. Using gzip attempts to minimise this subjectivity;
whether or not it succeeds is not entirely decidable, but it is at
least better. Unfortunately, the results have been misinterpreted.

Just smile and nod, I do :)