Fwd: Proposal (breaking change, but probably not one that will break any real code): strictify genericLength

Sun Aug 3 09:00:05 UTC 2014

Since Krzysztof made their replies public, I figured I'd add this
reply to the mix (an expanded/improved version of a message I sent
them). I think these things are worth considering however things turn
out. If, for the sake of stability, genericLength has to stay (pretty
much) the same, it probably wouldn't hurt to add a genericLength' or
some such.

As for the current genericLength, it is not a panacea even for lazy naturals.

I started by noticing that the same result is achieved by the slightly
nicer definition

genericLength = foldr (\_ y = 1 + y)

However, just looking at the definition more closely than I had
reveals an efficiency problem even here. Specifically, this definition
assumes that a + b   is defined something like

Zero + b = b
Succ a + b = a + Succ b

It would, however, be perfectly legitimate to do it the other way around:

a + Zero = a
a + Succ b = Succ a + b

In the former case, 1+y is O(1) and y+1 is O(y), whereas the opposite
is true in the latter case!

The situation for the strict version has some subtleties too. One thing
that's not subtle is that the strict version should be written using
foldl' (at least in most cases), which should allow it to take
advantage of the new call arity
analysis and resulting optimization. As for the nitty-gritty, there
are several cases to think about:

1. Int8, Int16, Word8, and Word16, where it is advantageous to
calculate the length as an Int (or perhaps a Word) and then narrow the
result—I'm not sure if GHC will do this right automatically in this
case (and the answer could depend on whether -fllvm is used or not),
but it would be best to do some benchmarking before concluding that
it's okay to ignore.
2. Int32, Word32, Int64, and Word64, which will generally work well
calculated directly, although it's possible that the latter two might
be done better some other way on 32-bit machines.
3. Integer: it should work to calculate that as a Word64 and convert
it, but then again if we wait 30 years that might not be true anymore,
maybe.
4. Floating point: The current definition, as well as the one you
propose, seem wrong for
floating point. I would expect the floating point length of a
sufficiently long list to be the closest approximation from the Word64
length, but if you tried to find the genericLength of a list with
length 2^24 as a Float with these definitions, you will have a
problem: they will actually get stuck at the largest exactly
representable integer and go no further.