# Speeding up Data.List.inits

David Feuer david.feuer at gmail.com
Sat Jul 19 07:51:53 UTC 2014

Summary: yes, we can, by a LOT. Yes, I know how. Yes, I've done some
benchmarking to demonstrate. Yes, it is even very simple. And yes, the
results are correct, including laziness requirements.

Background: I was looking at the code for Data.List.inits in base-4.7.0.0
(function renamed for clarity):

    initsDL                   :: [a] -> [[a]]
    initsDL xs                =  [] : case xs of
                                      []      -> []
                                      x : xs' -> map (x :) (initsDL xs')

The recursive maps made me suspicious. I decided to try writing a different
version:

    initsHO xs = map ($ []) (scanl q id xs)
      where
        q f x = f . (x:)
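
To make the function footwork a bit more concrete, here is a minimal sketch that splits initsHO into two steps. The name prefixBuilders is made up for illustration and is not part of the original code; the behaviour is meant to be the same as the definition above.

```haskell
-- Step 1: each scanl step composes one more "cons" onto a function that,
-- given a tail, prepends the prefix seen so far (a difference list).
-- prefixBuilders is a hypothetical helper name.
prefixBuilders :: [a] -> [[a] -> [a]]
prefixBuilders = scanl (\f x -> f . (x :)) id

-- Step 2: supplying [] as the tail turns each builder into the prefix itself.
initsHO :: [a] -> [[a]]
initsHO = map ($ []) . prefixBuilders

-- For the input "abc" the builders are, in order:
--   id
--   id . ('a' :)
--   id . ('a' :) . ('b' :)
--   id . ('a' :) . ('b' :) . ('c' :)
```

Each scanl step only wraps the previous function once, so advancing to the next prefix does not rebuild the earlier ones.
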

I mentioned this on #haskell and noted that it would be a nice exercise to
write it with less fancy function footwork using map reverse. rasfar
responded (modulo naming) with

    initsR xs = map reverse (scanl q [] xs)
      where
        q acc x = x : acc
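
A small sketch of what the scanl produces before the reversal may help here. The name reversedPrefixes is invented for this illustration; the rest follows the definition above.

```haskell
-- scanl conses each new element onto the front of the accumulator, so the
-- accumulators are the prefixes in reverse order, and each step allocates
-- a single cons cell. reversedPrefixes is a hypothetical helper name.
reversedPrefixes :: [a] -> [[a]]
reversedPrefixes = scanl (flip (:)) []

-- Reversing each accumulator restores the usual order.
initsR :: [a] -> [[a]]
initsR = map reverse . reversedPrefixes

-- For the input [1,2,3] the accumulators are, in order:
--   []
--   [1]
--   [2,1]
--   [3,2,1]
```

Because map is lazy, a given reverse only runs when that particular prefix is demanded, so walking the outer list is cheap.
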

rasfar ran a few informal benchmarks suggesting that initsR is faster than
initsHO, which in turn is significantly faster than the current
implementation in Data.List. I have now run Criterion benchmarks testing
performance in three ways: reduction of inits [1..n] to normal form,
reduction of (inits [1..n])!!(n-1) to normal form, and reduction of (length
(inits [1..n])) to normal form. In each case the Criterion test case is the
list [1..n], so there is no risk of some sort of weird fusion occurring.

The results are extremely clear in all three test scenarios, with a variety
of values of n: initsR is a little faster than initsHO, and initsHO is
much, much faster than initsDL. The differences become apparent even for
small values of n (10-50), but when n gets up to 100000, you don't even
want to wait for initsDL to finish. This was all using ghc 7.8.3 with -O2.

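
For reference, the three benchmark scenarios can be written as plain functions over the implementation under test. This is only a sketch under my own naming (Inits, allPrefixes, nearLast, countPrefixes are all hypothetical) and is not the attached benchmark code; the forcing to normal form is left to the harness.

```haskell
-- Any of the three candidates fits this type once specialised to Int.
type Inits = [Int] -> [[Int]]

initsR :: [a] -> [[a]]
initsR = map reverse . scanl (flip (:)) []

-- Scenario 1: all prefixes; the harness forces the whole result.
allPrefixes :: Inits -> [Int] -> [[Int]]
allPrefixes impl = impl

-- Scenario 2: the single prefix at index n-1, where n is the input length.
nearLast :: Int -> Inits -> [Int] -> [Int]
nearLast n impl xs = impl xs !! (n - 1)

-- Scenario 3: only the outer spine is walked; no prefix is evaluated.
countPrefixes :: Inits -> [Int] -> Int
countPrefixes impl = length . impl

-- With Criterion, each scenario would be wrapped roughly like
--   bench "initsR" (nf (allPrefixes initsR) [1 .. n])
-- so that the list [1 .. n] is the argument handed to the benchmark.
```

Note that inits of an n-element list has n+1 entries, so index n-1 picks the prefix of length n-1.
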
I've attached my benchmarking code, which also includes some basic
correctness and laziness tests. Note that the first group of benchmarks
tests a variety of values of n. For the second group, I was tired so I just
twiddled it by hand. For the third group... Criterion estimated that
benchmarking length (initsDL [1..100000]) would take 124582.9 seconds, so I
decided to stop there.
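
As a rough idea of what correctness and laziness checks for these functions can look like, here is a minimal sketch using only base. It is not the attached code; the check names (agrees, lazyInfinite, lazyStart, lazyPartial, lazyElems, allChecks) are made up for illustration.

```haskell
import Data.List (inits)

initsDL, initsHO, initsR :: [a] -> [[a]]
initsDL xs = [] : case xs of
  []      -> []
  x : xs' -> map (x :) (initsDL xs')
initsHO = map ($ []) . scanl (\f x -> f . (x :)) id
initsR  = map reverse . scanl (flip (:)) []

-- Agreement with the library function on short finite lists.
agrees :: ([Int] -> [[Int]]) -> Bool
agrees f = all (\n -> f [1 .. n] == inits [1 .. n]) [0 .. 20]

-- Infinite input: only a finite part of the result is demanded.
lazyInfinite :: ([Int] -> [[Int]]) -> Bool
lazyInfinite f = take 3 (f [1 ..]) == [[], [1], [1, 2]]

-- The first result is [] without the argument being inspected at all.
lazyStart :: ([Int] -> [[Int]]) -> Bool
lazyStart f = take 1 (f undefined) == [[]]

-- A partially defined spine still yields the prefixes that are defined.
lazyPartial :: ([Int] -> [[Int]]) -> Bool
lazyPartial f = take 2 (f (1 : undefined)) == [[], [1]]

-- Counting the prefixes never forces the list elements.
lazyElems :: ([Int] -> [[Int]]) -> Bool
lazyElems f = length (f [undefined, undefined]) == 3

allChecks :: ([Int] -> [[Int]]) -> Bool
allChecks f =
  and [agrees f, lazyInfinite f, lazyStart f, lazyPartial f, lazyElems f]
```

Running allChecks over initsDL, initsHO and initsR should give True for each, if the three definitions really are interchangeable.
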

Conclusion: we should replace Data.List.inits with initsR, unless someone
comes up with something that beats initsR.