Speeding up Data.List.inits
David Feuer
david.feuer at gmail.com
Sat Jul 19 07:51:53 UTC 2014
Summary: yes, we can, by a LOT. Yes, I know how. Yes, I've done some
benchmarking to demonstrate. Yes, it is even very simple. And yes, the
results are correct, including laziness requirements.
Background: I was looking at the code for Data.List.inits in base-4.7.0.0
(function renamed for clarity):
initsDL :: [a] -> [[a]]
initsDL xs = [] : case xs of
[] -> []
x : xs' -> map (x :) (initsDL xs')
The recursive maps made me suspicious. I decided to try writing a different
version:
initsHO xs = map ($ []) (scanl q id xs)
where
q f x = f . (x:)
I mentioned this on #haskell and noted that it would be a nice exercise to
write it with less fancy function footwork using map reverse. rasfar
responded (modulo naming) with
initsR xs = map reverse (scanl q [] xs)
where
q acc x = x : acc
rasfar ran a few informal benchmarks suggesting that initsR is faster than
initsHO, which in turn is significantly faster than the current
implementation in Data.List. I have now run Criterion benchmarks testing
performance in three ways: reduction of inits [1..n] to normal form,
reduction of (inits [1..n])!!(n-1) to normal form, and reduction of (length
(inits [1..n])) to normal form. In each case the Criterion test case is the
list [1..n], so there is no risk of some sort of weird fusion occurring.
The results are extremely clear in all three test scenarios, with a variety
of values of n: initsR is a little faster than initsHO, and inits HO is
much, much faster than initsDL. The differences become apparent even for
small values of n (10-50), but when n gets up to 100000, you don't even
want to wait for initsDL to finish. This was all using ghc 7.8.3 with -O2.
I've attached my benchmarking code, which also includes some basic
correctness and laziness tests. Note that the first group of benchmarks
tests a variety of vaues of n. For the second group, I was tired so I just
twiddled it by hand. For the third group... Criterion estimated that
benchmarking length(initsDL(100000)) would take 124582.9 seconds, so I
decided to stop there.
Conclusion: we should replace Data.List.inits with initsR, unless someone
comes up with something that beats initsR.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140719/ee82485d/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: inits.hs
Type: text/x-haskell
Size: 1900 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140719/ee82485d/attachment.hs>
More information about the Libraries
mailing list