Formal proposal: replace Data.List.inits
David Feuer
david.feuer at gmail.com
Sun Jul 20 07:58:16 UTC 2014
Minor errors in copying/memory:
initsHO = map ($[]) . scanl q id
where
q f x = f . (x :)
On Sun, Jul 20, 2014 at 2:03 AM, David Feuer <david.feuer at gmail.com> wrote:
> As previously discussed, Data.List.inits has terrible performance:
> something as simple as
>
> print $ length $ inits [100000]
>
> will take an absurdly long time. The simplest decent improvement seems to be
>
> initsHO = map ($[]) . scanl q []
> where
> q f x = f . (q :)
>
> A slight transformation of this by Andrew Seniuk (rasfar on #haskell)
> gives the slightly faster
>
> initsR = map reverse . scanl (flip (:)) []
>
> As Henning Thielemann and Edward Kmett pointed out, these perform
> poorly when calculating things like `map head $ drop 100000 $ inits
> [100050]` because the costs of the reverses or compositions are
> monolithic and unshared. Edward Kmett pointed out that this could
> probably be fixed by using a queue (although he isn't sure it's worth
> doing so). As it turns out, it can, and with only a trivial penalty in
> other cases. The following uses a stripped-down partial implementation
> of Okasaki's banker's queue (see the attached file for a version with
> explanatory comments as well as the benchmark code):
>
> import Data.Bits (popCount)
> import Data.Word (Word)
>
> data Queue a = Queue {-# UNPACK #-} !Word [a] [a]
>
> queue :: Word -> [a] -> [a] -> Queue a
> queue lp1 f r
> | popCount lp1 /= 1 = Queue lp1 f r
> | otherwise = Queue lp1 (f ++ reverse r) []
>
> emptyQ :: Queue a
> emptyQ = Queue 1 [] []
>
> snocQ :: Queue a -> a -> Queue a
> snocQ (Queue lp1 f r) x = queue (lp1 + 1) f (x:r)
>
> toListQ :: Queue a -> [a]
> toListQ (Queue _ f r) = f ++ reverse r
>
> initsQ :: [a] -> [[a]]
> initsQ = map toListQ . scanl snocQ emptyQ
>
>
> This implementation appears to take care of the performance wrinkle in
> initsR and initsHO, while being only very slightly slower in other
> contexts. It's longer, certainly, but it's not long.
>
> I have attached the Criterion report produced by the attached program
> comparing initsR to initsQ. Previous tests definitively showed that
> initsHO is a small constant factor worse than initsR, and
> Data.List.inits is incomparably worse, so I limited this round to
> initsR and initsQ.
>
> I therefore propose that we should replace the current implementation
> of inits with initsQ, the queue-based one, unless someone comes up
> with something faster and/or simpler very soon.
More information about the Libraries
mailing list