[GHC] #9345: Data.List.inits is extremely slow
GHC
ghc-devs at haskell.org
Thu Jul 24 20:32:35 UTC 2014
#9345: Data.List.inits is extremely slow
-------------------------------------+-------------------------------------
Reporter: dfeuer | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 7.8.4
Component: | Version: 7.8.3
libraries/base | Keywords:
Resolution: | Architecture: Unknown/Multiple
Operating System: | Difficulty: Easy (less than 1
Unknown/Multiple | hour)
Type of failure: Runtime | Blocked By:
performance bug | Related Tickets:
Test Case: |
Blocking: |
Differential Revisions: |
-------------------------------------+-------------------------------------
Comment (by dfeuer):
Replying to [comment:13 nomeata]:
> If we are worried about `seq` we can simply inline my `myScanl'` in
`initsQ2` and replace the `seq` by `case` statements. Or leave the `seq`
but argue that we are always applying them to values of type `Queue`,
nothing breaks. And by that argument, as long as `myScanl` is only used
for `initsQ2`, we should be safe.
I believe I have now pretty much completed the proof. It's very ugly,
because I don't have a sufficiently firm understanding of the higher-order
fold involved, but here's the outline (minus horrors of immature
mathematics):
{{{
foldr evil wrong (scanl' f a bs)
= foldr evil wrong $
a `seq`
a : foldr (\b g x -> let b' = f x b in b' `seq` (b` : g b'))
(\b -> b `seq` [])
bs
a -- Call this e1 a bs
=?= (\c n ->
a `seq`
a `c` foldr (\b g x -> let b' = f x b in b' `seq` (b' `c` g b'))
(\b -> b `seq` n)
bs
a) evil wrong -- Call this e2 a bs
}}}
There are two trivial base cases:
{{{#!haskell
e1 [] = e2 [] = evil a wrong
e1 _|_ = e2 _|_ = evil a _|_
}}}
Then the part I found difficult was proving that
{{{#!haskell
e1 a (q:bs) = evil a $ let b1' = f a q in b1' `seq` e1 b1' bs
}}}
to match the much simpler
{{{#!haskell
e2 a (q:bs) = evil a $ let b1' = f a q in b1' `seq` e2 b1' bs
}}}
For the infinite case, I don't know the formalism involved, but the
general idea is that `evil` can only inspect a finite amount of its second
argument before producing something, so we can always (temporarily) cut
off the list somewhere past that.
> I guess we only should worry about actually adding `scanl'` to the
libraries, but that’s a different issue and should be discussed in #9356 I
guess.
Yes, but I now think we should.
> We could also write `initsQ` in the completely inlined form, i.e. the
core above. Plus point: No need to define the `Queue` datatype (which
would be dead code anyways). Minus point: Doesn’t look elegant, no fusion
possible.
It doesn't look bad at all, but I think it's pretty much write-only
code—it's hard to see what it's doing and why, and hard to experiment with
as we've been doing (swapping modular pieces around to try different
things). Giving up possible fusion to attain less clarity does not look
like a good plan to me.
I'm going to do some benchmarking and profiling myself; I'd like to toss a
fusable non-strict scanl into the ring for comparison, and ramp up the
scale on some of the benchmarks to improve resolution.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9345#comment:14>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list