[GHC] #9345: Data.List.inits is extremely slow

Thu Jul 24 20:32:35 UTC 2014

#9345: Data.List.inits is extremely slow
-------------------------------------+-------------------------------------
              Reporter:  dfeuer      |            Owner:
                  Type:  bug         |           Status:  new
              Priority:  normal      |        Milestone:  7.8.4
             Component:              |          Version:  7.8.3
  libraries/base                     |         Keywords:
            Resolution:              |     Architecture:  Unknown/Multiple
      Operating System:              |       Difficulty:  Easy (less than 1
  Unknown/Multiple                   |  hour)
       Type of failure:  Runtime     |       Blocked By:
  performance bug                    |  Related Tickets:
             Test Case:              |
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------

Comment (by dfeuer):

 Replying to [comment:13 nomeata]:
 > If we are worried about `seq` we can simply inline my `myScanl'` in
 `initsQ2` and replace the `seq` by `case` statements. Or leave the `seq`
 but argue that we are always applying them to values of type `Queue`,
 nothing breaks. And by that argument, as long as `myScanl` is only used
 for `initsQ2`, we should be safe.

 I believe I have now pretty much completed the proof. It's very ugly,
 because I don't have a sufficiently firm understanding of the higher-order
 fold involved, but here's the outline (minus horrors of immature
 mathematics):

 {{{
 foldr evil wrong (scanl' f a bs)
 = foldr evil wrong $
     a `seq`
     a : foldr (\b g x -> let b' = f x b in b' `seq` (b` : g b'))
               (\b -> b `seq` [])
               bs
               a -- Call this e1 a bs

 =?= (\c n ->
     a `seq`
     a `c` foldr (\b g x -> let b' = f x b in b' `seq` (b' `c` g b'))
                 (\b -> b `seq` n)
                 bs
                 a) evil wrong  -- Call this e2 a bs

 }}}

 There are two trivial base cases:

 {{{#!haskell
 e1 [] = e2 [] = evil a wrong
 e1 _|_ = e2 _|_ = evil a _|_
 }}}

 Then the part I found difficult was proving that

 {{{#!haskell
 e1 a (q:bs) = evil a $ let b1' = f a q in b1' `seq` e1 b1' bs
 }}}

 to match the much simpler

 {{{#!haskell
 e2 a (q:bs) = evil a $ let b1' = f a q in b1' `seq` e2 b1' bs
 }}}

 For the infinite case, I don't know the formalism involved, but the
 general idea is that `evil` can only inspect a finite amount of its second
 argument before producing something, so we can always (temporarily) cut
 off the list somewhere past that.

 > I guess we only should worry about actually adding `scanl'` to the
 libraries, but that’s a different issue and should be discussed in #9356 I
 guess.

 Yes, but I now think we should.

 > We could also write `initsQ` in the completely inlined form, i.e. the
 core above. Plus point: No need to define the `Queue` datatype (which
 would be dead code anyways). Minus point: Doesn’t look elegant, no fusion
 possible.

 It doesn't look bad at all, but I think it's pretty much write-only
 code—it's hard to see what it's doing and why, and hard to experiment with
 as we've been doing (swapping modular pieces around to try different
 things). Giving up possible fusion to attain less clarity does not look
 like a good plan to me.

 I'm going to do some benchmarking and profiling myself; I'd like to toss a
 fusable non-strict scanl into the ring for comparison, and ramp up the
 scale on some of the benchmarks to improve resolution.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9345#comment:14>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler