[GHC] #12241: Surprising constructor accumulation

GHC ghc-devs at haskell.org
Wed Jun 29 01:49:55 UTC 2016


#12241: Surprising constructor accumulation
-------------------------------------+-------------------------------------
           Reporter:  dfeuer         |             Owner:
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Runtime        |           Version:  7.10.3
  System                             |
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  Runtime
  Unknown/Multiple                   |  performance bug
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 `containers` version 0.5.7.1 (and a few earlier versions) uses the
 following implementation of `fromList` by Ross Paterson:

 {{{#!hs
 fromList        :: [a] -> Seq a
 fromList = Seq . mkTree 1 . map_elem
   where
     {-# SPECIALIZE mkTree :: Int -> [Elem a] -> FingerTree (Elem a) #-}
     {-# SPECIALIZE mkTree :: Int -> [Node a] -> FingerTree (Node a) #-}
     mkTree :: (Sized a) => Int -> [a] -> FingerTree a
     mkTree !_ [] = EmptyT
     mkTree _ [x1] = Single x1
     mkTree s [x1, x2] = Deep (2*s) (One x1) EmptyT (One x2)
     mkTree s [x1, x2, x3] = Deep (3*s) (One x1) EmptyT (Two x2 x3)
     mkTree s (x1:x2:x3:x4:xs) = case getNodes (3*s) x4 xs of
       (ns, sf) -> case mkTree (3*s) ns of
         !m -> Deep (3*size x1 + size m + size sf) (Three x1 x2 x3) m sf

     getNodes :: Int -> a -> [a] -> ([Node a], Digit a)
     getNodes !_ x1 [] = ([], One x1)
     getNodes _ x1 [x2] = ([], Two x1 x2)
     getNodes _ x1 [x2, x3] = ([], Three x1 x2 x3)
     getNodes s x1 (x2:x3:x4:xs) = (Node3 s x1 x2 x3:ns, d)
        where (ns, d) = getNodes s x4 xs

     map_elem :: [a] -> [Elem a]
 #if __GLASGOW_HASKELL__ >= 708
     map_elem xs = coerce xs
 #else
     map_elem xs = Data.List.map Elem xs
 #endif
     {-# INLINE map_elem #-}
 }}}

 This uses one lazy list per "level" in the tree being constructed. I
 believe Paterson (and pretty much everyone else) expected that there would
 be `O(log n)` pair constructors and conses live at any given time.
 Wadler's technique in
 [http://homepages.inf.ed.ac.uk/wadler/papers/leak/leak.ps.gz Fixing some
 space leaks with a garbage collector], which the GHC commentary indicates
 is used in GHC, should clean up the pairs in `getNodes`'s `d` thunks as
 they reach WHNF.

 Lennart Spitzner dug into the unimpressive performance of the above code
 and using

 {{{#!hs
 main = evaluate $ S.fromList [(0::Int)..999999]
 }}}

 produced
 [http://heap.ezyang.com/view/72d4d1eb879a2085ffd49d270b03c7a037b4d5c2 this
 heap profile]. If I'm reading it right, this suggests that there are lots
 of `(,)` and also `(:)` constructors live, more `O(n)` than `O(log n)`.

 I had previously found that I could improve performance by building the
 intermediate lists strictly, but that violates the generational hypothesis
 and leads to a slow-down for very large arguments
 ([http://heap.ezyang.com/view/2e80598f73d9281f5eadfeb041f5b9aef6e448b0
 Spitzner's heap profile]). Spitzner was able to come up with a very clever
 (but much trickier) implementation that skirted all these problems
 ([http://heap.ezyang.com/view/7f1994c0417931360fe7c41bd995c37ebc3fd6c5
 profile]) and avoids ever allocating the troublesome pairs.

 So the problem is thoroughly bypassed for `containers`, but it seems like
 something is not quite right here, and it might bear looking into.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12241>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list