[GHC] #7561: Unnecessary Heap Allocations - Slow Performance

Sat Jan 19 10:38:15 CET 2013

#7561: Unnecessary Heap Allocations - Slow Performance
-------------------------------+--------------------------------------------
    Reporter:  wurmli          |       Owner:                         
        Type:  bug             |      Status:  new                    
    Priority:  normal          |   Milestone:  7.8.1                  
   Component:  Compiler        |     Version:  7.6.1                  
    Keywords:                  |          Os:  Linux                  
Architecture:  x86_64 (amd64)  |     Failure:  Runtime performance bug
  Difficulty:  Unknown         |    Testcase:                         
   Blockedby:                  |    Blocking:                         
     Related:                  |  
-------------------------------+--------------------------------------------

Comment(by wurmli):

 Let me add a similar case, exemplifying better why I think that not the
 vector library, but the way the optimiser deals with reductions seems to
 be the culprit. (That's why I had put "unnecessary heap allocations in the
 title; #1498 might be a related ticket.)

 {{{

 import System.Environment
 import Control.Applicative
 import Control.Monad

 import Control.Monad.ST

 import qualified Data.Vector.Storable.Mutable as VSM
 import qualified Data.Vector.Storable as G

 -------------------------------------------------------------------
 -- Increment counter by 1

 acc1 v = do{
   k<-VSM.read v 0;
   VSM.write v 0 (k+1)
   }

 -- Increment counter 4 times by 1 until limit is reached
 -- i.e. each loop adds 4

 whileModify1 :: G.MVector s Int -> ST s ()
 whileModify1 v = do
   go <- do{ k<-VSM.read v 0; n<-VSM.read v 1; return (compare k n)}
   case go of
     LT -> do {acc1 v; acc1 v; acc1 v; acc1 v; whileModify1 v}
     EQ -> return ()
     GT -> return ()

 -------------------------------------------------------------------
 -- Increment counter by 2

 acc2 v = do{
   k<-VSM.read v 0;
   VSM.write v 0 (k+1);
   k<-VSM.read v 0;
   VSM.write v 0 (k+1);
   }

 -- Increment counter twice by 2 until limit is reached
 -- i.e. each loop adds 4

 whileModify2 :: G.MVector s Int -> ST s ()
 whileModify2 v = do
   go <- do{ k<-VSM.read v 0; n<-VSM.read v 1; return (compare k n)}
   case go of
     LT -> do {acc2 v; acc2 v; whileModify2 v}
     EQ -> return ()
     GT -> return ()

 --------------------------------------------------------------------

 -- Case 1 is fast with low use of heap space
 -- Case 2 is slow with high use of heap
           (about 400 times slower, 300 times more heap)

 testAcc :: Int -> Int -> G.Vector Int
 testAcc a n = runST $ do
   v <- (G.thaw $ G.fromList [0,n]) :: ST s (G.MVector s Int)
   case a of
     1 -> whileModify1 v
     2 -> whileModify2 v
   G.freeze v

 main = do
   let k = 50000000
   n <- read.head <$> getArgs
   putStrLn $ show $ testAcc n k

 }}}

 The llvm backend improves timing about 4 fold

 {{{
 ghc -threaded -rtsopts -O2 -fllvm heapAndSpeed.hs
 }}}

 Even though the two ways of increasing the counter are almost the same,
 the resulting code behaves vastly different, not only in speed, but also
 in heap usage (and garbage collector activity).

 {{{
 ./heapAndSpeed 1 +RTS -s

 fromList [50000000,50000000]
           74,088 bytes allocated in the heap
            6,296 bytes copied during GC
           47,184 bytes maximum residency (1 sample(s))
           18,352 bytes maximum slop
                1 MB total memory in use (0 MB lost due to fragmentation)

 ...

 INIT    time    0.00s  (  0.00s elapsed)
 MUT     time    0.02s  (  0.02s elapsed)
 GC      time    0.00s  (  0.00s elapsed)
 EXIT    time    0.00s  (  0.00s elapsed)
 Total   time    0.02s  (  0.02s elapsed)

 }}}

 {{{

 ./heapAndSpeed 2 +RTS -s

 fromList [50000000,50000000]
   25,000,074,088 bytes allocated in the heap
        7,048,952 bytes copied during GC
           47,184 bytes maximum residency (2 sample(s))
           22,448 bytes maximum slop
                1 MB total memory in use (0 MB lost due to fragmentation)

 ...

 INIT    time    0.00s  (  0.00s elapsed)
 MUT     time    8.06s  (  8.07s elapsed)
 GC      time    0.17s  (  0.17s elapsed)
 EXIT    time    0.00s  (  0.00s elapsed)
 Total   time    8.23s  (  8.24s elapsed)

 }}}

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/7561#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler