[GHC] #12893: Profiling defeats stream fusion when using vector library

GHC ghc-devs at haskell.org
Mon Nov 28 23:49:13 UTC 2016


#12893: Profiling defeats stream fusion when using vector library
-------------------------------------+-------------------------------------
        Reporter:  newhoggy          |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:  stream-fusion
                                     |  profiling
Operating System:  MacOS X           |         Architecture:  x86_64
 Type of failure:  Runtime           |  (amd64)
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by newhoggy):

 Here is the an example of how I reproduce the problem assuming the vector
 library is already installed:

 {{{
 (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw-
 tutorial-performance-rwhe [master|✚ 1…1]
 10:46 $ cat Main.hs
 module Main (main) where

 import System.Environment
 import Text.Printf
 import qualified Data.Vector.Unboxed as DVU

 main :: IO ()
 main = do
     [d] <- map read `fmap` getArgs
     printf "%f\n" (mean (DVU.enumFromTo 1 d))

 mean :: DVU.Vector Double -> Double
 mean xs = s / fromIntegral (n :: Int)
   where
     Pair n s       = DVU.foldl k (Pair 0 0) xs
     k (Pair m t) x = Pair (m + 1) (t + x)
 {-# INLINE mean #-}

 data Pair = Pair !Int !Double
 (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw-
 tutorial-performance-rwhe [master|✚ 1…1]
 10:47 $ ghc -prof -rtsopts -fno-prof-count-entries  -fno-prof-cafs -fno-
 prof-auto -fforce-recomp -O2 -funbox-strict-fields Main.hs
 [1 of 1] Compiling Main             ( Main.hs, Main.o )
 Linking Main ...
 (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw-
 tutorial-performance-rwhe [master|✚ 1…1]
 10:47 $ time ./Main +RTS -sstderr -RTS 1e7
 5000000.5
    3,012,099,000 bytes allocated in the heap
    4,079,698,408 bytes copied during GC
    1,162,199,352 bytes maximum residency (13 sample(s))
       19,313,368 bytes maximum slop
             1990 MB total memory in use (0 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max
 pause
   Gen  0      4869 colls,     0 par    1.112s   1.123s     0.0002s
 0.0017s
   Gen  1        13 colls,     0 par    1.333s   2.194s     0.1688s
 0.7226s

   INIT    time    0.000s  (  0.003s elapsed)
   MUT     time    1.027s  (  0.880s elapsed)
   GC      time    2.445s  (  3.317s elapsed)
   RP      time    0.000s  (  0.000s elapsed)
   PROF    time    0.000s  (  0.000s elapsed)
   EXIT    time    0.009s  (  0.195s elapsed)
   Total   time    3.483s  (  4.395s elapsed)

   %GC     time      70.2%  (75.5% elapsed)

   Alloc rate    2,932,567,762 bytes per MUT second

   Productivity  29.8% of total user, 23.6% of total elapsed


 real    0m4.565s
 user    0m3.484s
 sys     0m0.939s
 (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw-
 tutorial-performance-rwhe [master|✚ 1…1]
 10:47 $ ghc -rtsopts -fno-prof-count-entries  -fno-prof-cafs -fno-prof-
 auto -fforce-recomp -O2 -funbox-strict-fields Main.hs
 [1 of 1] Compiling Main             ( Main.hs, Main.o )
 Linking Main ...
 (java-1.8.0 ghc-7.10.3) ✔ ~/wrk/haskell-works/hw-tutorial-performance/hw-
 tutorial-performance-rwhe [master|✚ 1…1]
 10:47 $ time ./Main +RTS -sstderr -RTS 1e7
 5000000.5
          115,888 bytes allocated in the heap
            3,480 bytes copied during GC
           44,384 bytes maximum residency (1 sample(s))
           17,056 bytes maximum slop
                1 MB total memory in use (0 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max
 pause
   Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s
 0.0000s
   Gen  1         1 colls,     0 par    0.000s   0.005s     0.0050s
 0.0050s

   INIT    time    0.000s  (  0.002s elapsed)
   MUT     time    0.010s  (  0.026s elapsed)
   GC      time    0.000s  (  0.005s elapsed)
   EXIT    time    0.000s  (  0.005s elapsed)
   Total   time    0.012s  (  0.039s elapsed)

   %GC     time       1.4%  (13.0% elapsed)

   Alloc rate    11,679,903 bytes per MUT second

   Productivity  97.5% of total user, 30.5% of total elapsed


 real    0m0.057s
 user    0m0.012s
 sys     0m0.008s
 }}}

 The first run with profiling is slow and uses a lot of memory.  The second
 run without profiling is very fast and uses a negligible amount of memory.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12893#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list