[GHC] #15176: Superclass `Monad m =>` makes program run 100 times slower

GHC ghc-devs at haskell.org
Wed Sep 5 06:43:52 UTC 2018


#15176: Superclass `Monad m =>` makes program run 100 times slower
-------------------------------------+-------------------------------------
        Reporter:  danilo2           |                Owner:  osa1
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.4.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 I be able to get more useful RTS stats and smaller Core I made a smaller
 reproducer:

 {{{#!haskell
 {-# LANGUAGE TypeApplications #-}
 {-# LANGUAGE ScopedTypeVariables #-}
 {-# LANGUAGE BangPatterns #-}
 {-# LANGUAGE ConstraintKinds #-}
 {-# LANGUAGE FlexibleContexts  #-}
 {-# LANGUAGE GADTs #-}
 {-# LANGUAGE OverloadedStrings #-}
 {-# LANGUAGE Strict #-}

 module Main where

 import Prologue

 import qualified Control.Monad.Exception as Exception
 import qualified Data.Graph.Data.Graph.Class as Graph
 import qualified Data.Graph.Fold.Partition as Partition
 import qualified Luna.IR as IR
 import qualified Luna.Pass as Pass
 import qualified Luna.Pass.Scheduler as Scheduler

 import Luna.Pass (Pass)
 import Luna.Pass.Basic (Compilation)

 type OnDemandPass stage pass m =
     ( MonadIO m
     , Typeable pass
     , Pass.Compile stage pass m
     , Exception.MonadException Scheduler.Error m
     )

 runPass :: forall stage pass m . OnDemandPass stage pass m
         => Pass stage pass () -> m ()
 runPass !pass = Scheduler.evalT $ do
     Scheduler.registerPassFromFunction__ pass
     Scheduler.runPassSameThreadByType @pass
 {-# INLINE runPass #-}

 runPass' :: Pass Compilation Pass.BasicPass () -> IO ()
 runPass' p = Graph.encodeAndEval @Compilation (runPass p)
 {-# INLINE runPass' #-}

 partitionsUnify :: Int -> IO ()
 partitionsUnify i = runPass' $ do
     !a <- IR.var "a"
     !b <- IR.var "b"
     !u <- IR.unify a b
     let go !0 = let !o = pure () in o
         go !j = do
             !_ <- Partition.partition u
             go $! j - 1
     go i

 main :: IO ()
 main = partitionsUnify (10^6)
 }}}

 Put this in core/test/Main.hs and add this to luna-core.cabal:

 {{{
 executable bench-test
     main-is: Main.hs

     hs-source-dirs: test/

     build-depends:
         ansi-terminal -any,
         base -any,
         containers -any,
         convert -any,
         deepseq -any,
         ghc -any,
         layered-state -any,
         luna-autovector -any,
         luna-core -any,
         luna-cpp-containers -any,
         luna-data-storable -any,
         luna-data-typemap -any,
         luna-exception -any,
         luna-foreign-utils -any,
         luna-generic-traversable -any,
         luna-generic-traversable2 -any,
         luna-memory-manager -any,
         luna-memory-pool -any,
         luna-tuple-utils -any,
         mtl -any,
         primitive -any,
         prologue -any,
         structs -any,
         unboxed-ref >=0.4.0.0,
         vector -any

     ghc-options: -O2 -ticky -rtsopts -Wall
 }}}

 Results: (with and without `Monad =>`)

 {{{
 ============= With Monad =>
 =======================================================

 luna git:(master) $ time (cabal-run bench-test +RTS -s)
   27,544,258,632 bytes allocated in the heap
       19,561,928 bytes copied during GC
          205,496 bytes maximum residency (2 sample(s))
           33,152 bytes maximum slop
                2 MB total memory in use (0 MB lost due to fragmentation)
                                      Tot time (elapsed)  Avg pause  Max
 pause
   Gen  0     26366 colls,     0 par    0.270s   0.268s     0.0000s
 0.0008s
   Gen  1         2 colls,     0 par    0.002s   0.002s     0.0008s
 0.0011s

   INIT    time    0.000s  (  0.000s elapsed)
   MUT     time   13.449s  ( 13.487s elapsed)
   GC      time    0.272s  (  0.269s elapsed)
   EXIT    time    0.000s  (  0.000s elapsed)
   Total   time   13.721s  ( 13.757s elapsed)

   %GC     time       2.0%  (2.0% elapsed)

   Alloc rate    2,048,118,786 bytes per MUT second

   Productivity  98.0% of total user, 98.0% of total elapsed

 ( cabal-run bench-test +RTS -s; )  13,72s user 0,04s system 99% cpu 13,761
 total

 ============= Original
 ============================================================

 luna git:(master) $ time (cabal-run bench-test +RTS -s)
    3,952,215,688 bytes allocated in the heap
        2,071,824 bytes copied during GC
          200,320 bytes maximum residency (2 sample(s))
           33,152 bytes maximum slop
                2 MB total memory in use (0 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max
 pause
   Gen  0      3790 colls,     0 par    0.042s   0.043s     0.0000s
 0.0008s
   Gen  1         2 colls,     0 par    0.001s   0.002s     0.0009s
 0.0010s

   INIT    time    0.000s  (  0.000s elapsed)
   MUT     time    1.595s  (  1.605s elapsed)
   GC      time    0.043s  (  0.044s elapsed)
   EXIT    time    0.000s  (  0.000s elapsed)
   Total   time    1.638s  (  1.650s elapsed)

   %GC     time       2.6%  (2.7% elapsed)

   Alloc rate    2,478,513,730 bytes per MUT second

   Productivity  97.4% of total user, 97.3% of total elapsed

 ( cabal-run bench-test +RTS -s; )  1,64s user 0,01s system 99% cpu 1,654
 total
 }}}

 I'll now try with `-ticky`.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15176#comment:10>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list