[GHC] #15176: Superclass `Monad m =>` makes program run 100 times slower
GHC
ghc-devs at haskell.org
Wed Sep 5 06:43:52 UTC 2018
#15176: Superclass `Monad m =>` makes program run 100 times slower
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: osa1
Type: bug | Status: new
Priority: highest | Milestone: 8.8.1
Component: Compiler | Version: 8.4.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by osa1):
I be able to get more useful RTS stats and smaller Core I made a smaller
reproducer:
{{{#!haskell
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE Strict #-}
module Main where
import Prologue
import qualified Control.Monad.Exception as Exception
import qualified Data.Graph.Data.Graph.Class as Graph
import qualified Data.Graph.Fold.Partition as Partition
import qualified Luna.IR as IR
import qualified Luna.Pass as Pass
import qualified Luna.Pass.Scheduler as Scheduler
import Luna.Pass (Pass)
import Luna.Pass.Basic (Compilation)
type OnDemandPass stage pass m =
( MonadIO m
, Typeable pass
, Pass.Compile stage pass m
, Exception.MonadException Scheduler.Error m
)
runPass :: forall stage pass m . OnDemandPass stage pass m
=> Pass stage pass () -> m ()
runPass !pass = Scheduler.evalT $ do
Scheduler.registerPassFromFunction__ pass
Scheduler.runPassSameThreadByType @pass
{-# INLINE runPass #-}
runPass' :: Pass Compilation Pass.BasicPass () -> IO ()
runPass' p = Graph.encodeAndEval @Compilation (runPass p)
{-# INLINE runPass' #-}
partitionsUnify :: Int -> IO ()
partitionsUnify i = runPass' $ do
!a <- IR.var "a"
!b <- IR.var "b"
!u <- IR.unify a b
let go !0 = let !o = pure () in o
go !j = do
!_ <- Partition.partition u
go $! j - 1
go i
main :: IO ()
main = partitionsUnify (10^6)
}}}
Put this in core/test/Main.hs and add this to luna-core.cabal:
{{{
executable bench-test
main-is: Main.hs
hs-source-dirs: test/
build-depends:
ansi-terminal -any,
base -any,
containers -any,
convert -any,
deepseq -any,
ghc -any,
layered-state -any,
luna-autovector -any,
luna-core -any,
luna-cpp-containers -any,
luna-data-storable -any,
luna-data-typemap -any,
luna-exception -any,
luna-foreign-utils -any,
luna-generic-traversable -any,
luna-generic-traversable2 -any,
luna-memory-manager -any,
luna-memory-pool -any,
luna-tuple-utils -any,
mtl -any,
primitive -any,
prologue -any,
structs -any,
unboxed-ref >=0.4.0.0,
vector -any
ghc-options: -O2 -ticky -rtsopts -Wall
}}}
Results: (with and without `Monad =>`)
{{{
============= With Monad =>
=======================================================
luna git:(master) $ time (cabal-run bench-test +RTS -s)
27,544,258,632 bytes allocated in the heap
19,561,928 bytes copied during GC
205,496 bytes maximum residency (2 sample(s))
33,152 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max
pause
Gen 0 26366 colls, 0 par 0.270s 0.268s 0.0000s
0.0008s
Gen 1 2 colls, 0 par 0.002s 0.002s 0.0008s
0.0011s
INIT time 0.000s ( 0.000s elapsed)
MUT time 13.449s ( 13.487s elapsed)
GC time 0.272s ( 0.269s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 13.721s ( 13.757s elapsed)
%GC time 2.0% (2.0% elapsed)
Alloc rate 2,048,118,786 bytes per MUT second
Productivity 98.0% of total user, 98.0% of total elapsed
( cabal-run bench-test +RTS -s; ) 13,72s user 0,04s system 99% cpu 13,761
total
============= Original
============================================================
luna git:(master) $ time (cabal-run bench-test +RTS -s)
3,952,215,688 bytes allocated in the heap
2,071,824 bytes copied during GC
200,320 bytes maximum residency (2 sample(s))
33,152 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max
pause
Gen 0 3790 colls, 0 par 0.042s 0.043s 0.0000s
0.0008s
Gen 1 2 colls, 0 par 0.001s 0.002s 0.0009s
0.0010s
INIT time 0.000s ( 0.000s elapsed)
MUT time 1.595s ( 1.605s elapsed)
GC time 0.043s ( 0.044s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 1.638s ( 1.650s elapsed)
%GC time 2.6% (2.7% elapsed)
Alloc rate 2,478,513,730 bytes per MUT second
Productivity 97.4% of total user, 97.3% of total elapsed
( cabal-run bench-test +RTS -s; ) 1,64s user 0,01s system 99% cpu 1,654
total
}}}
I'll now try with `-ticky`.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15176#comment:10>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list