[Haskell-cafe] Re: Difference in Runtime but no explanation

Johann Höchtl johann.hoechtl at gmail.com
Wed Dec 16 04:04:02 EST 2009



On Dec 15, 10:45 pm, Daniel Fischer <daniel.is.fisc... at web.de> wrote:
> Am Dienstag 15 Dezember 2009 21:43:46 schrieb Johann Höchtl:
>
> > Please describe for me as a beginner, why there _is_ a difference:
>
> > 1. does len (x:xs) l = l `seq` len xs (l+1) vs. len xs $! (l+1) expand
> > into sthg. different?
>
> Yes. How different depends on optimisation level and compiler version.
>
> Without optimisations, seq is inlined, giving one self-contained loop, while ($!) is not
> inlined, so in every iteration of the loop there's a call to ($!) - bad for performance.
> ghc-6.10.3 and ghc-6.12.1 produce nearly identical core for each of the functions.
>
> With optimisations (-O2), both functions get compiled into a self-contained loop, 6.12.1
> produces near identical core for the two functions. The core for the second contains one
> case-expression the core for the first doesn't, but they should produce the same assembly.
> One improvement versus the unoptimised core is that plusInteger is called instead of
> GHC.Num.+.
>
> 6.10.3 produces very different core with -O2. The core for the second variant is close to
> that which 6.12.1 produces, I know not enough about core to see how that would influence
> performance. For the first variant, 6.10.3 produces different core, special casing for
> small and large Integers, which proves to be more efficient. Again, I'm not specialist
> enough to know why a) it produces so different core b) that core is so much faster.
>
> > 2. Do I understand right, that the first expression "should" actually
> > be slower but (for what reason ever in an unoptimized case isn't?
>
> No. In principle, with len (x:xs) l = l `seq` len xs (l+1), the evaluation of l lags one
> step behind, so you'd have the reduction
> len [1,2,3] 0
> ~> len [2,3] (0+1)
> ~> len [3] (1+1)
> ~> len [] (2+1)
> ~> (2+1)
> ~> 3
> while len (x:xs) l = let l' = l+1 in l' `seq` len xs l' gives
> len [1,2,3] 0
> ~> len [2,3] 1
> ~> len [3] 2
> ~> len [] 3
> ~> 3
> but that difference isn't measurable (if they produce different machine instructions at
> all, the difference is at most a handful of clock cycles).
> *If* len xs $! (l+1) were expanded into the latter, both would be - for all practical
> purposes at least - equally fast.
>
> > 3. The function is anotated with Integer. Why is suddenly Int of
> > importance?
>
> Thomas DuBuisson tried Int to investigate. It's always interesting what changes when you
> change types.
>
> > (4. When optimizing is switched on, the second expession executes
> > faster; as such I assume, that there is a difference between these two
> > statements)
>
> Not here. With 6.12.1 and -O(2), both are equally fast, with 6.10.3, the first is faster.
> I would rather expect 6.10.4 to behave more like 6.10.3. It may be, of course, that it's a
> hardware/OS issue which code is faster.
> Can you
>
> ghc-6.10.4 -O2 -fforce-recomp -ddump-simpl --make Whatever.hs > Whatever.core
>
> so I can see what core that produces?
>
>

Thank you very much for the helpful explanations. With optmisations
turned on, the runtime performance (whatever.exe +RTS -s) is almost
the same, with the seq variant still a tad faster.

I copy the .core in here:

======== BEGIN seq - variant =============

==================== Tidy Core ====================
Rec {
Main.len :: [GHC.Integer.Internals.Integer]
            -> GHC.Integer.Internals.Integer
            -> GHC.Integer.Internals.Integer
[GlobalId]
[Arity 2
 NoCafRefs
 Str: DmdType SS]
Main.len =
  \ (ds_dzE :: [GHC.Integer.Internals.Integer])
    (l_afx :: GHC.Integer.Internals.Integer) ->
    case ds_dzE of wild_B1 {
      [] -> l_afx;
      : x_afz xs_afB ->
        Main.len
          xs_afB
          (case l_afx of wild1_dAc {
             GHC.Integer.Internals.S# i_dAe ->
               case GHC.Prim.addIntC# i_dAe 1 of wild2_dAk { (# r_dAm,
c_dAn #) ->
               case c_dAn of wild3_dAp {
                 __DEFAULT ->
                   case GHC.Prim.int2Integer# i_dAe
                   of wild4_dAq { (# s_dAs, d_dAt #) ->
                   case GHC.Prim.int2Integer# 1 of wild5_dAv { (#
s1_dAx, d1_dAy #) ->
                   GHC.Integer.$splusInteger d1_dAy s1_dAx d_dAt s_dAs
                   }
                   };
                 0 -> GHC.Integer.Internals.S# r_dAm
               }
               };
             GHC.Integer.Internals.J# ds1_dAK ds11_dAL ->
               case GHC.Prim.int2Integer# 1 of wild2_dAR { (# s_dAT,
d_dAU #) ->
               GHC.Integer.$splusInteger d_dAU s_dAT ds11_dAL ds1_dAK
               }
           })
    }
end Rec }

Main.lvl :: GHC.Integer.Internals.Integer
[GlobalId]
[NoCafRefs]
Main.lvl = GHC.Integer.Internals.S# 1

Main.lvl1 :: GHC.Integer.Internals.Integer
[GlobalId]
[NoCafRefs]
Main.lvl1 = GHC.Integer.Internals.S# 10000000

Main.lvl2 :: [GHC.Integer.Internals.Integer]
[GlobalId]
[]
Main.lvl2 = GHC.Num.up_list Main.lvl Main.lvl Main.lvl1

Main.lvl3 :: GHC.Integer.Internals.Integer
[GlobalId]
[NoCafRefs]
Main.lvl3 = GHC.Integer.Internals.S# 0

Main.lvl4 :: GHC.Integer.Internals.Integer
[GlobalId]
[]
Main.lvl4 = Main.len Main.lvl2 Main.lvl3

Main.lvl5 :: GHC.Base.String
[GlobalId]
[]
Main.lvl5 =
  GHC.Num.$wshowsPrec 0 Main.lvl4 (GHC.Types.[] @ GHC.Types.Char)

Main.a :: GHC.Prim.State# GHC.Prim.RealWorld
          -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
[GlobalId]
[Arity 1
 Str: DmdType L]
Main.a =
  \ (eta_aBl :: GHC.Prim.State# GHC.Prim.RealWorld) ->
    case GHC.IO.a29 GHC.Handle.stdout Main.lvl5 eta_aBl
    of wild_aHm { (# new_s_aHo, a89_aHp #) ->
    GHC.IO.$wa10 GHC.Handle.stdout '\n' new_s_aHo
    }

Main.a1 :: GHC.Prim.State# GHC.Prim.RealWorld
           -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
[GlobalId]
[Arity 1
 Str: DmdType L]
Main.a1 =
  GHC.TopHandler.a4
    @ ()
    (Main.a
     `cast` (sym ((GHC.IOBase.:CoIO) ())
             :: GHC.Prim.State# GHC.Prim.RealWorld
                -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
                  ~
                GHC.IOBase.IO ()))

Main.main :: GHC.IOBase.IO ()
[GlobalId]
[Arity 1
 Str: DmdType L]
Main.main =
  Main.a
  `cast` (sym ((GHC.IOBase.:CoIO) ())
          :: GHC.Prim.State# GHC.Prim.RealWorld
             -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
               ~
             GHC.IOBase.IO ())

:Main.main :: GHC.IOBase.IO ()
[GlobalId]
[Arity 1
 Str: DmdType L]
:Main.main =
  Main.a1
  `cast` (sym ((GHC.IOBase.:CoIO) ())
          :: GHC.Prim.State# GHC.Prim.RealWorld
             -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
               ~
             GHC.IOBase.IO ())




==================== Tidy Core Rules ====================


======== BEGIN $!- variant =============

==================== Tidy Core ====================
Main.lit :: GHC.Integer.Internals.Integer
[GlobalId]
[NoCafRefs
 Str: DmdType]
Main.lit = GHC.Integer.Internals.S# 1

Rec {
Main.len :: [GHC.Integer.Internals.Integer]
            -> GHC.Integer.Internals.Integer
            -> GHC.Integer.Internals.Integer
[GlobalId]
[Arity 2
 NoCafRefs
 Str: DmdType SS]
Main.len =
  \ (ds_dzF :: [GHC.Integer.Internals.Integer])
    (l_afx :: GHC.Integer.Internals.Integer) ->
    case ds_dzF of wild_B1 {
      [] -> l_afx;
      : x_afz xs_afB ->
        case GHC.Integer.plusInteger l_afx Main.lit
        of x1_azT { __DEFAULT ->
        Main.len xs_afB x1_azT
        }
    }
end Rec }

Main.lvl :: GHC.Integer.Internals.Integer
[GlobalId]
[NoCafRefs]
Main.lvl = GHC.Integer.Internals.S# 10000000

Main.lvl1 :: [GHC.Integer.Internals.Integer]
[GlobalId]
[]
Main.lvl1 = GHC.Num.up_list Main.lit Main.lit Main.lvl

Main.lvl2 :: GHC.Integer.Internals.Integer
[GlobalId]
[NoCafRefs]
Main.lvl2 = GHC.Integer.Internals.S# 0

Main.lvl3 :: GHC.Integer.Internals.Integer
[GlobalId]
[]
Main.lvl3 = Main.len Main.lvl1 Main.lvl2

Main.lvl4 :: GHC.Base.String
[GlobalId]
[]
Main.lvl4 =
  GHC.Num.$wshowsPrec 0 Main.lvl3 (GHC.Types.[] @ GHC.Types.Char)

Main.a :: GHC.Prim.State# GHC.Prim.RealWorld
          -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
[GlobalId]
[Arity 1
 Str: DmdType L]
Main.a =
  \ (eta_aBi :: GHC.Prim.State# GHC.Prim.RealWorld) ->
    case GHC.IO.a29 GHC.Handle.stdout Main.lvl4 eta_aBi
    of wild_aHj { (# new_s_aHl, a89_aHm #) ->
    GHC.IO.$wa10 GHC.Handle.stdout '\n' new_s_aHl
    }

Main.a1 :: GHC.Prim.State# GHC.Prim.RealWorld
           -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
[GlobalId]
[Arity 1
 Str: DmdType L]
Main.a1 =
  GHC.TopHandler.a4
    @ ()
    (Main.a
     `cast` (sym ((GHC.IOBase.:CoIO) ())
             :: GHC.Prim.State# GHC.Prim.RealWorld
                -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
                  ~
                GHC.IOBase.IO ()))

Main.main :: GHC.IOBase.IO ()
[GlobalId]
[Arity 1
 Str: DmdType L]
Main.main =
  Main.a
  `cast` (sym ((GHC.IOBase.:CoIO) ())
          :: GHC.Prim.State# GHC.Prim.RealWorld
             -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
               ~
             GHC.IOBase.IO ())

:Main.main :: GHC.IOBase.IO ()
[GlobalId]
[Arity 1
 Str: DmdType L]
:Main.main =
  Main.a1
  `cast` (sym ((GHC.IOBase.:CoIO) ())
          :: GHC.Prim.State# GHC.Prim.RealWorld
             -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
               ~
             GHC.IOBase.IO ())




==================== Tidy Core Rules ====================


As you said, the $! variant of your 6.10.3 produces almost the same
core as that of 6.12.1 would indicate, that 6.12.1 misses an
optimisation, as the seq variant (at least on 6.10.4) is still faster,
albeit that is neglectable.

>
> > Thank you!
>
Thank you, Johann
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-C... at haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe


More information about the Haskell-Cafe mailing list