[GHC] #6070: Fun with the demand analyser

GHC ghc-devs at haskell.org
Thu Jan 16 15:32:13 UTC 2014


#6070: Fun with the demand analyser
-------------------------------------+------------------------------------
        Reporter:  simonpj           |            Owner:  simonpj
            Type:  bug               |           Status:  new
        Priority:  normal            |        Milestone:  7.8.1
       Component:  Compiler          |          Version:  7.4.1
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |       Difficulty:  Unknown
       Test Case:                    |       Blocked By:
        Blocking:                    |  Related Tickets:
-------------------------------------+------------------------------------
Description changed by simonpj:

Old description:

> Max writes: I've been trying to speed up the supercompiler, and in the
> process I've noticed some infelicities in the demand analyser that
> might be worth looking at.
>
> == Infelicity 1: analysis does not take into account extra unboxing done by the CPR transformation ==
> An example is:
> {{{
> e :: (Int, Int) -> Int -> (Int, Int)
> e x y = x `seq` if y > 10
>          then x
>          else e x (y + 1)
> }}}
> Because x is seq'd, the occurrence in the "then" branch gets the CPR
> property so e has the CPR property overall. However, the overall
> demand on x is S(AA), i.e. the demand analyser believes the x box is
> used -- if the box were unused it would get the demand U(LL). So the
> overall demand type is S(AA)U(L)m, and the worker looks like:
> {{{
> Rec {
> CPR2.$we [Occ=LoopBreaker]
>   :: (GHC.Types.Int, GHC.Types.Int)
>      -> GHC.Prim.Int# -> (# GHC.Types.Int, GHC.Types.Int #)
> [GblId,
>  Arity=2,
>  Caf=NoCafRefs,
>  Str=DmdType S(AA)L,
>  Unf=OtherCon []]
> CPR2.$we =
>   \ (w_srv :: (GHC.Types.Int, GHC.Types.Int))
>     (ww_srt :: GHC.Prim.Int#) ->
>     case GHC.Prim.># ww_srt 10 of _ {
>       GHC.Types.False ->
>         case GHC.Prim.+# ww_srt 1 of sat_ssS { __DEFAULT ->
>         CPR2.$we w_srv sat_ssS
>         };
>       GHC.Types.True ->
>         case w_srv of _ { (ww2_srA, ww3_srB) -> (# ww2_srA, ww3_srB #) }
>     }
> end Rec }
> }}}
> But if we demand-analysed $we then GHC would say that it had the
> demand type U(LL)L and unbox the pair argument! It seems silly that
> the demand analyser outputs code that can be improved further by just
> running demand analysis again.
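>
> For illustration, here is roughly what a second round of worker/wrapper
> would buy us (a hand-written source-level sketch, not GHC output;
> `eWorker` is a made-up name):
> {{{
> -- Hypothetical sketch only: the pair argument is split into its two
> -- components, so the recursive path never passes a boxed pair.
> eWorker :: Int -> Int -> Int -> (Int, Int)
> eWorker x1 x2 y
>   | y > 10    = (x1, x2)
>   | otherwise = eWorker x1 x2 (y + 1)
>
> e :: (Int, Int) -> Int -> (Int, Int)
> e x y = case x of (x1, x2) -> eWorker x1 x2 y   -- wrapper unboxes once
> }}}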
>
> Somewhere where this really bit me in practice is in:
> {{{
> d :: M.Map Int Int -> (Int, Int)
> d m = M.foldrWithKey' (\k v (a, b) -> if k + v > 2 then (a, b) else (b, a))
>          (0, 1) m
> }}}
> Which generates this inner loop:
> {{{
> Rec {
> CPR2.$wgo2 [Occ=LoopBreaker]
>   :: (GHC.Types.Int, GHC.Types.Int)
>      -> Data.Map.Base.Map GHC.Types.Int GHC.Types.Int
>      -> (# GHC.Types.Int, GHC.Types.Int #)
> [GblId,
>  Arity=2,
>  Caf=NoCafRefs,
>  Str=DmdType S(AA)S,
>  Unf=OtherCon []]
> CPR2.$wgo2 =
>   \ (w_srS :: (GHC.Types.Int, GHC.Types.Int))
>     (w1_srQ :: Data.Map.Base.Map GHC.Types.Int GHC.Types.Int) ->
>     case w1_srQ of _ {
>       Data.Map.Base.Tip ->
>         case w_srS of _ { (ww1_srW, ww2_srX) -> (# ww1_srW, ww2_srX #) };
>       Data.Map.Base.Bin rb_st1 kx_ss3 x_ssa l_ssk r_ss6 ->
>         case kx_ss3 of _ { GHC.Types.I# x1_ssd ->
>         case CPR2.$wgo2 w_srS r_ss6 of _ { (# ww1_ssi, ww2_ssh #) ->
>         case x_ssa of _ { GHC.Types.I# y_sse ->
>         case GHC.Prim.+# x1_ssd y_sse of sat_st0 { __DEFAULT ->
>         case GHC.Prim.># sat_st0 2 of _ {
>           GHC.Types.False ->
>             let {
>               sat_ssZ :: (GHC.Types.Int, GHC.Types.Int)
>               [LclId]
>               sat_ssZ = (ww2_ssh, ww1_ssi) } in
>             CPR2.$wgo2 sat_ssZ l_ssk;
>           GHC.Types.True ->
>             let {
>               sat_st6 :: (GHC.Types.Int, GHC.Types.Int)
>               [LclId]
>               sat_st6 = (ww1_ssi, ww2_ssh) } in
>             CPR2.$wgo2 sat_st6 l_ssk
>         }
>         }
>         }
>         }
>         }
>     }
> end Rec }
> }}}
> We could save a number of allocations proportional to the size of the
> Map if the demand analyser didn't have this problem.
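>
> As a rough sketch of where those savings come from (hand-written, not
> GHC output; `Tree` stands in for Data.Map.Base.Map and `wgo` is a
> made-up name), this is the loop we would like, with the accumulator
> pair carried as two plain Ints so neither branch builds a pair per Bin
> node:
> {{{
> data Tree = Tip | Bin Int Int Int Tree Tree  -- size, key, value, left, right
>
> -- Hypothetical unboxed-accumulator loop; the result pair is left to
> -- GHC's existing CPR transformation.
> wgo :: Int -> Int -> Tree -> (Int, Int)
> wgo a b Tip = (a, b)
> wgo a b (Bin _ k v l r) =
>   case wgo a b r of
>     (a', b') -> if k + v > 2 then wgo a' b' l
>                              else wgo b' a' l
> }}}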
>
> I hacked up the analyser to "fix" this by changing the code from line
> 473 onwards to read:
> {{{
>     if isTopLevel top_lvl
>      then fn_ty -- Don't record top level things
>      else case dmd of
>             Box (Eval (Poly Abs))
>               | DmdType _ _ res <- fn_ty
>               , returnsCPR res
>               -> addVarDmd fn_ty var (Eval (Poly lazyDmd))
>             _ -> addVarDmd fn_ty var dmd
> }}}
> So if we demand a CPRish variable (such as one bound by a case scrutinee
> or a U(LL)-demanded lambda-binder) in the evalDmd then I throw away
> the Box part of the evalDmd on the basis that after CPRing we won't
> demand the box at all.
>
> I doubt this hack is the right solution (and I haven't tried it to see
> how it affects the libraries) but perhaps it gives you some ideas.
>

> == Infelicity 2: a case where demand summarisation hurts us ==
>
> I found practical examples where summarising the demand transformer of
> a function as a single strictness signature hurt us. The examples were
> code like:
> {{{
> h :: (Int, Int) -> Int -> (Int, Int)
> h x y = if y > 10
>          then x
>          else h (case h x 0 of (y1, y2) -> (y2, y1)) (y + 1)
> }}}
> If, at the innermost use of h, we re-analyse h in the demand
> C(C(U(LL))) rather than just unleashing the vanilla DmdSig computed
> from the demand C(C(S)) then we can unbox the pair argument. Currently
> GHC just gives h the DmdType SU(L) which doesn't eliminate the
> allocation of the pair at all.
>
> This situation occurs in practice with code like:
> {{{
> c :: M.Map Int Int -> (Int, Int)
> c m = M.foldrWithKey (\k v (a, b) -> if k + v > 2 then (a, b) else (b, a))
>          (0, 1) m
> }}}
> The difference from my earlier example d is that I'm using the version
> of `foldrWithKey` that doesn't call `seq`. Current GHC generates this
> inner loop:
> {{{
> Rec {
> CPR2.c_go2 [Occ=LoopBreaker]
>   :: (GHC.Types.Int, GHC.Types.Int)
>      -> Data.Map.Base.Map GHC.Types.Int GHC.Types.Int
>      -> (GHC.Types.Int, GHC.Types.Int)
> [GblId,
>  Arity=2,
>  Caf=NoCafRefs,
>  Str=DmdType U(L*)S,
>  Unf=OtherCon []]
> CPR2.c_go2 =
>   \ (z_spW :: (GHC.Types.Int, GHC.Types.Int))
>     (ds_spU :: Data.Map.Base.Map GHC.Types.Int GHC.Types.Int) ->
>     case ds_spU of _ {
>       Data.Map.Base.Tip -> z_spW;
>       Data.Map.Base.Bin rb_sqq kx_sq2 x_sq9 l_sqj r_sq5 ->
>         case kx_sq2 of _ { GHC.Types.I# x1_sqc ->
>         case CPR2.c_go2 z_spW r_sq5 of wild2_sqk { (a_sqh, b_sqg) ->
>         case x_sq9 of _ { GHC.Types.I# y_sqd ->
>         case GHC.Prim.+# x1_sqc y_sqd of sat_sqp { __DEFAULT ->
>         case GHC.Prim.># sat_sqp 2 of _ {
>           GHC.Types.False ->
>             let {
>               sat_sqo :: (GHC.Types.Int, GHC.Types.Int)
>               [LclId]
>               sat_sqo = (b_sqg, a_sqh) } in
>             CPR2.c_go2 sat_sqo l_sqj;
>           GHC.Types.True -> CPR2.c_go2 wild2_sqk l_sqj
>         }
>         }
>         }
>         }
>         }
>     }
> end Rec }
> }}}
> I implemented this, but the overhead of reanalysing a function at each
> occurrence site is prohibitive (with my changes paraffins took 2.5s to
> compile instead of 0.3s). It does fix the problem though.
>
> A scheme like the one in "stricterness more relevant", but with
> let-bindings that are polymorphic in the use-site demand, might be
> able to detect the extra strictness here. I think Stefan Holdermans
> has a paper describing such a system. But this is in any case much
> harder to fix than my first infelicity.

New description:

 Max writes: I've been trying to speed up the supercompiler, and in the
 process I've noticed some infelicities in the demand analyser that
 might be worth looking at.  One is #5949.  The other is:

 == Infelicity 2: a case where demand summarisation hurts us ==

 I found practical examples where summarising the demand transformer of
 a function as a single strictness signature hurt us. The examples were
 code like:
 {{{
 h :: (Int, Int) -> Int -> (Int, Int)
 h x y = if y > 10
          then x
          else h (case h x 0 of (y1, y2) -> (y2, y1)) (y + 1)
 }}}
 If, at the innermost use of h, we re-analyse h in the demand
 C(C(U(LL))) rather than just unleashing the vanilla DmdSig computed
 from the demand C(C(S)) then we can unbox the pair argument. Currently
 GHC just gives h the DmdType SU(L) which doesn't eliminate the
 allocation of the pair at all.
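
 To see what is being lost, here is a hand-written source-level sketch
 (not GHC output; `hWorker` is a made-up name) of what unboxing h's pair
 argument would allow: the pair travels as two plain Ints, so the
 `(y2, y1)` swap is just an argument reordering and needs no
 intermediate pair.
 {{{
 -- Hypothetical sketch only: the pair argument of h is split into its
 -- two components; the result side is left to GHC's existing CPR
 -- transformation.
 hWorker :: Int -> Int -> Int -> (Int, Int)
 hWorker x1 x2 y
   | y > 10    = (x1, x2)
   | otherwise = case hWorker x1 x2 0 of
                   (y1, y2) -> hWorker y2 y1 (y + 1)

 h :: (Int, Int) -> Int -> (Int, Int)
 h x y = case x of (x1, x2) -> hWorker x1 x2 y   -- wrapper unboxes once
 }}}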

 This situation occurs in practice with code like:
 {{{
 c :: M.Map Int Int -> (Int, Int)
 c m = M.foldrWithKey (\k v (a, b) -> if k + v > 2 then (a, b) else (b, a))
          (0, 1) m
 }}}
 The difference from my earlier example d is that I'm using the version
 of `foldrWithKey` that doesn't call `seq`. Current GHC generates this
 inner loop:
 {{{
 Rec {
 CPR2.c_go2 [Occ=LoopBreaker]
   :: (GHC.Types.Int, GHC.Types.Int)
      -> Data.Map.Base.Map GHC.Types.Int GHC.Types.Int
      -> (GHC.Types.Int, GHC.Types.Int)
 [GblId,
  Arity=2,
  Caf=NoCafRefs,
  Str=DmdType U(L*)S,
  Unf=OtherCon []]
 CPR2.c_go2 =
   \ (z_spW :: (GHC.Types.Int, GHC.Types.Int))
     (ds_spU :: Data.Map.Base.Map GHC.Types.Int GHC.Types.Int) ->
     case ds_spU of _ {
       Data.Map.Base.Tip -> z_spW;
       Data.Map.Base.Bin rb_sqq kx_sq2 x_sq9 l_sqj r_sq5 ->
         case kx_sq2 of _ { GHC.Types.I# x1_sqc ->
         case CPR2.c_go2 z_spW r_sq5 of wild2_sqk { (a_sqh, b_sqg) ->
         case x_sq9 of _ { GHC.Types.I# y_sqd ->
         case GHC.Prim.+# x1_sqc y_sqd of sat_sqp { __DEFAULT ->
         case GHC.Prim.># sat_sqp 2 of _ {
           GHC.Types.False ->
             let {
               sat_sqo :: (GHC.Types.Int, GHC.Types.Int)
               [LclId]
               sat_sqo = (b_sqg, a_sqh) } in
             CPR2.c_go2 sat_sqo l_sqj;
           GHC.Types.True -> CPR2.c_go2 wild2_sqk l_sqj
         }
         }
         }
         }
         }
     }
 end Rec }
 }}}
 I implemented this, but the overhead of reanalysing a function at each
 occurrence site is prohibitive (with my changes paraffins took 2.5s to
 compile instead of 0.3s). It does fix the problem though.

 A scheme like the one in "stricterness more relevant", but with
 let-bindings that are polymorphic in the use-site demand, might be
 able to detect the extra strictness here. I think Stefan Holdermans
 has a paper describing such a system. But this is in any case much
 harder to fix than my first infelicity.

--

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/6070#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

