NOINLINE affects worker/wrapper - why and how to fix?

Ömer Sinan Ağacan omeragacan at gmail.com
Sat Jan 9 15:23:32 UTC 2016


I was doing some microbenchmarks and noticed that adding NOINLINE to a
function somehow prevents worker/wrapper. Consider this factorial function,
which has a very obvious worker/wrapper opportunity:

    fac :: Int -> Int
    fac 0 = 1
    fac n = n * fac (n - 1)

If I add NOINLINE to this, no matter which -O level I use, I get this STG:

    fac =
        \r srt:SRT:[] [ds_s38j]
            case ds_s38j of _ {
              I# ds1_s38l ->
                  case ds1_s38l of ds2_s38m {
                    __DEFAULT ->
                        case -# [ds2_s38m 1#] of sat_s38n {
                          __DEFAULT ->
                              let { sat_s38o = NO_CCS I#! [sat_s38n];
                              } in
                                case fac sat_s38o of _ {
                                  I# y_s38q ->
                                      case *# [ds2_s38m y_s38q] of sat_s38r {
                                        __DEFAULT -> I# [sat_s38r];
                                      };
                                };
                        };
                    0# -> lvl_r38f;
                  };
            };

This has no worker/wrapper split. When I remove NOINLINE, I get the
worker/wrappered version, as expected:

    $wfac =
        \r srt:SRT:[] [ww_s38W]
            case ww_s38W of ds_s38X {
              __DEFAULT ->
                  case -# [ds_s38X 1#] of sat_s38Y {
                    __DEFAULT ->
                        case $wfac sat_s38Y of ww1_s38Z {
                          __DEFAULT -> *# [ds_s38X ww1_s38Z];
                        };
                  };
              0# -> 1#;
            };

    fac =
        \r srt:SRT:[] [w_s390]
            case w_s390 of _ {
              I# ww1_s392 ->
                  case $wfac ww1_s392 of ww2_s393 { __DEFAULT -> I# [ww2_s393]; };
            };
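
For anyone who wants to reproduce the two dumps, here is a minimal sketch of a
module that contains both variants side by side. The name facNoInline and the
file name are just placeholders I picked for illustration; compiling with
something like `ghc -O2 -ddump-stg Fac.hs` should show the STG for both
definitions.

```haskell
module Main (main) where

-- Variant with the pragma (hypothetical name): this corresponds to the
-- definition whose STG dump shows no $w worker.
facNoInline :: Int -> Int
facNoInline 0 = 1
facNoInline n = n * facNoInline (n - 1)
{-# NOINLINE facNoInline #-}

-- Variant without any pragma: this corresponds to the definition that
-- gets split into a worker and a wrapper.
fac :: Int -> Int
fac 0 = 1
fac n = n * fac (n - 1)

main :: IO ()
main = do
  print (facNoInline 10)
  print (fac 10)
```

Both functions compute the same result; only the generated code differs.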

I'd expect to get the same with NOINLINE too. First of all, I think this
suggests that if my function is big enough (or has some other property that
makes GHC's heuristics decide not to inline it), I don't get worker/wrapper.
Second, this kind of NOINLINE is very useful for a couple of reasons.

For example, let's say I'm benchmarking a function. I mark it NOINLINE because
I don't want it to be transformed into something else during benchmarking
through inlining and interaction with the code at the call site (the
benchmarking code).

Another example is inspecting the generated code to check whether GHC
performed the optimizations I expect. I use NOINLINE because otherwise I may
have to look at dozens of call sites rather than just one place (the
definition). But now I can't reliably do that.

So my questions are: why is worker/wrapper not applied to NOINLINE functions,
and how do I fix this?
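
The only workaround I can think of is doing the split by hand, which is not
really a fix. This is a rough sketch under my own assumptions: facWorker is a
name I made up, the worker is marked NOINLINE so its definition stays in one
place, and the thin wrapper is marked INLINE so only the boxing and unboxing
ends up at call sites. It needs the MagicHash extension and the primops from
GHC.Exts.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Int (I#), Int#, (*#), (-#))

-- Hand-written worker (hypothetical name): works on unboxed Int# only,
-- so the recursion never allocates a boxed Int.
facWorker :: Int# -> Int#
facWorker 0# = 1#
facWorker n  = n *# facWorker (n -# 1#)
{-# NOINLINE facWorker #-}

-- Hand-written wrapper: unboxes the argument, calls the worker, and
-- reboxes the result. INLINE lets call sites see through the boxing.
fac :: Int -> Int
fac (I# n) = I# (facWorker n)
{-# INLINE fac #-}

main :: IO ()
main = print (fac 10)
```

I'd much rather have GHC do this for me than write unboxed code by hand.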

Thanks.

