Loop unrolling + fusion ?

Max Bolingbroke batterseapower at hotmail.com
Sat Feb 28 15:14:01 EST 2009


2009/2/28 Don Stewart <dons at galois.com>:
> So now, since we've gone to such effort to produce a tiny loop like, this,
> can't we unroll it just a little? Sadly, my attempts to get GCC to trigger
> its loop unroller on this guy haven't succeeded. -funroll-loops and
> -funroll-all-loops doesn't  touch it,
>
> Anyone think of a way to apply Claus' TH unroller, or somehow convince GCC
> it is worth unrolling this guy, so we get the win of both aggressive high level
> fusion, and aggressive low level loop optimisations?

For a couple of weeks, I have had a working solution for the concatMap
problem using a sort of loop unrolling. I have tweaked the approach
slightly to also unroll the worker loop to get the results you desire.

You can check out the (very rough) code with:
git clone http://www.cl.cam.ac.uk/~mb566/git/concatmap/.git/
$EDITOR concatmap/CallUnrollConcatMap.hs

Apologies if the code is somewhat cryptic, but you should be able to
get the general idea.

A sneak preview is in order. The following Core:

"""
Rec {
$wf1_s1bU [ALWAYS LoopBreaker Nothing] :: GHC.Prim.Int#
                                          -> GHC.Prim.Int#
[Arity 1
 Str: DmdType L]
$wf1_s1bU =
  \ (ww_s1bO :: GHC.Prim.Int#) ->
    case GHC.Prim.<=# ww_s1bO 100000000
    of wild_B1 [ALWAYS Dead Just A] {
      GHC.Bool.False -> 0;
      GHC.Bool.True ->
        let {
          x_XMS [ALWAYS Just L] :: GHC.Prim.Int#
          [Str: DmdType]
          x_XMS = GHC.Prim.+# ww_s1bO 1 } in
        case GHC.Prim.<=# x_XMS 100000000 of wild_Xx [ALWAYS Dead Just A] {
          GHC.Bool.False ->
            GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# ww_s1bO 2) 2;
          GHC.Bool.True ->
            let {
              x_XMX [ALWAYS Just L] :: GHC.Prim.Int#
              [Str: DmdType]
              x_XMX = GHC.Prim.+# x_XMS 1 } in
            case GHC.Prim.<=# x_XMX 100000000 of wild_XE [ALWAYS Dead Just A] {
              GHC.Bool.False ->
                GHC.Prim.+#
                  (GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# ww_s1bO 2) 2)
                  (GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# x_XMS 2) 2);
              GHC.Bool.True ->
                let {
                  x_XOf [ALWAYS Just L] :: GHC.Prim.Int#
                  [Str: DmdType]
                  x_XOf = GHC.Prim.+# x_XMX 1 } in
                case GHC.Prim.<=# x_XOf 100000000 of wild_XM [ALWAYS
Dead Just A] {
                  GHC.Bool.False ->
                    GHC.Prim.+#
                      (GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# ww_s1bO 2) 2)
                      (GHC.Prim.+#
                         (GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# x_XMS 2) 2)
                         (GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# x_XMX 2) 2));
                  GHC.Bool.True ->
                    case $wf1_s1bU (GHC.Prim.+# x_XOf 1) of ww_s1bS {
__DEFAULT ->
                    GHC.Prim.+#
                      (GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# ww_s1bO 2) 2)
                      (GHC.Prim.+#
                         (GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# x_XMS 2) 2)
                         (GHC.Prim.+#
                            (GHC.Prim.*# (GHC.Prim.uncheckedIShiftL# x_XMX 2) 2)
                            (GHC.Prim.+#
                               (GHC.Prim.*#
(GHC.Prim.uncheckedIShiftL# x_XOf 2) 2) ww_s1bS)))
                    }
                }
            }
        }
    }
end Rec }
"""

Is generated by this program:

"""
result = sumS . mapS (*2) . mapS (`shiftL` 2) $ enumFromToS 0 100000000
"""

Of course, my approach is far from perfect:

* Unrolling ALWAYS happens, and to a fixed depth
* RULEs aren't very good at exploiting properties of arithmetic, as
Claus has pointed out
* concatMap fuses with my library but has lingering issues with
allocation if join points don't get inlined and has some strictness
problems too (to see this in action, try compliing the program "sumS $
mapS (+10) $ concatMapS (\x -> enumFromToS x 20) $ enumFromToS 1 10"
from the same file). It also is only permitted up to a fixed depth as
defined by the level of unrolling specified in the "spec" combinator.

But it does get your unrolling with TODAYs GHC, transparently to the
user of the uvector library.

I am currently looking at other, smarter, ways that GHC can optimize
loops as part of my research - so with luck this sort of manual
unrolling hackery will become less relevant in the future.

All the best,
Max


More information about the Glasgow-haskell-users mailing list