Loop unrolling + fusion ?

Sat Feb 28 14:42:57 EST 2009

>    import Data.Array.Vector
>    import Data.Bits
>    main = print . productU . mapU (*2) . mapU (`shiftL` 2)
>         $ replicateU (100000000 :: Int) (5::Int)
>
> and turns it into a loop like this:
>
>    $wfold :: Int# -> Int# -> Int#
>    $wfold =
>      \ (ww_sWX :: Int#) (ww1_sX1 :: Int#) ->
>        case ww1_sX1 of wild_B1 {
>          __DEFAULT ->
>            $wfold (*# ww_sWX 40) (+# wild_B1 1);
>          100000000 -> ww_sWX
>        }
..
> So now, since we've gone to such effort to produce a tiny loop like this,
> can't we unroll it just a little?
> Anyone think of a way to apply Claus' TH unroller, or somehow convince GCC
> it is worth unrolling this guy, so we get the win of both aggressive high level
> fusion, and aggressive low level loop optimisations?

I'm not sure this is what you're after (been too long since I read assembler;-),
but it sounds as if you wanted to unroll the source of that fold, which seems
to be a local definition in foldS? Since unrolling is not always a good idea, it
would also be nice to have a way to control/initiate it from outside of the
uvector package (perhaps a RULE that redirects the call from foldS to an
unrolled foldSN, though foldS is hidden and gets inlined away, so it would
have to be something along those lines). If that works, you'd then run into
the issue of wanting to rearrange the nested *# and +# applications so that
the constant operands can be combined.
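
To make that last point concrete, here is a rough sketch, written by hand
rather than produced by any unroller, and using boxed Int instead of Int# so
that it runs without extensions beyond BangPatterns. The names foldPlain,
foldUnrolled2 and foldCombined2 are made up for this illustration and are not
part of uvector. The first is the quoted loop, the second is what a naive
unroll-by-two would give, and the third is the same loop after regrouping the
constants. The regrouping is valid here because Int arithmetic wraps and so
stays associative; it would not be safe for floating point.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- The loop from the quoted Core: multiply the accumulator by 40
-- once per iteration, for n iterations (n is assumed non-negative).
foldPlain :: Int -> Int
foldPlain n = go 1 0
  where
    go !acc !i
      | i == n    = acc
      | otherwise = go (acc * 40) (i + 1)

-- Naive unrolling by two: the body is simply duplicated, which leaves
-- nested applications of (*) and (+) with a constant on each level.
foldUnrolled2 :: Int -> Int
foldUnrolled2 n = go 1 0
  where
    go !acc !i
      | i + 1 < n = go ((acc * 40) * 40) ((i + 1) + 1)
      | i < n     = go (acc * 40) (i + 1)  -- leftover step when n is odd
      | otherwise = acc

-- The same loop after regrouping: (acc * 40) * 40 becomes acc * 1600
-- and (i + 1) + 1 becomes i + 2, so each unrolled step is one multiply
-- and one add.
foldCombined2 :: Int -> Int
foldCombined2 n = go 1 0
  where
    go !acc !i
      | i + 1 < n = go (acc * 1600) (i + 2)
      | i < n     = go (acc * 40) (i + 1)  -- leftover step when n is odd
      | otherwise = acc

main :: IO ()
main = do
  let ns = [0 .. 20]
  -- All three variants should agree, including after Int overflow.
  print (all (\n -> foldPlain n == foldUnrolled2 n) ns)
  print (all (\n -> foldPlain n == foldCombined2 n) ns)
```

Whether the compiler gets from the second form to the third on its own is
exactly the open question; the sketch only shows what the rearrangement would
have to achieve.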

Claus