>> with support of loop unrolling,

> GHC calls this "inlining".

1. loop unrolling means replicating the loop body several times per
iteration, so that, say, 100 iterations of *p++=*q++ become 25 iterations of
*p++=*q++; *p++=*q++; *p++=*q++; *p++=*q++;

2. actually, ghc can't inline tail-recursive functions at all
(although i haven't checked this since 6.4)
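
to make point 1 concrete, here is a minimal sketch in C of the same copy loop written plainly and unrolled by hand 4 times. the function names copy_simple and copy_unrolled4 are made up for illustration, and the leftover loop is my assumption for the case where the count isn't a multiple of 4. a real compiler does this transformation internally and may pick a different unroll factor:

```c
#include <stddef.h>

/* plain loop: one copy and one loop test per element */
void copy_simple(char *p, const char *q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *p++ = *q++;
}

/* hand-unrolled by 4: one loop test per four copies,
   so n = 100 takes 25 iterations of the main loop */
void copy_unrolled4(char *p, const char *q, size_t n)
{
    size_t blocks = n / 4;   /* number of full groups of four */
    size_t rest   = n % 4;   /* leftover elements, 0..3 */

    for (size_t i = 0; i < blocks; i++) {
        *p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
    }
    /* handle the leftover elements one at a time */
    for (size_t i = 0; i < rest; i++)
        *p++ = *q++;
}
```

both functions copy the same bytes; the unrolled one just spends less time on the loop counter and branch, which is the whole point of the optimization and has nothing to do with inlining a function call.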

there are also many more optimization tricks. i don't think that
a modern compiler with an optimization level comparable to gcc's can be
delivered without many man-years of development

