I can't offer much insight but could the answer lie in the Integer type? I suspect that with a sufficiently large fixed Int type (2^14 bits?) the performance of the two functions would be almost equal. Could it be that the second function delays the multiplication of large numbers as long as possible? The divide-and-conquer approach?