[Haskell-cafe] How to write fast for loops

Tue Apr 29 03:39:35 UTC 2014

After some evaluation, here come some results of my benchmarks, and the
final function I've packaged:

* It is tricky to write an Enum-based general loop function that is fast
for all data types. The way overflow checks are implemented in different
types makes e.g. Int and Word32 behave differently; Word32 is slower.

* [a..b] is nice and can be just as fast as a manual loop - if you are
lucky. Unfortunately, if you don't want to rely on luck for your
program's performance, you can't have your [a..b] (at the time being).

* Vector's equivalent of [a..b] *might* not be as luck dependent, but
suffers from a factor 5 penalty, which is hopefully a but (see John's post).

Current most appealing solution for fast loops
----------------------------------------------

Unfortunately, this seems to be the most robustly fast (across all types
I have tested it with) loop:

  forLoop :: (Monad m) => a -> (a -> Bool) -> (a -> a) -> (a -> m ()) ->
m ()
  forLoop start cond inc f = go start
    where
      go !x | cond x    = f x >> go (inc x)
            | otherwise = return ()

And you can use it as

  forLoop 0 (< n) (+1) $ \i -> ...

Now, you probably don't like this, and neither do I, but it really looks
like this is what you should use if you want to write high performance
code :/

What also works is a `numLoop :: (Num a, Eq a, Monad m) => a -> a -> (a
-> m ()) -> m ()` with start and end value like [a..b], but the
`forLoop` aboves gives more flexibility and you can use either (+1) or
`succ` with it depending on whether you want performance or overflow-safety.

Does it really matter?
----------------------

One might think that in many cases the loop incrementing performance
doesn't matter too much, since the loop body should dominate the time spent.

That is true, but it is easy to get into cases where it doesn't. Some
examples:

* You want to write a test to make sure that reinterpret-casting a
Word32 to Float and back gives you the same Word32 [1]. Using `forLoop`
can make the difference if you have to wait 40 seconds for your test to
pass or 3 seconds. (This is how I got into this topic.)

* You want to implement something that does trivial operations on
unboxed vectors, say dot product or matrix multiplication. `forLoop` can
make a 5x performance difference.

Shouldn't I write something this trivial ad-hoc?
------------------------------------------------

I've packed `forLoop` into https://github.com/nh2/loop and uploaded it
to Hackage (http://hackage.haskell.org/package/loop).

* Maintaining the fastest way to loop in one place frees one from
thinking about it, and I plan to keep this updated with the fastest
implementation known (contributions welcome).

* It gives us a place to discuss alternatives and collect benchmarks for
high-performance looping.

Let's hope it becomes useless soon!

[1]: http://stackoverflow.com/a/7002812/263061