[Haskell-cafe] Switch optimization

Sun Jun 10 19:43:17 EDT 2007

Hi All,

I'm writing some variable byte codec routines, which are used in
inner^N loops, and which I want to make really efficient. There are
several spots where the code uses lookup tables.

The best example is the table, which given the first byte, returns the
number of additional bytes. It is a precomputed version of the
following function:

> codeLength :: Word8 -> Int
> codeLength w
>      | w .&. 0x80 == 0   = 0
>      | otherwise         = 1 + (codeLength $ w `shiftL` 1)

from which we construct a 256 entry table:

codeLen 0 = 0
codeLen 1 = 0
codeLen 2 = 0
...
codeLen 127 = 0
codeLen 128 = 1
...
codeLen 255 = 8

Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long
chain of conditional branches. Now, even taking laziness into account,
this seems like terribly inefficient code.  I wrote this thinking it
would be more efficient than constructing a CAF array:

codeLens = listArray (0,255) (map codeLength [0..255])

but I'm guessing the CAF version would probably work out much more
efficient in the long run.

However, I think ghc (and any other compiler), should detect the
following properties:

1. all the clauses are mutually exclusive, so the sequencing is
irrelevant to the semantics

2. Given an explicit type declaration Word8 -> ...., the 256 cases
cover all the possible constructors of the type, so there are no
missing clauses.

I would have expected the generated code to have the form:

codeLen:
    x <- pick up arg
   return closure (codeLen' x)

codeLen':
    x' <- force x
    update precomputed-table[x']

Even if you leave out property 2 and include bounds checks, this seems
like an important kind of function to optimize well. So what have I
missed, or is it time to learn how to hack on ghc?

T.
-- 
Thomas Conway
drtomc at gmail.com

Silence is the perfectest herald of joy:
I were but little happy, if I could say how much.