[Haskell-cafe] Switch optimization
Stefan O'Rear
stefanor at cox.net
Sun Jun 10 20:00:00 EDT 2007
On Mon, Jun 11, 2007 at 09:43:17AM +1000, Thomas Conway wrote:
> Hi All,
>
> I'm writing some variable byte codec routines, which are used in
> inner^N loops, and which I want to make really efficient. There are
> several spots where the code uses lookup tables.
>
> The best example is the table, which given the first byte, returns the
> number of additional bytes. It is a precomputed version of the
> following function:
>
> >codeLength :: Word8 -> Int
> >codeLength w
> > | w .&. 0x80 == 0 = 0
> > | otherwise = 1 + (codeLength $ w `shiftL` 1)
>
> from which we construct a 256 entry table:
>
> codeLen 0 = 0
> codeLen 1 = 0
> codeLen 2 = 0
> ...
> codeLen 127 = 0
> codeLen 128 = 1
> ...
> codeLen 255 = 8
>
> Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long
> chain of conditional branches. Now, even taking laziness into account,
> this seems like terribly inefficient code. I wrote this thinking it
> would be more efficient than constructing a CAF array:
>
> codeLens = listArray (0,255) (map codeLength [0..255])
>
> but I'm guessing the CAF version would probably work out much more
> efficient in the long run.
>
> However, I think ghc (and any other compiler), should detect the
> following properties:
>
> 1. all the clauses are mutually exclusive, so the sequencing is
> irrelevant to the semantics
>
> 2. Given an explicit type declaration Word8 -> ...., the 256 cases
> cover all the possible constructors of the type, so there are no
> missing clauses.
>
> I would have expected the generated code to have the form:
>
> codeLen:
> x <- pick up arg
> return closure (codeLen' x)
>
> codeLen':
> x' <- force x
> update precomputed-table[x']
>
> Even if you leave out property 2 and include bounds checks, this seems
> like an important kind of function to optimize well. So what have I
> missed, or is it time to learn how to hack on ghc?
I'd suggest learning how to hack on GHC, I get a chain of branches even
at the maximum optimization setting. Hackers are always appreciated!
For an early project, how about
http://hackage.haskell.org/trac/ghc/ticket/1261 :)
(-O3 is misparsed as an extremely low setting, worse than -O and
sometimes worse than -Onot)
BTW, you should definitely look at the FFI for small performance
critical functions. It's already a miracle that Haskell is as fast as
it is; I would not hold high hopes for idiomatic general Haskell
reaching C speed any time soon. (For some special cases like operating
on multi-MB strings and large parallel arrays, there is definite
promise.) Of course the interface is fairly costly, be sure to
benchmark.
Stefan
More information about the Haskell-Cafe
mailing list