[Haskell-cafe] Optimization by cafe

ChrisK haskell at list.mightyreason.com
Wed Mar 4 18:22:45 EST 2009

I have a performance glitch in regex-tdfa related to GHC-6.10.1 (on a PPC G4 
with -O2).

I have okay code that runs in 0.2 seconds.
I simplify this code to run faster and it takes 4.0 seconds (it allocates 
something in the loop, I think)
I turn on -prof for the simpler code and it runs in 0.25 seconds.

I can't profile my way out of it, due to above paradox.  Has anyone else seen 
this sort of paradox?

The simpler code's loop is just a bunch of pattern matching and lookups, multi:

> matchTest :: Regex -> S.ByteString -> Bool
> matchTest !r !input = ...
>   multi (Simple' {dt_win=w,dt_trans=CharMap t, dt_other=o}) !off =
>     if IMap.null w
>       then if off > endOff
>              then False
>              else case IMap.findWithDefault o (fromEnum (input `S.unsafeIndex` off)) t of
>                     Transition {trans_many=DFA {d_dt=dt'}} -> 
>                       multi dt' (succ off)
>       else True

All the above regex search does is look for a match.
It is a tester, returning Bool.
The "w" is checked for winning, if it wins it returns True.
If not winning, the offset is checked against the end (cached in endOff).
If the end has been reach it returns False.
Otherwise it looks up the Word8 at offset "off" in "input"
, looks up this Word8 as an Int in the t::IntMap Transition
, defaulting to Transition "o"
The next DT is followed with the (succ off), note that !off is strict.

I cannot think of how to reduce the code any further.  It should not be doing 
any allocation.  The Regex is being compiled once, cached, and run 10^6 times 
against a short piece of text.

I know it can run fast because I have huge module running in ST.Strict that does 
all kinds of extra computation and runs in 0.2 seconds.  It should be possible 
to cut this down and win.

So I also applied my delete key to most of the code in the complicated ST.Strict 
module, keeping the runST.  The end result was not unlike the above code, and 
ran in 4.0 seconds instead of 0.2 seconds.

I have looked hard at the core output.  I am at a loss.

So I have turned to haskell-cafe for insight and moral support.


More information about the Haskell-Cafe mailing list