[Haskell-cafe] looking for optimization advice

Thu Mar 25 15:26:21 EST 2004

On Thu, Mar 25, 2004 at 11:39:26AM -0800, Carl Witty wrote:
> On Thu, 2004-03-25 at 10:09, David Roundy wrote:
> > The function is a simple packed string compare, and I basically just call
> > the C standard library function memcmp for this.  Without further ado:
> 
> I don't know how to make your particular function faster, but I do have
> some ideas for different approaches you may not have thought of.
> 
> If you compare strings more often than you create strings, and the
> comparisons usually fail, you could add a hash of the string to your
> PackedString datatype; then psniceq could compare hash values first and
> only confirm equality with the memcmp if the hash values match.

In many cases, *very* large strings will be created and never compared, so
hashing them doesn't sound like a good plan.

On the other hand, it has occurred to me that although comparisons usually
fail, perhaps those that succeed are the ones that are taking all the time,
since those ones require checking the entire string (rather than just up to
the first difference).  So I've added a check whether the two strings have
the same pointer and offset prior to calling memcmp.  Due to other aspects
of how my PackedString's work, identical strings should "usually" be
represented by pointers to the same memory.

psniceq :: PackedString -> PackedString -> Bool
psniceq a@(PS x1 s1 l1) b@(PS x2 s2 l2)
    | nullPS a && nullPS b = True
psniceq (PS x1 s1 l1) (PS x2 s2 l2) =
    (l1 == l2 &&) $ unsafePerformIO $ withForeignPtr x1 $ \p1->
    withForeignPtr x2 $ \p2 ->
        if p1 == p2 && s1 == s2
        then return True
        else liftM (==0) $ c_memcmp (p1 `plusPtr` s1) (p2 `plusPtr` s2) l1

(Haven't yet tested the above, since I've been making other changes,
working on the *real* problem, which is why I'm comparing so many darn
strings...)
-- 
David Roundy
http://www.abridgegame.org