[Haskell-cafe] Re: hamming distance allocation

Heinrich Apfelmus apfelmus at quantentunnel.de
Mon Apr 19 08:13:53 EDT 2010


Arnoldo Muller wrote:
> I want to generate some hamming distance statistics about a set of strings.
>
>   filter (\x -> x /= 0) $
>     map (uncurry hammingX) [(xs, ys) | xs <- exampl, ys <- exampl]
> 
> [...]
>
> -- function posted in this mailing list
> hamming2 :: String -> String -> Int
> hamming2 xs ys = length (filter not (zipWith (==) xs ys))
> 
> I am executing these functions millions of times and the bottleneck of my
> program is in them as explained by running in profiling mode with  +RTS
> -K400M -p -RTS
> 
> The costlier function is the hamming distance
> COST CENTRE                    MODULE               %time %alloc
> 
> hamming                        Distances             66.6   41.9

Another way to look at it is that you shouldn't optimize  hamming
itself, but rather make sure that it's called less often!

For instance, your expression can be replaced by

   filter (/=0) [hammingX x y | (x:xs) <- tails example, y <- xs]

which cuts the total running time in half. It's still quadratic in the
length of  example . I'm sure there are faster algorithms out there that
can bring it down to O(n log n) if you want.


Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com



More information about the Haskell-Cafe mailing list