[Haskell-cafe] Re: hamming distance allocation
Heinrich Apfelmus
apfelmus at quantentunnel.de
Mon Apr 19 08:13:53 EDT 2010
Arnoldo Muller wrote:
> I want to generate some hamming distance statistics about a set of strings.
>
> filter (\x -> x /= 0) $
> map (uncurry hammingX) [(xs, ys) | xs <- exampl, ys <- exampl]
>
> [...]
>
> -- function posted in this mailing list
> hamming2 :: String -> String -> Int
> hamming2 xs ys = length (filter not (zipWith (==) xs ys))
>
> I am executing these functions millions of times and the bottleneck of my
> program is in them as explained by running in profiling mode with +RTS
> -K400M -p -RTS
>
> The costlier function is the hamming distance
> COST CENTRE MODULE %time %alloc
>
> hamming Distances 66.6 41.9
Another way to look at it is that you shouldn't optimize hamming
itself, but rather make sure that it's called less often!
For instance, your expression can be replaced by
filter (/=0) [hammingX x y | (x:xs) <- tails example, y <- xs]
which cuts the total running time in half. It's still quadratic in the
length of example . I'm sure there are faster algorithms out there that
can bring it down to O(n log n) if you want.
Regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Haskell-Cafe
mailing list