[Haskell-cafe] Re: hamming distance allocation
Daniel Fischer
daniel.is.fischer at web.de
Mon Apr 19 08:36:48 EDT 2010
Am Montag 19 April 2010 14:13:53 schrieb Heinrich Apfelmus:
> Arnoldo Muller wrote:
> > I want to generate some hamming distance statistics about a set of
> > strings.
> >
> > filter (\x -> x /= 0) $
> > map (uncurry hammingX) [(xs, ys) | xs <- exampl, ys <- exampl]
> >
> > [...]
> >
> > -- function posted in this mailing list
> > hamming2 :: String -> String -> Int
> > hamming2 xs ys = length (filter not (zipWith (==) xs ys))
> >
> > I am executing these functions millions of times and the bottleneck of
> > my program is in them as explained by running in profiling mode with
> > +RTS -K400M -p -RTS
> >
> > The costlier function is the hamming distance
> > COST CENTRE MODULE %time %alloc
> >
> > hamming Distances 66.6 41.9
>
> Another way to look at it is that you shouldn't optimize hamming
> itself, but rather make sure that it's called less often!
>
> For instance, your expression can be replaced by
>
> filter (/=0) [hammingX x y | (x:xs) <- tails example, y <- xs]
>
> which cuts the total running time in half. It's still quadratic in the
> length of example . I'm sure there are faster algorithms out there that
If it's likely that there are many repetitions, collect the Strings in a
Set/Map (depending on whether you're interested in the count) and call
hamming only on the distinct pairs.
> can bring it down to O(n log n) if you want.
I don't think so. You can't calculate the Hamming distance of x and z from
the distances between x and y and y and z, so you'd have to consider all
pairs of distinct strings, wouldn't you?
>
>
> Regards,
> Heinrich Apfelmus
More information about the Haskell-Cafe
mailing list