[Haskell-cafe] hamming distance allocation

Daniel Fischer daniel.is.fischer at web.de
Sun Apr 18 21:18:56 EDT 2010


Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller:
> Hello all:
>
> I want to generate some hamming distance statistics about a set of
> strings. As explained in another e-mail in this list, I used the
> following code to call the
> functions:
> (exampl holds the list of strings of size w)
> filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl,
> ys <- exampl]
>
> I have two hamming functions:
> -- hamming distance for variable length strings
> hamming :: String -> String -> Int
> hamming x y = hamming' x y 0
>     where
>       hamming' [] _ !c = c
>       hamming' _ [] !c = c
>       hamming' (x:xs) (y:ys) !c
>           | x == y = hamming' xs ys c
>           | otherwise = hamming' xs ys (c + 1)
>
> -- function posted in this mailing list
> hamming2 :: String -> String -> Int
> hamming2 xs ys = length (filter not (zipWith (==) xs ys))
>
> I am executing these functions millions of times and the bottleneck of
> my program is in them as explained by running in profiling mode with 
> +RTS -K400M -p -RTS
>
> The costlier function is the hamming distance
> COST CENTRE                    MODULE               %time %alloc
>
> hamming                        Distances             66.6   41.9
>
> It says that it is performing 41% of the allocations. In the case of
> hamming2 the allocations go as far as 52%.

Allocations are cheap, so that's not necessarily a problem. More important 
is, what's the maximum residency and how much is copied during GC?
Are you compiling with -O2 ?

> I could understand that
> there are allocations in "hamming2" because we are creating pairs, but
> in the case of "hamming" there should be no allocation.

Why not? I don't know how GHC counts allocations, but everytime you go from 
(x:xs) to xs, you need a new pointer to the tail. If that counts as 
allocation, hamming must allocate a lot, too.

>
> How can I execute my hamming functions without allocating memory?
>
> Best regards,
>
> Arnoldo Muller



More information about the Haskell-Cafe mailing list