[Haskell-cafe] hamming distance allocation
John Lato
jwlato at gmail.com
Mon Apr 19 08:37:33 EDT 2010
> Subject: Re: [Haskell-cafe] hamming distance allocation
>
> Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller:
>> Hello all:
>>
>> I want to generate some hamming distance statistics about a set of
>> strings. As explained in another e-mail in this list, I used the
>> following code to call the
>> functions:
>> (exampl holds the list of strings of size w)
>> filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl,
>> ys <- exampl]
>>
>> I have two hamming functions:
>> -- hamming distance for variable length strings
>> hamming :: String -> String -> Int
>> hamming x y = hamming' x y 0
>> where
>> hamming' [] _ !c = c
>> hamming' _ [] !c = c
>> hamming' (x:xs) (y:ys) !c
>> | x == y = hamming' xs ys c
>> | otherwise = hamming' xs ys (c + 1)
>>
>> -- function posted in this mailing list
>> hamming2 :: String -> String -> Int
>> hamming2 xs ys = length (filter not (zipWith (==) xs ys))
>>
>> I am executing these functions millions of times and the bottleneck of
>> my program is in them as explained by running in profiling mode with
>> +RTS -K400M -p -RTS
>>
>> The costlier function is the hamming distance
>> COST CENTRE MODULE %time %alloc
>>
>> hamming Distances 66.6 41.9
>>
>> It says that it is performing 41% of the allocations. In the case of
>> hamming2 the allocations go as far as 52%.
>
> Allocations are cheap, so that's not necessarily a problem. More important
> is, what's the maximum residency and how much is copied during GC?
> Are you compiling with -O2 ?
>
>> I could understand that
>> there are allocations in "hamming2" because we are creating pairs, but
>> in the case of "hamming" there should be no allocation.
>
> Why not? I don't know how GHC counts allocations, but everytime you go from
> (x:xs) to xs, you need a new pointer to the tail. If that counts as
> allocation, hamming must allocate a lot, too.
Is it really necessary to use Strings? I think a packed type, e.g.
Vector or ByteString, would be much more efficient here. Of course
this is only likely to be a benefit if you can move away from String
entirely.
I suspect that "hamming2" would perform better then.
John
More information about the Haskell-Cafe
mailing list