[Haskell-cafe] hamming distance allocation

Mon Apr 19 13:30:06 EDT 2010

Hello all:

I found my leak after adding some bang patterns in a different part of the
program. The compiler was generating all the combinations of the list
comprehensions and therefore the performance dropped very badly.

BTW, hamming is 2 times faster than hamming2.

Thank you as always!

Arnoldo

On Mon, Apr 19, 2010 at 5:53 PM, Arnoldo Muller <arnoldomuller at gmail.com>wrote:

> Hello Daniel:
>
> My % GC time is : 75.0%  (81.4% elapsed) and I am compiling with -O2.
> Thank you for clarifying about the pointers.
>
> Slowly my memory grows up and eventually it explodes. I would expect that
> the list comprehension is lazily evaluated and therefore at any given time I
> am only executing one hamming distance. The result of the hamming distance
> is stored into a small statistics datatype I built (only stores sums and sum
> of squares and the counts). This datatype is updated using a foldr.
>
> I have no idea where the leak is. What do you see in a .prof file to find a
> leak (hamming distance has the largest amount of time and %alloc)  ? From my
> .prof file where would you start looking at?
>
>
> Best Regards,
>
> Arnoldo Muller
>
>
>
>
> On Mon, Apr 19, 2010 at 3:18 AM, Daniel Fischer <daniel.is.fischer at web.de>wrote:
>
>> Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller:
>> > Hello all:
>> >
>> > I want to generate some hamming distance statistics about a set of
>> > strings. As explained in another e-mail in this list, I used the
>> > following code to call the
>> > functions:
>> > (exampl holds the list of strings of size w)
>> > filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl,
>> > ys <- exampl]
>> >
>> > I have two hamming functions:
>> > -- hamming distance for variable length strings
>> > hamming :: String -> String -> Int
>> > hamming x y = hamming' x y 0
>> >     where
>> >       hamming' [] _ !c = c
>> >       hamming' _ [] !c = c
>> >       hamming' (x:xs) (y:ys) !c
>> >           | x == y = hamming' xs ys c
>> >           | otherwise = hamming' xs ys (c + 1)
>> >
>> > -- function posted in this mailing list
>> > hamming2 :: String -> String -> Int
>> > hamming2 xs ys = length (filter not (zipWith (==) xs ys))
>> >
>> > I am executing these functions millions of times and the bottleneck of
>> > my program is in them as explained by running in profiling mode with
>> > +RTS -K400M -p -RTS
>> >
>> > The costlier function is the hamming distance
>> > COST CENTRE                    MODULE               %time %alloc
>> >
>> > hamming                        Distances             66.6   41.9
>> >
>> > It says that it is performing 41% of the allocations. In the case of
>> > hamming2 the allocations go as far as 52%.
>>
>> Allocations are cheap, so that's not necessarily a problem. More important
>> is, what's the maximum residency and how much is copied during GC?
>> Are you compiling with -O2 ?
>>
>> > I could understand that
>> > there are allocations in "hamming2" because we are creating pairs, but
>> > in the case of "hamming" there should be no allocation.
>>
>> Why not? I don't know how GHC counts allocations, but everytime you go
>> from
>> (x:xs) to xs, you need a new pointer to the tail. If that counts as
>> allocation, hamming must allocate a lot, too.
>>
>> >
>> > How can I execute my hamming functions without allocating memory?
>> >
>> > Best regards,
>> >
>> > Arnoldo Muller
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100419/3e887212/attachment.html