[Haskell-cafe] hamming distance allocation
arnoldomuller at gmail.com
Mon Apr 19 11:14:58 EDT 2010
Well I could use a packed type. The only letters that will be found in the
string are "ATCG" so yeah I don't need unicode and those things.
Will try out with vector or ByteString. Thanks! :)
On Mon, Apr 19, 2010 at 2:37 PM, John Lato <jwlato at gmail.com> wrote:
> > Subject: Re: [Haskell-cafe] hamming distance allocation
> > Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller:
> >> Hello all:
> >> I want to generate some hamming distance statistics about a set of
> >> strings. As explained in another e-mail in this list, I used the
> >> following code to call the
> >> functions:
> >> (exampl holds the list of strings of size w)
> >> filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl,
> >> ys <- exampl]
> >> I have two hamming functions:
> >> -- hamming distance for variable length strings
> >> hamming :: String -> String -> Int
> >> hamming x y = hamming' x y 0
> >> where
> >> hamming'  _ !c = c
> >> hamming' _  !c = c
> >> hamming' (x:xs) (y:ys) !c
> >> | x == y = hamming' xs ys c
> >> | otherwise = hamming' xs ys (c + 1)
> >> -- function posted in this mailing list
> >> hamming2 :: String -> String -> Int
> >> hamming2 xs ys = length (filter not (zipWith (==) xs ys))
> >> I am executing these functions millions of times and the bottleneck of
> >> my program is in them as explained by running in profiling mode with
> >> +RTS -K400M -p -RTS
> >> The costlier function is the hamming distance
> >> COST CENTRE MODULE %time %alloc
> >> hamming Distances 66.6 41.9
> >> It says that it is performing 41% of the allocations. In the case of
> >> hamming2 the allocations go as far as 52%.
> > Allocations are cheap, so that's not necessarily a problem. More
> > is, what's the maximum residency and how much is copied during GC?
> > Are you compiling with -O2 ?
> >> I could understand that
> >> there are allocations in "hamming2" because we are creating pairs, but
> >> in the case of "hamming" there should be no allocation.
> > Why not? I don't know how GHC counts allocations, but everytime you go
> > (x:xs) to xs, you need a new pointer to the tail. If that counts as
> > allocation, hamming must allocate a lot, too.
> Is it really necessary to use Strings? I think a packed type, e.g.
> Vector or ByteString, would be much more efficient here. Of course
> this is only likely to be a benefit if you can move away from String
> I suspect that "hamming2" would perform better then.
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Haskell-Cafe