[Haskell-cafe] How to improve the zipwith's performance

Fri Aug 8 03:24:29 UTC 2014

Dear all,

I wrote some clustering code with Data.Clustering.Hierarchical, but it is
slow.

I profiled it and changed some of the code, but I don't understand why
zipWith takes so much time, even after I switched from lists to vectors.

My code is below. Any advice would be appreciated.
======================
main = do
    ....
    let cluster = dendrogram  SingleLinkage vectorList getVectorDistance
    ....

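-- DV is assumed to be a qualified import of Data.Vector:
-- import qualified Data.Vector as DV

-- Squared difference of two components.
getExp2 v1 v2 = d*d
    where
        d = v1 - v2

-- Squared difference, short-circuiting to 0 for equal components.
getExp v1 v2
    | v1 == v2 = 0
    | otherwise = getExp2 v1 v2

-- Strict sum of a non-empty vector.
tfoldl d = DV.foldl1' (+) d

changeDataType :: Int -> Double
changeDataType d = fromIntegral d

-- Squared Euclidean distance between the vector components of two items.
getVectorDistance :: (a, DV.Vector Int) -> (a, DV.Vector Int) -> Double
getVectorDistance v1 v2 = fromIntegral $ tfoldl dat
    where
        l1 = snd v1
        l2 = snd v2
        dat = DV.zipWith getExp l1 l2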

=======================================

Built with: ghc -prof -fprof-auto -rtsopts -O2 log_cluster.hs

Run with: log_cluster.exe +RTS -p

The profiling result is:

 log_cluster.exe +RTS -p -RTS

    total time  =        8.43 secs   (8433 ticks @ 1000 us, 1 processor)
    total alloc = 1,614,252,224 bytes  (excludes profiling overheads)

COST CENTRE            MODULE  %time %alloc

getVectorDistance.dat  Main     49.4   37.8
tfoldl                 Main      5.7    0.0
getExp                 Main      4.5    0.0
getExp2                Main      0.5    1.5
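
For comparison, here is a minimal sketch of the same distance written as a single pass over unboxed vectors. It is an assumption that the input can be held in Data.Vector.Unboxed rather than the boxed Data.Vector used above; the names sqDistance and getVectorDistanceU are hypothetical and do not appear in the code above. Two things differ from the original: the v1 == v2 guard is dropped, since d*d is already 0 when the components are equal, and U.sum returns 0 for empty vectors where foldl1' would fail. Note also that -fprof-auto places a cost centre on every binding, including the local dat, which may get in the way of the optimisations that normally combine zipWith and the fold, so the figures above may overstate the cost of zipWith itself.

```haskell
import qualified Data.Vector.Unboxed as U

-- Squared Euclidean distance between two Int vectors.
-- Written as sum-of-zipWith so that, with -O, the vector library
-- has the chance to fuse both steps into one loop.
sqDistance :: U.Vector Int -> U.Vector Int -> Double
sqDistance xs ys = fromIntegral (U.sum (U.zipWith sq xs ys))
  where
    sq a b = let d = a - b in d * d
{-# INLINE sqDistance #-}

-- Same argument shape as getVectorDistance in the post:
-- the first component of each pair is ignored.
getVectorDistanceU :: (a, U.Vector Int) -> (a, U.Vector Int) -> Double
getVectorDistanceU (_, xs) (_, ys) = sqDistance xs ys
```

Loaded in GHCi, sqDistance (U.fromList [1,2,3]) (U.fromList [1,0,3]) evaluates to 4.0. Whether this is actually faster for the clustering run would need to be measured, ideally with cost centres only on top-level functions.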