how much can %alloc in profiling be trusted

Hal Daume t-hald@microsoft.com
Thu, 10 Jul 2003 08:05:54 -0700


Hi Simon, All,

By SCC'ing the code myself, it seems that most of the allocation is
coming from the SCCs dot4 and size'2 in the following:

cosine :: Vector -> Vector -> Double
cosine v1 v2 =3D=20
  case dot v1 v2 0 of
    n | n <=3D 0    -> 0
      | otherwise -> n / (size v1 * size v2)
  where=20
    dot [] _  n =3D {-# SCC "dot1" #-} n
    dot _  [] n =3D {-# SCC "dot2" #-} n
    dot xl@((x,xv):xs) yl@((y,yv):ys) n =3D
      case x `compare` y of
        EQ -> {-# SCC "dot3" #-} (dot xs ys $! (xv * yv + n))
        LT -> {-# SCC "dot4" #-} (dot xs yl n)
        GT -> {-# SCC "dot5" #-} (dot xl ys n)
    size l =3D sqrt $! size' l 0
    size' [] n =3D {-# SCC "size'1" #-} n
    size' ((_,x):xs) n =3D {-# SCC "size'2" #-} (size' xs $! (n + x*x))

The only reason dot4 dominates rather than dot5 is that it's called
about 30 times more frequently (by chance).  Looking at the simpl-core
from -ddump-simpl, and not knowing too much about how to read it, it
vaguely looks to me light GHC has un-tail-recursified the dot function:

(in definition of Main.$wdot)

  case GHC.Prim.<# x# y# of wild5 {
    GHC.Base.True ->
      __scc {dot4 Main}
      case Main.$wdot a3 wild1 ww of ww1 { __DEFAULT ->
        GHC.Float.D# ww1
    };

Now I know the semantics of core is a bit different than Haskell, but
this seems to no longer be tail recursive (it needs to rebox the Double#
after the recursive call).  Am I reading this correctly?

 - Hal

--
 Hal Daume III                                   | hdaume@isi.edu
 "Arrest this man, he talks in maths."           | www.isi.edu/~hdaume


> -----Original Message-----
> From: Simon Peyton-Jones=20
> Sent: Thursday, July 10, 2003 2:38 AM
> To: Hal Daume; glasgow-haskell-users@haskell.org
> Subject: RE: how much can %alloc in profiling be trusted
>=20
>=20
> It should be accurate.  Hard to say more without=20
> investigating your program in detail.
>=20
> Try -ddump-simpl (with your profiling flags) to see how your=20
> function looks just before code generation.
>=20
> Simoin
>=20
> | -----Original Message-----
> | From: glasgow-haskell-users-admin@haskell.org=20
> [mailto:glasgow-haskell-users-admin@haskell.org]
> | On Behalf Of Hal Daume
> | Sent: 09 July 2003 23:17
> | To: glasgow-haskell-users@haskell.org
> | Subject: how much can %alloc in profiling be trusted
> |=20
> | In my program, I get the following time/allocation information for a
> | call to my cosine function:
> |=20
> |                              individual      inherited
> | COST CENTRE   no. entries  %time  %alloc  %time  %alloc
> | cosine               2721  43.1   74.3    43.1   74.3
> |=20
> | this is a shocking amount of allocation, considering the=20
> definition of
> | the function:
> |=20
> | cosine :: Vector -> Vector -> Double
> | cosine v1 v2 =3D
> |   case dot v1 v2 0 of
> |     n | n <=3D 0    -> 0
> |       | otherwise -> n / (size v1 * size v2)
> |   where
> |     dot [] _  n =3D n
> |     dot _  [] n =3D n
> |     dot ((x,xv):xs) ((y,yv):ys) n =3D
> |       case x `compare` y of
> |         EQ -> dot xs ys $! (xv * yv + n)
> |         LT -> dot xs ((y,yv):ys) n
> |         GT -> dot ((x,xv):xs) ys n
> | --    size =3D sqrt . sum . map (square . snd)
> |     size l =3D sqrt $! size' l 0
> |     size' [] n =3D n
> |     size' ((_,x):xs) n =3D size' xs $! n + x*x
> |=20
> | where Vector =3D [(Int,Double)] is a sparse vector=20
> representation.  This
> | was even higher (moderately) until I switched from the old=20
> to the new
> | definition of size listed there.
> |=20
> | You can't blame this on the fact that the two vectors are=20
> being passed
> | lazily either: they're being read strictly from a file and=20
> even seq'd
> | before being put into the list.  Specifically, we have (v0=20
> is passed in
> | to the function from top-level):
> |=20
> |   ... do
> |     v <- readList_H h
> |     return (cosine v0 v, w)
> |=20
> | where readList_H is defined as:
> |=20
> | readList_H h =3D do
> |   b <- FastIO.isEOF h
> |   c <- FastIO.fscanfChar h
> |   if c /=3D ' '
> |     then return []
> |     else do
> |       b <- FastIO.isEOF h
> |       i <- FastIO.fscanfInt h
> |       FastIO.fscanfChar h
> |       v <- FastIO.fscanfFloat h
> |       rest <- readList_H h
> |       return ((force (force i,force $ floatToDouble v)) : rest)
> |   where force x =3D x `seq` x
> |=20
> | as far as i can tell, all the list allocation should be=20
> happening here.
> | (the forces were not there in the beginning -- I added them=20
> later but it
> | changed nothing.)
> |=20
> |  - Hal
> |=20
> | --
> |  Hal Daume III                                   | hdaume@isi.edu
> |  "Arrest this man, he talks in maths."           |=20
www.isi.edu/~hdaume
| _______________________________________________
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users@haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users