how much can %alloc in profiling be trusted
Hal Daume
t-hald@microsoft.com
Thu, 10 Jul 2003 08:05:54 -0700
Hi Simon, All,
By SCC'ing the code myself, it seems that most of the allocation is
coming from the SCCs dot4 and size'2 in the following:
cosine :: Vector -> Vector -> Double
cosine v1 v2 =3D=20
case dot v1 v2 0 of
n | n <=3D 0 -> 0
| otherwise -> n / (size v1 * size v2)
where=20
dot [] _ n =3D {-# SCC "dot1" #-} n
dot _ [] n =3D {-# SCC "dot2" #-} n
dot xl@((x,xv):xs) yl@((y,yv):ys) n =3D
case x `compare` y of
EQ -> {-# SCC "dot3" #-} (dot xs ys $! (xv * yv + n))
LT -> {-# SCC "dot4" #-} (dot xs yl n)
GT -> {-# SCC "dot5" #-} (dot xl ys n)
size l =3D sqrt $! size' l 0
size' [] n =3D {-# SCC "size'1" #-} n
size' ((_,x):xs) n =3D {-# SCC "size'2" #-} (size' xs $! (n + x*x))
The only reason dot4 dominates rather than dot5 is that it's called
about 30 times more frequently (by chance). Looking at the simpl-core
from -ddump-simpl, and not knowing too much about how to read it, it
vaguely looks to me light GHC has un-tail-recursified the dot function:
(in definition of Main.$wdot)
case GHC.Prim.<# x# y# of wild5 {
GHC.Base.True ->
__scc {dot4 Main}
case Main.$wdot a3 wild1 ww of ww1 { __DEFAULT ->
GHC.Float.D# ww1
};
Now I know the semantics of core is a bit different than Haskell, but
this seems to no longer be tail recursive (it needs to rebox the Double#
after the recursive call). Am I reading this correctly?
- Hal
--
Hal Daume III | hdaume@isi.edu
"Arrest this man, he talks in maths." | www.isi.edu/~hdaume
> -----Original Message-----
> From: Simon Peyton-Jones=20
> Sent: Thursday, July 10, 2003 2:38 AM
> To: Hal Daume; glasgow-haskell-users@haskell.org
> Subject: RE: how much can %alloc in profiling be trusted
>=20
>=20
> It should be accurate. Hard to say more without=20
> investigating your program in detail.
>=20
> Try -ddump-simpl (with your profiling flags) to see how your=20
> function looks just before code generation.
>=20
> Simoin
>=20
> | -----Original Message-----
> | From: glasgow-haskell-users-admin@haskell.org=20
> [mailto:glasgow-haskell-users-admin@haskell.org]
> | On Behalf Of Hal Daume
> | Sent: 09 July 2003 23:17
> | To: glasgow-haskell-users@haskell.org
> | Subject: how much can %alloc in profiling be trusted
> |=20
> | In my program, I get the following time/allocation information for a
> | call to my cosine function:
> |=20
> | individual inherited
> | COST CENTRE no. entries %time %alloc %time %alloc
> | cosine 2721 43.1 74.3 43.1 74.3
> |=20
> | this is a shocking amount of allocation, considering the=20
> definition of
> | the function:
> |=20
> | cosine :: Vector -> Vector -> Double
> | cosine v1 v2 =3D
> | case dot v1 v2 0 of
> | n | n <=3D 0 -> 0
> | | otherwise -> n / (size v1 * size v2)
> | where
> | dot [] _ n =3D n
> | dot _ [] n =3D n
> | dot ((x,xv):xs) ((y,yv):ys) n =3D
> | case x `compare` y of
> | EQ -> dot xs ys $! (xv * yv + n)
> | LT -> dot xs ((y,yv):ys) n
> | GT -> dot ((x,xv):xs) ys n
> | -- size =3D sqrt . sum . map (square . snd)
> | size l =3D sqrt $! size' l 0
> | size' [] n =3D n
> | size' ((_,x):xs) n =3D size' xs $! n + x*x
> |=20
> | where Vector =3D [(Int,Double)] is a sparse vector=20
> representation. This
> | was even higher (moderately) until I switched from the old=20
> to the new
> | definition of size listed there.
> |=20
> | You can't blame this on the fact that the two vectors are=20
> being passed
> | lazily either: they're being read strictly from a file and=20
> even seq'd
> | before being put into the list. Specifically, we have (v0=20
> is passed in
> | to the function from top-level):
> |=20
> | ... do
> | v <- readList_H h
> | return (cosine v0 v, w)
> |=20
> | where readList_H is defined as:
> |=20
> | readList_H h =3D do
> | b <- FastIO.isEOF h
> | c <- FastIO.fscanfChar h
> | if c /=3D ' '
> | then return []
> | else do
> | b <- FastIO.isEOF h
> | i <- FastIO.fscanfInt h
> | FastIO.fscanfChar h
> | v <- FastIO.fscanfFloat h
> | rest <- readList_H h
> | return ((force (force i,force $ floatToDouble v)) : rest)
> | where force x =3D x `seq` x
> |=20
> | as far as i can tell, all the list allocation should be=20
> happening here.
> | (the forces were not there in the beginning -- I added them=20
> later but it
> | changed nothing.)
> |=20
> | - Hal
> |=20
> | --
> | Hal Daume III | hdaume@isi.edu
> | "Arrest this man, he talks in maths." |=20
www.isi.edu/~hdaume
| _______________________________________________
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users@haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users