getting a grip on memory usage
Simon Peyton-Jones
simonpj@microsoft.com
Fri, 24 May 2002 03:12:53 -0700
You surely need a seq in loop' (not loop).
GHC can't see that it's strict, because it's overloaded.
Something like
  where loop' a pos | pos > high = a
                    | otherwise  = a1 `seq` p1 `seq` loop' a1 p1
          where
            a1 = f a pos
            p1 = pos+1
(The seq on p1 simply avoids the creation of a thunk for pos+1;
the thunk would be evaluated on the next iteration anyway, assuming '>'
is strict, so it wouldn't be a leak.)
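For reference, here is a minimal self-contained sketch of loop with the seq calls in loop'. The signature is the one from the original message. The name a0 for the initial accumulator and the small main are only illustrative assumptions, not part of the original program.

```haskell
module Main where

import Data.Ix (Ix)

-- Same signature as the original loop; only loop' changes.
loop :: (Num i, Ord i, Ix i) => i -> i -> a -> (a -> i -> a) -> a
loop low high a0 f = loop' a0 low
  where
    loop' a pos
      | pos > high = a
      | otherwise  = a1 `seq` p1 `seq` loop' a1 p1
      where
        a1 = f a pos   -- forced (to weak head normal form) before recursing
        p1 = pos + 1   -- likewise, so no thunk for pos+1 is built

-- Illustrative driver (an assumption): sum the indices 1..1000000 as Doubles.
main :: IO ()
main = print (loop 1 (1000000 :: Int) (0 :: Double) (\a i -> a + fromIntegral i))
```

Note that seq only evaluates to weak head normal form. That is enough for a Double accumulator, but a lazy structure such as a tuple or list would need its components forced as well.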
Or specialise loop to type Int (or whatever), which will
help the strictness analyser no end.
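A sketch of the specialised variant, assuming Int indices are all that is needed. loopInt and go are hypothetical names. The accumulator type is still polymorphic and f is unknown, so the seq on the accumulator is kept; only the index no longer needs one.

```haskell
module Main where

-- loopInt (hypothetical name): loop with the index type fixed to Int,
-- so the comparison and the increment are no longer overloaded.
loopInt :: Int -> Int -> a -> (a -> Int -> a) -> a
loopInt low high a0 f = go a0 low
  where
    go a pos
      | pos > high = a
      | otherwise  = let a1 = f a pos
                     in a1 `seq` go a1 (pos + 1)

-- Illustrative driver (an assumption): sum of squares of the indices 0..3.
main :: IO ()
main = print (loopInt 0 3 (0 :: Double) (\a i -> a + fromIntegral (i * i)))
```

An alternative that keeps the overloaded definition is a SPECIALISE pragma on loop for the Int instance. Whether either is enough on its own depends on what the optimiser does with the accumulator, hence the explicit seq above.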
I'm not certain that this will do it, but if you don't do this
you are in deep trouble.
By 'eat memory' I assume you mean residency rather
than just allocation.
Simon
| -----Original Message-----
| From: Hal Daume III [mailto:hdaume@ISI.EDU]
| Sent: 22 May 2002 18:49
| To: GHC Users Mailing List
| Subject: getting a grip on memory usage
|
| so i have a function that is eating *tons* of memory. my application is
| clustering and i put everything in UArrays or Arrays. for clustering 5
| data points with 4 features (a tiny tiny set) the program gobbles
| 350mbs! most of the usage is coming from this distance function:
|
| dist dat Nothing pq@(p,q) =
|   (uncurry loop) (bounds xp) 0
|       (\a i -> a + (sqr (xp!i - xq!i)))
|   where Vector xp = dat !!! p
|         Vector xq = dat !!! q
| dist dat (Just (Vector w)) pq@(p,q) =
|   (uncurry loop) (bounds xp) 0
|       (\a i -> a + ((sqr (w!i)) * (sqr (xp!i - xq!i))))
|   where Vector xp = dat !!! p
|         Vector xq = dat !!! q
|
| Where the relevant definitions are:
|
| type Vector = Vector (UArray Int Double)
| (!!!) = Array.(!)
|
| and loop is:
|
| loop :: (Num i, Ord i, Ix i) => i -> i -> a -> (a -> i -> a) -> a
| loop low high a f = loop' a low
|   where loop' a pos | pos > high = a
|                     | otherwise  = loop' (f a pos) (pos+1)
|
| i cannot figure out why such a function would be eating so much memory. i
| even tried changing the function provided to loop with "a `seq` ..." to
| make sure we're not creating a huge thunk, but that didn't help at all.
|
| but, according to my profiling:
|
| dist   Main    380000   27.7  23.6   64.9  61.0
| sqr    Main   2280000    1.6   7.9    1.6   7.9
| !!!    Main    760000    4.7   0.0    4.7   0.0
| loop   Loops   380000   30.8  29.5   30.8  29.5
|
| i realize it's being entered a lot of times, and i can understand that it
| would use a bunch of time resources, but i don't understand why it's using
| so much space.
|
| any help would be appreciated...
|
| - Hal
|
| --
| Hal Daume III
|
| "Computer science is no more about computers | hdaume@isi.edu
| than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume