[Haskell-cafe] 3GB allocation, 75% CPU usage from a callback function
Neil Brown
nccb2 at kent.ac.uk
Sat Jan 29 18:22:10 CET 2011
Hi,
First of all, don't be fooled by the alloc statistic. That is not 3GB
memory residency, that's 3GB allocation, which was interspersed with
lots of garbage collections, in the same way that measuring how many
times malloc was called in a C program doesn't necessarily indicate
memory residency. Using +RTS -s, it looks like your program uses around
10MB at any one time. As for the speed, your program is doing a lot of
conversions that aren't necessary. CFloat has the Num and Floating
instances necessary to use sin, so you're better off making everything a
CFloat, rather than converting to and from Float. I took your program
and ironed it out a bit (you were also using an extra readIORef as part
of the modifyIORef that you didn't need), and used Criterion to test the
speed. Here's the program in its entirety (you'll need to "cabal
install criterion"):
===
import Data.IORef
import Foreign.C.Types
import Criterion.Main
newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat)
newSinWave sampleRate freq =
do ioref <- newIORef (0::Integer)
let multiplier = 2 * pi * freq /
(fromIntegral sampleRate)
return (\ _ -> {-# SCC "sinWave" #-}
do t <- readIORef ioref
modifyIORef ioref (+1)
return $ fromRational $
toRational $
sin (fromIntegral t *
multiplier))
newSinWave' :: Int -> Float -> IO (CFloat -> IO CFloat)
newSinWave' sampleRate freq =
do ioref <- newIORef 0
let multiplier = 2 * pi * (realToFrac freq) /
(fromIntegral sampleRate)
return (\ _ -> {-# SCC "sinWave'" #-}
do t <- readIORef ioref
writeIORef ioref (t+1)
return $ sin (t * multiplier))
runLots :: (a -> IO a) -> a -> IO a
runLots f = go 10000
where
go 0 !x = return x
go n !x = f x >>= go (n - 1)
main :: IO ()
main = do f <- newSinWave 44100 100
g <- newSinWave' 44100 100
defaultMain [bench "old" $ runLots f 0, bench "new" $ runLots
g 0]
===
And here's the output from Criterion on my machine, compiled with
-XBangPatterns -O1 -rtsopts:
===
benchmarking old
collecting 100 samples, 1 iterations each, in estimated 10.54111 s
bootstrapping with 100000 resamples
mean: 116.4734 ms, lb 116.2565 ms, ub 117.1492 ms, ci 0.950
std dev: 1.794715 ms, lb 626.6683 us, ub 3.992824 ms, ci 0.950
found 5 outliers among 100 samples (5.0%)
1 (1.0%) low severe
3 (3.0%) high mild
1 (1.0%) high severe
variance introduced by outliers: 0.993%
variance is unaffected by outliers
benchmarking new
collecting 100 samples, 2 iterations each, in estimated 1.417208 s
bootstrapping with 100000 resamples
mean: 10.33277 ms, lb 10.15559 ms, ub 10.50883 ms, ci 0.950
std dev: 904.9297 us, lb 845.3293 us, ub 973.6881 us, ci 0.950
variance introduced by outliers: 1.000%
variance is unaffected by outliers
===
So unless I've done something wrong in the methodology (always
possible), that's made it ten times faster. And here's the output from
+RTS -s:
===
6,458,290,512 bytes allocated in the heap
10,855,744 bytes copied during GC
5,522,696 bytes maximum residency (5 sample(s))
3,194,696 bytes maximum slop
13 MB total memory in use (0 MB lost due to fragmentation)
===
Hope that helps,
Neil.
On 29/01/2011 16:29, Edward Amsden wrote:
> I recently got the jack package from hackage working again. For those
> unfamiliar, jack is a callback-based audio server.
> Writing a client entails importing the C library or its bindings (the
> Sound.JACK module in my case), creating a client and
> some ports (using provided library functions), and then registering
> callbacks for audio processing.
>
> I've written a simple program that outputs a sine wave through JACK.
> The server's sample rate is 44100, which means that this function must
> be called 44100 times/second (it is buffered, so generally that would
> be in chunks of 64, 128, or 256). It is an IO function,
> which gives the only opportunity to keep track of time:
>
> (Note that the function produced by newSinWave is the one actually
> registered as a callback:
>
> newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat)
> newSinWave sampleRate freq =
> do ioref<- newIORef (0::Integer)
> let multiplier = 2 * pi * freq /
> (fromIntegral sampleRate)
>
> return (\_ -> {-# SCC "sinWave" #-}
> do t<- readIORef ioref
> modifyIORef ioref (+1)
> return $ fromRational $
> toRational $
> sin (fromIntegral t *
> multiplier))
>
> I profiled this since when my program registered with the jack server
> and started taking callbacks, it was using about 75% cpu
> (in contrast, the echo program included with the jack package uses
> less than 2%). The following two lines are of concern to me:
>
> "total alloc = 3,040,397,164 bytes (excludes profiling overheads)"
> My program uses 3GB of virtual memory over a 15 second run?
>
>
> " sinWave Main 341 1173295 100.0 100.0 0.0 0.0"
> and ~100% of that 75% cpu time is being spent in my callback.
>
> Is there something I'm doing wrong? At the very least, it shouldn't be
> using 3GB of memory. The only thing that needs to be saved between
> callbacks is the IORef, which is storing an Int. I assume that
> evaluating that whole construct in haskell may be too much timewise to
> put in
> a sound callback (or perhaps not), but 3GB of memory is ridiculous.
>
> Thoughts/hints/"you're doing it wrong" anyone?
>
More information about the Haskell-Cafe
mailing list