[Haskell-beginners] Performance
Philip Scott
haskell-beginners at foo.me.uk
Sun Nov 22 12:59:04 EST 2009
Hi again folks,
I am still at it with my time-series problem, for those who haven't been
following; I have a list of (time stamp, value) pairs and I need to do various
bits and bobs with them. I have got arithmetic down pat now, thanks to the
kind help of various members of the list - now I am looking at functions that
look at some historical data in the time-series and do some work on that to
give me an answer for a particular day.
I have chosen to represent my time series in reverse date order, since non of
the operations will ever want to look into the future, but often they would
like to look in to the past.
A function I would like to write is 'avg'. For a particular day, it computes
the average of the values last 'n' points; if there are not n points to fetch,
thee is no answer. I then combine those to make a new time series.
e.g.
If my input time series was
[(5,10),(4,20),(3,30),(2,40), (1,50)]
(Where 5, 4, 3, 2, 1 are timestamps and 10, 20, 30, 50, 50 are values)
I would like the answer
[(5,20), (4,30), (3,40)]
(e.g. 20 = (10+20+30)/3 etc.. I can't get an answer for timestamps 2 and 1
because there isn't enough historical data)
So I have written some code to do this, and it works nicely enough; but it is
_slow_. To do 1000 averages of different lengths on a series with only 3000
points takes about 200 seconds on my (not overly shabby) laptop. The
equivalent C program takes under a second.
I am entirely sure that this is due to some failing on my part. I have been
mucking around with the profiler all afternoon lazifying and delazifying
various bits and bobs with no dramatic success so I thought I might put it to
y'all if you don't mind!
So here's some code. I've kept it quite general because there are a lot of
functions I would like to implement that do similar things with bits of
historical data.
General comments on the Haskellyness/goodness of my code are welcomed as well,
I'm still very much a beginner at this!
--------- SNIP --------------
-- Take n elements from a list if at least n exist
takeMaybe n l | length l < n = Nothing
| otherwise = Just $! (take n l)
-- Little utility function, take a function f and apply it to the whole list,
-- then the tail etc...
lMap _ [] = []
lMap f (x:xs) = (f (x:xs)):(lMap f xs)
-- Little utility function to take a list containing Maybes and delete them
-- Returning a list with the values inside the Just
maybeListToList [] = []
maybeListToList (x:xs) = maybe (maybeListToList xs)
(\y -> y:(maybeListToList xs))
x
-- Return a list of lists, where each sublist is a list of the next n values
histMaybe x = lMap (takeMaybe x)
hist n x = maybeListToList $ histMaybe n x
-- Take a function which works on a list of things and apply it only to a
-- list of the second elements in a list of tuples 'l'.
applyToValues f l = let (ts,vs) = unzip l
in zip ts $ f vs
-- Create a timeseries with the cumulative sum of the last n values
cumL n l = map sum (hist n l)
cum = applyToValues . cumL
-- Creates a timeseries with the average of the last n values
avgL n l = map ((*) (1/fromIntegral(n))) $ cumL n l
avg = applyToValues . avgL
--------- SNIP --------------
According to the profiler (log attached), the vast majority of the time is
spent in takeMaybe, presumably allocating and deallocating enormous amounts of
memory for each of my little temporary sublists. I have tried liberally
sprinkling $! and 'seq' about, thinking that might help but I am clearly not
doing it right.
Perhaps list is the wrong basic data structure for what I am doing?
I hope I didn't bore you with that rather long email, I will leave it at that.
If it would be useful, I could give you the complete program with a data set
if anyone is keen enough to try for themselves.
Thanks,
Philip
-------------- next part --------------
Sun Nov 22 17:28 2009 Time and Allocation Profiling Report (Final)
test +RTS -p -hc -RTS
total time = 162.98 secs (8149 ticks @ 20 ms)
total alloc = 47,324,561,080 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
takeMaybe Main 62.2 45.9
cumL Main 36.2 52.4
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 1 0 0.0 0.0 100.0 100.0
main Main 297 0 0.0 0.0 0.0 0.0
readCurve TsdbFile 298 0 0.0 0.0 0.0 0.0
CAF Main 260 4 0.0 0.0 100.0 100.0
avgL Main 281 1 0.0 0.0 0.2 0.2
cumL Main 282 1 0.1 0.1 0.2 0.2
hist Main 283 1 0.0 0.0 0.1 0.1
histMaybe Main 285 1 0.0 0.0 0.1 0.1
takeMaybe Main 287 2543 0.1 0.1 0.1 0.1
lMap Main 286 2544 0.0 0.0 0.0 0.0
maybeListToList Main 284 2544 0.0 0.0 0.0 0.0
avg Main 276 1 0.0 0.0 0.0 0.0
applyToValues Main 277 1 0.0 0.0 0.0 0.0
main Main 266 1 0.0 0.0 99.8 99.8
avg Main 288 0 0.2 0.2 99.1 99.1
avgL Main 290 0 0.0 0.0 98.8 98.4
cumL Main 291 0 36.1 52.3 98.8 98.4
hist Main 292 999 0.0 0.0 62.6 46.1
histMaybe Main 294 999 0.0 0.0 62.4 46.0
takeMaybe Main 296 1542456 62.1 45.8 62.1 45.8
lMap Main 295 1542456 0.3 0.2 0.3 0.2
maybeListToList Main 293 1542456 0.2 0.1 0.2 0.1
applyToValues Main 289 999 0.2 0.5 0.2 0.5
@+ Main 272 1000 0.0 0.0 0.6 0.6
mergeStep Main 275 1000 0.3 0.2 0.4 0.4
v Main 300 0 0.0 0.1 0.0 0.1
t Main 299 0 0.1 0.1 0.1 0.1
add Main 273 1000 0.0 0.0 0.1 0.2
binaryValueFunc Main 274 1544001 0.1 0.2 0.1 0.2
sendCurve GuiLink 268 1 0.0 0.0 0.1 0.0
putCurve GuiLink 271 1545 0.1 0.0 0.1 0.0
readCurve TsdbFile 267 1 0.0 0.0 0.0 0.0
CAF Data.Typeable 258 1 0.0 0.0 0.0 0.0
CAF GHC.IOBase 236 3 0.0 0.0 0.0 0.0
CAF GHC.Read 234 1 0.0 0.0 0.0 0.0
CAF GHC.Float 233 1 0.0 0.0 0.0 0.0
CAF Text.Read.Lex 227 6 0.0 0.0 0.0 0.0
CAF GHC.Int 222 1 0.0 0.0 0.0 0.0
CAF Data.HashTable 213 2 0.0 0.0 0.0 0.0
CAF GHC.Handle 211 5 0.0 0.0 0.0 0.0
main Main 279 0 0.0 0.0 0.0 0.0
readCurve TsdbFile 280 0 0.0 0.0 0.0 0.0
CAF GHC.Conc 210 1 0.0 0.0 0.0 0.0
CAF System.Posix.Internals 192 1 0.0 0.0 0.0 0.0
CAF TsdbFile 181 5 0.0 0.0 0.0 0.0
getCurve TsdbFile 278 1 0.0 0.0 0.0 0.0
CAF Data.Binary.IEEE754 180 6 0.0 0.0 0.0 0.0
CAF Data.Binary.Get 179 2 0.0 0.0 0.0 0.0
CAF Data.Binary.Put 151 1 0.0 0.0 0.0 0.0
CAF GuiLink 145 2 0.0 0.0 0.0 0.0
CAF Network 144 1 0.0 0.0 0.0 0.0
CAF Network.Socket 143 5 0.0 0.0 0.0 0.0
main Main 269 0 0.0 0.0 0.0 0.0
sendCurve GuiLink 270 0 0.0 0.0 0.0 0.0
CAF Network.BSD 139 1 0.0 0.0 0.0 0.0
More information about the Beginners
mailing list