[Haskell-beginners] Performance
Alex Dunlap
alexander.dunlap at gmail.com
Sun Nov 22 13:44:38 EST 2009
On Sun, Nov 22, 2009 at 05:59:04PM +0000, Philip Scott wrote:
> Hi again folks,
>
> I am still at it with my time-series problem, for those who haven't been
> following; I have a list of (time stamp, value) pairs and I need to do various
> bits and bobs with them. I have got arithmetic down pat now, thanks to the
> kind help of various members of the list - now I am looking at functions that
> look at some historical data in the time-series and do some work on that to
> give me an answer for a particular day.
>
> I have chosen to represent my time series in reverse date order, since non of
> the operations will ever want to look into the future, but often they would
> like to look in to the past.
>
> A function I would like to write is 'avg'. For a particular day, it computes
> the average of the values last 'n' points; if there are not n points to fetch,
> thee is no answer. I then combine those to make a new time series.
>
> e.g.
>
> If my input time series was
>
> [(5,10),(4,20),(3,30),(2,40), (1,50)]
>
> (Where 5, 4, 3, 2, 1 are timestamps and 10, 20, 30, 50, 50 are values)
>
> I would like the answer
>
> [(5,20), (4,30), (3,40)]
>
> (e.g. 20 = (10+20+30)/3 etc.. I can't get an answer for timestamps 2 and 1
> because there isn't enough historical data)
>
> So I have written some code to do this, and it works nicely enough; but it is
> _slow_. To do 1000 averages of different lengths on a series with only 3000
> points takes about 200 seconds on my (not overly shabby) laptop. The
> equivalent C program takes under a second.
>
> I am entirely sure that this is due to some failing on my part. I have been
> mucking around with the profiler all afternoon lazifying and delazifying
> various bits and bobs with no dramatic success so I thought I might put it to
> y'all if you don't mind!
>
> So here's some code. I've kept it quite general because there are a lot of
> functions I would like to implement that do similar things with bits of
> historical data.
>
> General comments on the Haskellyness/goodness of my code are welcomed as well,
> I'm still very much a beginner at this!
>
> --------- SNIP --------------
>
> -- Take n elements from a list if at least n exist
> takeMaybe n l | length l < n = Nothing
> | otherwise = Just $! (take n l)
>
> -- Little utility function, take a function f and apply it to the whole list,
> -- then the tail etc...
> lMap _ [] = []
> lMap f (x:xs) = (f (x:xs)):(lMap f xs)
>
> -- Little utility function to take a list containing Maybes and delete them
> -- Returning a list with the values inside the Just
> maybeListToList [] = []
> maybeListToList (x:xs) = maybe (maybeListToList xs)
> (\y -> y:(maybeListToList xs))
> x
>
> -- Return a list of lists, where each sublist is a list of the next n values
> histMaybe x = lMap (takeMaybe x)
> hist n x = maybeListToList $ histMaybe n x
>
> -- Take a function which works on a list of things and apply it only to a
> -- list of the second elements in a list of tuples 'l'.
> applyToValues f l = let (ts,vs) = unzip l
> in zip ts $ f vs
>
> -- Create a timeseries with the cumulative sum of the last n values
> cumL n l = map sum (hist n l)
> cum = applyToValues . cumL
>
> -- Creates a timeseries with the average of the last n values
> avgL n l = map ((*) (1/fromIntegral(n))) $ cumL n l
> avg = applyToValues . avgL
>
>
> --------- SNIP --------------
>
> According to the profiler (log attached), the vast majority of the time is
> spent in takeMaybe, presumably allocating and deallocating enormous amounts of
> memory for each of my little temporary sublists. I have tried liberally
> sprinkling $! and 'seq' about, thinking that might help but I am clearly not
> doing it right.
>
> Perhaps list is the wrong basic data structure for what I am doing?
>
> I hope I didn't bore you with that rather long email, I will leave it at that.
> If it would be useful, I could give you the complete program with a data set
> if anyone is keen enough to try for themselves.
>
> Thanks,
>
> Philip
>
> Sun Nov 22 17:28 2009 Time and Allocation Profiling Report (Final)
>
> test +RTS -p -hc -RTS
>
> total time = 162.98 secs (8149 ticks @ 20 ms)
> total alloc = 47,324,561,080 bytes (excludes profiling overheads)
>
> COST CENTRE MODULE %time %alloc
>
> takeMaybe Main 62.2 45.9
> cumL Main 36.2 52.4
>
>
> individual inherited
> COST CENTRE MODULE no. entries %time %alloc %time %alloc
>
> MAIN MAIN 1 0 0.0 0.0 100.0 100.0
> main Main 297 0 0.0 0.0 0.0 0.0
> readCurve TsdbFile 298 0 0.0 0.0 0.0 0.0
> CAF Main 260 4 0.0 0.0 100.0 100.0
> avgL Main 281 1 0.0 0.0 0.2 0.2
> cumL Main 282 1 0.1 0.1 0.2 0.2
> hist Main 283 1 0.0 0.0 0.1 0.1
> histMaybe Main 285 1 0.0 0.0 0.1 0.1
> takeMaybe Main 287 2543 0.1 0.1 0.1 0.1
> lMap Main 286 2544 0.0 0.0 0.0 0.0
> maybeListToList Main 284 2544 0.0 0.0 0.0 0.0
> avg Main 276 1 0.0 0.0 0.0 0.0
> applyToValues Main 277 1 0.0 0.0 0.0 0.0
> main Main 266 1 0.0 0.0 99.8 99.8
> avg Main 288 0 0.2 0.2 99.1 99.1
> avgL Main 290 0 0.0 0.0 98.8 98.4
> cumL Main 291 0 36.1 52.3 98.8 98.4
> hist Main 292 999 0.0 0.0 62.6 46.1
> histMaybe Main 294 999 0.0 0.0 62.4 46.0
> takeMaybe Main 296 1542456 62.1 45.8 62.1 45.8
> lMap Main 295 1542456 0.3 0.2 0.3 0.2
> maybeListToList Main 293 1542456 0.2 0.1 0.2 0.1
> applyToValues Main 289 999 0.2 0.5 0.2 0.5
> @+ Main 272 1000 0.0 0.0 0.6 0.6
> mergeStep Main 275 1000 0.3 0.2 0.4 0.4
> v Main 300 0 0.0 0.1 0.0 0.1
> t Main 299 0 0.1 0.1 0.1 0.1
> add Main 273 1000 0.0 0.0 0.1 0.2
> binaryValueFunc Main 274 1544001 0.1 0.2 0.1 0.2
> sendCurve GuiLink 268 1 0.0 0.0 0.1 0.0
> putCurve GuiLink 271 1545 0.1 0.0 0.1 0.0
> readCurve TsdbFile 267 1 0.0 0.0 0.0 0.0
> CAF Data.Typeable 258 1 0.0 0.0 0.0 0.0
> CAF GHC.IOBase 236 3 0.0 0.0 0.0 0.0
> CAF GHC.Read 234 1 0.0 0.0 0.0 0.0
> CAF GHC.Float 233 1 0.0 0.0 0.0 0.0
> CAF Text.Read.Lex 227 6 0.0 0.0 0.0 0.0
> CAF GHC.Int 222 1 0.0 0.0 0.0 0.0
> CAF Data.HashTable 213 2 0.0 0.0 0.0 0.0
> CAF GHC.Handle 211 5 0.0 0.0 0.0 0.0
> main Main 279 0 0.0 0.0 0.0 0.0
> readCurve TsdbFile 280 0 0.0 0.0 0.0 0.0
> CAF GHC.Conc 210 1 0.0 0.0 0.0 0.0
> CAF System.Posix.Internals 192 1 0.0 0.0 0.0 0.0
> CAF TsdbFile 181 5 0.0 0.0 0.0 0.0
> getCurve TsdbFile 278 1 0.0 0.0 0.0 0.0
> CAF Data.Binary.IEEE754 180 6 0.0 0.0 0.0 0.0
> CAF Data.Binary.Get 179 2 0.0 0.0 0.0 0.0
> CAF Data.Binary.Put 151 1 0.0 0.0 0.0 0.0
> CAF GuiLink 145 2 0.0 0.0 0.0 0.0
> CAF Network 144 1 0.0 0.0 0.0 0.0
> CAF Network.Socket 143 5 0.0 0.0 0.0 0.0
> main Main 269 0 0.0 0.0 0.0 0.0
> sendCurve GuiLink 270 0 0.0 0.0 0.0 0.0
> CAF Network.BSD 139 1 0.0 0.0 0.0 0.0
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
Without a dataset, I don't know if this is any faster than what you have, but I think it's a fair bit prettier, so you might have more luck starting with this:
-- | windows 3 [1..5] = [[1,2,3],[2,3,4],[3,4,5]]
windows :: Int -> [a] -> [[a]]
windows n xs = foldr (zipWith (:)) (repeat []) (take n (iterate (drop 1) xs))
and then averaging each list.
Hope that helps.
Alex
More information about the Beginners
mailing list