daniel.is.fischer at web.de
Sun Nov 22 13:32:26 EST 2009
On Sunday 22 November 2009 18:59:04, Philip Scott wrote:
> Hi again folks,
> I am still at it with my time-series problem, for those who haven't been
> following; I have a list of (time stamp, value) pairs and I need to do
> various bits and bobs with them. I have got arithmetic down pat now, thanks
> to the kind help of various members of the list - now I am looking at
> functions that look at some historical data in the time-series and do some
> work on that to give me an answer for a particular day.
> I have chosen to represent my time series in reverse date order, since none
> of the operations will ever want to look into the future, but often they
> would like to look into the past.
> A function I would like to write is 'avg'. For a particular day, it
> computes the average of the values of the last 'n' points; if there are not n
> points to fetch, there is no answer. I then combine those to make a new time
> series.
> If my input time series was
> [(5,10),(4,20),(3,30),(2,40), (1,50)]
> (Where 5, 4, 3, 2, 1 are timestamps and 10, 20, 30, 40, 50 are values)
> I would like the answer
> [(5,20), (4,30), (3,40)]
> (e.g. 20 = (10+20+30)/3 etc.. I can't get an answer for timestamps 2 and 1
> because there isn't enough historical data)
> So I have written some code to do this, and it works nicely enough; but it
> is _slow_. To do 1000 averages of different lengths on a series with only
> 3000 points takes about 200 seconds on my (not overly shabby) laptop. The
> equivalent C program takes under a second.
> I am entirely sure that this is due to some failing on my part. I have been
> mucking around with the profiler all afternoon lazifying and delazifying
> various bits and bobs with no dramatic success so I thought I might put it
> to y'all if you don't mind!
> So here's some code. I've kept it quite general because there are a lot of
> functions I would like to implement that do similar things with bits of
> historical data.
> General comments on the Haskellyness/goodness of my code are welcomed as
> well, I'm still very much a beginner at this!
> --------- SNIP --------------
> -- Take n elements from a list if at least n exist
> takeMaybe n l | length l < n = Nothing
>               | otherwise    = Just $! (take n l)
Ouch, that makes your algorithm quadratic already.
Checking "length l < n" must traverse the entire list:
3000 nodes + 2999 nodes + 2998 nodes + you get the idea.
takeMaybe n l
    | null $ drop (n-1) l = Nothing
    | otherwise           = Just (take n l)
Or a variation,
    case splitAt (n-1) l of
      (a,h:t) -> Just (a ++ [h])
      _       -> Nothing
(test which is faster, play with various sorts of strictness,...)
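To make the comparison concrete, here is a minimal sketch with both variants written out in full. The names takeMaybeDrop and takeMaybeSplit are made up here just to keep the two apart, and both assume n >= 1. Neither looks past the first n cells of the list, so they even work on infinite lists.

```haskell
-- Variant 1: inspect only the first n cells instead of calling length.
takeMaybeDrop :: Int -> [a] -> Maybe [a]
takeMaybeDrop n l
  | null (drop (n-1) l) = Nothing
  | otherwise           = Just (take n l)

-- Variant 2: split once and check that an n-th element exists.
takeMaybeSplit :: Int -> [a] -> Maybe [a]
takeMaybeSplit n l = case splitAt (n-1) l of
  (a, h:_) -> Just (a ++ [h])
  _        -> Nothing
```

For example, takeMaybeDrop 3 [10,20,30,40,50] gives Just [10,20,30], while takeMaybeDrop 3 [10,20] gives Nothing.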
> -- Little utility function, take a function f and apply it to the whole list,
> -- then the tail etc...
> lMap _ []     = []
> lMap f (x:xs) = (f (x:xs)):(lMap f xs)
lMap f = map f . tails
(Data.List.tails and Data.List.inits are often useful, more idiomatic anyway)
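One small difference is worth knowing about: tails also returns the final empty suffix, so the one-liner calls f on [] once more than the hand-written recursion does. A sketch comparing the two (the names lMapOrig and lMapTails are only for illustration):

```haskell
import Data.List (tails)

-- The definition from the question: f sees every non-empty suffix.
lMapOrig :: ([a] -> b) -> [a] -> [b]
lMapOrig _ []     = []
lMapOrig f (x:xs) = f (x:xs) : lMapOrig f xs

-- The one-liner: f also sees the trailing empty suffix.
lMapTails :: ([a] -> b) -> [a] -> [b]
lMapTails f = map f . tails
```

With f = length on [1,2,3], lMapOrig gives [3,2,1] and lMapTails gives [3,2,1,0]. For this program the extra call is harmless, since takeMaybe n [] is Nothing for n >= 1 and gets filtered out afterwards. If an exact match is needed, map f . init . tails should do it.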
> -- Little utility function to take a list containing Maybes and delete them
> -- Returning a list with the values inside the Just
> maybeListToList []     = []
> maybeListToList (x:xs) = maybe (maybeListToList xs)
>                                (\y -> y:(maybeListToList xs))
>                                x
Look at Data.Maybe.catMaybes
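A minimal sketch of the replacement, keeping the original name so the rest of the program would not need to change:

```haskell
import Data.Maybe (catMaybes)

-- Keep the payload of every Just and drop every Nothing.
maybeListToList :: [Maybe a] -> [a]
maybeListToList = catMaybes
```

For example, maybeListToList [Just 1, Nothing, Just 3] gives [1,3].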
> -- Return a list of lists, where each sublist is a list of the next n values
> histMaybe x = lMap (takeMaybe x)
> hist n x = maybeListToList $ histMaybe n x
map (take n) $ takeWhile (not . null . drop (n-1)) $ tails xs
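Written out as a full definition, that expression might look like the sketch below. It assumes n >= 1. The takeWhile can stop at the first suffix that is too short, because every later suffix is shorter still.

```haskell
import Data.List (tails)

-- All windows of exactly n consecutive elements, in list order.
hist :: Int -> [a] -> [[a]]
hist n xs = map (take n) $ takeWhile (not . null . drop (n-1)) $ tails xs
```

For example, hist 3 [10,20,30,40,50] gives [[10,20,30],[20,30,40],[30,40,50]], and hist 3 [10,20] gives [].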
> -- Take a function which works on a list of things and apply it only to a
> -- list of the second elements in a list of tuples 'l'.
> applyToValues f l = let (ts,vs) = unzip l
>                     in zip ts $ f vs
> -- Create a timeseries with the cumulative sum of the last n values
> cumL n l = map sum (hist n l)
> cum = applyToValues . cumL
> -- Creates a timeseries with the average of the last n values
> avgL n l = map ((*) (1/fromIntegral(n))) $ cumL n l
map (/fromIntegral n), surely?
> avg = applyToValues . avgL
> --------- SNIP --------------
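Putting the suggestions together, the whole pipeline could be sketched as follows. The type signatures are my own guess at what was intended, and n >= 1 is assumed throughout.

```haskell
import Data.List (tails)

-- All windows of exactly n consecutive elements.
hist :: Int -> [a] -> [[a]]
hist n xs = map (take n) $ takeWhile (not . null . drop (n-1)) $ tails xs

-- Sum of each window of n values.
cumL :: Num a => Int -> [a] -> [a]
cumL n l = map sum (hist n l)

-- Average of each window: divide each sum by n.
avgL :: Fractional a => Int -> [a] -> [a]
avgL n l = map (/ fromIntegral n) (cumL n l)

-- Apply f to the values only, then pair the results with the timestamps again.
-- zip truncates to the shorter list, so the oldest timestamps drop out.
applyToValues :: ([v] -> [w]) -> [(t, v)] -> [(t, w)]
applyToValues f l = let (ts, vs) = unzip l
                    in zip ts (f vs)

avg :: Fractional v => Int -> [(t, v)] -> [(t, v)]
avg = applyToValues . avgL
```

With the series from the question, avg 3 [(5,10),(4,20),(3,30),(2,40),(1,50)] gives [(5,20.0),(4,30.0),(3,40.0)].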
> According to the profiler (log attached), the vast majority of the time is
> spent in takeMaybe, presumably allocating and deallocating enormous amounts
> of memory for each of my little temporary sublists. I have tried liberally
> sprinkling $! and 'seq' about, thinking that might help but I am clearly
> not doing it right.
> Perhaps list is the wrong basic data structure for what I am doing?
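On the data structure question, one option that stays with plain lists is to avoid re-summing each window from scratch. This was not proposed in the thread; it is only a sketch of the usual running-sum trick, the name windowSums is hypothetical, and n >= 1 is assumed. Each step adds the element entering the window and subtracts the one leaving it, so the work is proportional to the length of the list rather than to n times the length.

```haskell
-- Sums of every window of n consecutive elements, computed incrementally.
windowSums :: Num a => Int -> [a] -> [a]
windowSums n xs
  | null (drop (n-1) xs) = []   -- fewer than n elements: no windows at all
  | otherwise            = scanl (+) (sum (take n xs)) deltas
  where
    -- Moving the window one step adds the entering element and
    -- subtracts the leaving one.
    deltas = zipWith (-) (drop n xs) xs
```

For example, windowSums 3 [10,20,30,40,50] gives [60,90,120], the same as map sum over the windows. With floating-point values the running sum can accumulate rounding error over a long series, so it is worth comparing against the straightforward version on real data.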
> I hope I didn't bore you with that rather long email, I will leave it at
> that. If it would be useful, I could give you the complete program with a
> data set if anyone is keen enough to try for themselves.