[Haskell-beginners] Performance

Philip Scott haskell-beginners at foo.me.uk
Sun Nov 22 12:59:04 EST 2009


Hi again folks, 

I am still at it with my time-series problem, for those who haven't been 
following; I have a list of (time stamp, value) pairs and I need to do various 
bits and bobs with them. I have got arithmetic down pat now, thanks to the 
kind help of various members of the list - now I am looking at functions that 
look at some historical data in the time-series and do some work on that to 
give me an answer for a particular day.

I have chosen to represent my time series in reverse date order, since non of 
the operations will ever want to look into the future, but often they would 
like to look in to the past.

A function I would like to write is 'avg'. For a particular day, it computes 
the average of the values last 'n' points; if there are not n points to fetch, 
thee is no answer. I then combine those to make a new time series.

e.g.

If my input time series was

[(5,10),(4,20),(3,30),(2,40), (1,50)]

(Where 5, 4, 3, 2, 1 are timestamps and 10, 20, 30, 50, 50 are values)

I would like the answer

[(5,20), (4,30), (3,40)]

(e.g. 20 = (10+20+30)/3 etc.. I can't get an answer for timestamps 2 and 1 
because there isn't enough historical data)

So I have written some code to do this, and it works nicely enough; but it is 
_slow_. To do 1000 averages of different lengths on a series with only 3000 
points takes about 200 seconds on my (not overly shabby) laptop. The 
equivalent C program takes under a second.

I am entirely sure that this is due to some failing on my part. I have been 
mucking around with the profiler all afternoon lazifying and delazifying 
various bits and bobs with no dramatic success so I thought I might put it to 
y'all if you don't mind!

So here's some code. I've kept it quite general because there are a lot of 
functions I would like to implement that do similar things with bits of 
historical data.

General comments on the Haskellyness/goodness of my code are welcomed as well, 
I'm still very much a beginner at this!

--------- SNIP --------------

-- Take n elements from a list if at least n exist
takeMaybe n l | length l < n = Nothing
              | otherwise    = Just $! (take n l)

-- Little utility function, take a function f and apply it to the whole list, 
-- then the tail etc...
lMap _ []     = []
lMap f (x:xs) = (f (x:xs)):(lMap f xs)

-- Little utility function to take a list containing Maybes and delete them
-- Returning a list with the values inside the Just
maybeListToList [] = []
maybeListToList (x:xs) = maybe (maybeListToList xs)
                               (\y -> y:(maybeListToList xs))
                               x

-- Return a list of lists, where each sublist is a list of the next n values
histMaybe x = lMap (takeMaybe x)
hist n x = maybeListToList $ histMaybe n x

-- Take a function which works on a list of things and apply it only to a
-- list of the second elements in a list of tuples 'l'.
applyToValues f l = let (ts,vs) = unzip l
                                in zip ts $ f vs

-- Create a timeseries with the cumulative sum of the last n values
cumL n l = map sum (hist n l)
cum = applyToValues . cumL                                                                                                            

-- Creates a timeseries with the average of the last n values
avgL n l = map ((*) (1/fromIntegral(n))) $ cumL n l
avg = applyToValues . avgL


--------- SNIP --------------

According to the profiler (log attached), the vast majority of the time is 
spent in takeMaybe, presumably allocating and deallocating enormous amounts of 
memory for each of my little temporary sublists. I have tried liberally 
sprinkling $! and 'seq' about, thinking that might help but I am clearly not 
doing it right.

Perhaps list is the wrong basic data structure for what I am doing?

I hope I didn't bore you with that rather long email, I will leave it at that. 
If it would be useful, I could give you the complete program with a data set 
if anyone is keen enough to try for themselves.

Thanks,

Philip

-------------- next part --------------
	Sun Nov 22 17:28 2009 Time and Allocation Profiling Report  (Final)

	   test +RTS -p -hc -RTS

	total time  =      162.98 secs   (8149 ticks @ 20 ms)
	total alloc = 47,324,561,080 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

takeMaybe                      Main                  62.2   45.9
cumL                           Main                  36.2   52.4


                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN                                                   1           0   0.0    0.0   100.0  100.0
  main                   Main                                                 297           0   0.0    0.0     0.0    0.0
   readCurve             TsdbFile                                             298           0   0.0    0.0     0.0    0.0
 CAF                     Main                                                 260           4   0.0    0.0   100.0  100.0
  avgL                   Main                                                 281           1   0.0    0.0     0.2    0.2
   cumL                  Main                                                 282           1   0.1    0.1     0.2    0.2
    hist                 Main                                                 283           1   0.0    0.0     0.1    0.1
     histMaybe           Main                                                 285           1   0.0    0.0     0.1    0.1
      takeMaybe          Main                                                 287        2543   0.1    0.1     0.1    0.1
      lMap               Main                                                 286        2544   0.0    0.0     0.0    0.0
     maybeListToList     Main                                                 284        2544   0.0    0.0     0.0    0.0
  avg                    Main                                                 276           1   0.0    0.0     0.0    0.0
   applyToValues         Main                                                 277           1   0.0    0.0     0.0    0.0
  main                   Main                                                 266           1   0.0    0.0    99.8   99.8
   avg                   Main                                                 288           0   0.2    0.2    99.1   99.1
    avgL                 Main                                                 290           0   0.0    0.0    98.8   98.4
     cumL                Main                                                 291           0  36.1   52.3    98.8   98.4
      hist               Main                                                 292         999   0.0    0.0    62.6   46.1
       histMaybe         Main                                                 294         999   0.0    0.0    62.4   46.0
        takeMaybe        Main                                                 296     1542456  62.1   45.8    62.1   45.8
        lMap             Main                                                 295     1542456   0.3    0.2     0.3    0.2
       maybeListToList   Main                                                 293     1542456   0.2    0.1     0.2    0.1
    applyToValues        Main                                                 289         999   0.2    0.5     0.2    0.5
   @+                    Main                                                 272        1000   0.0    0.0     0.6    0.6
    mergeStep            Main                                                 275        1000   0.3    0.2     0.4    0.4
     v                   Main                                                 300           0   0.0    0.1     0.0    0.1
     t                   Main                                                 299           0   0.1    0.1     0.1    0.1
    add                  Main                                                 273        1000   0.0    0.0     0.1    0.2
     binaryValueFunc     Main                                                 274     1544001   0.1    0.2     0.1    0.2
   sendCurve             GuiLink                                              268           1   0.0    0.0     0.1    0.0
    putCurve             GuiLink                                              271        1545   0.1    0.0     0.1    0.0
   readCurve             TsdbFile                                             267           1   0.0    0.0     0.0    0.0
 CAF                     Data.Typeable                                        258           1   0.0    0.0     0.0    0.0
 CAF                     GHC.IOBase                                           236           3   0.0    0.0     0.0    0.0
 CAF                     GHC.Read                                             234           1   0.0    0.0     0.0    0.0
 CAF                     GHC.Float                                            233           1   0.0    0.0     0.0    0.0
 CAF                     Text.Read.Lex                                        227           6   0.0    0.0     0.0    0.0
 CAF                     GHC.Int                                              222           1   0.0    0.0     0.0    0.0
 CAF                     Data.HashTable                                       213           2   0.0    0.0     0.0    0.0
 CAF                     GHC.Handle                                           211           5   0.0    0.0     0.0    0.0
  main                   Main                                                 279           0   0.0    0.0     0.0    0.0
   readCurve             TsdbFile                                             280           0   0.0    0.0     0.0    0.0
 CAF                     GHC.Conc                                             210           1   0.0    0.0     0.0    0.0
 CAF                     System.Posix.Internals                               192           1   0.0    0.0     0.0    0.0
 CAF                     TsdbFile                                             181           5   0.0    0.0     0.0    0.0
  getCurve               TsdbFile                                             278           1   0.0    0.0     0.0    0.0
 CAF                     Data.Binary.IEEE754                                  180           6   0.0    0.0     0.0    0.0
 CAF                     Data.Binary.Get                                      179           2   0.0    0.0     0.0    0.0
 CAF                     Data.Binary.Put                                      151           1   0.0    0.0     0.0    0.0
 CAF                     GuiLink                                              145           2   0.0    0.0     0.0    0.0
 CAF                     Network                                              144           1   0.0    0.0     0.0    0.0
 CAF                     Network.Socket                                       143           5   0.0    0.0     0.0    0.0
  main                   Main                                                 269           0   0.0    0.0     0.0    0.0
   sendCurve             GuiLink                                              270           0   0.0    0.0     0.0    0.0
 CAF                     Network.BSD                                          139           1   0.0    0.0     0.0    0.0


More information about the Beginners mailing list