First draft
Here is the first draft of a paper on modelling time series extracted from the digital sphere. The basic idea is to see whether such series have a strong signal-to-noise ratio and, if they do, which methods are appropriate for capturing the trend signal.
In a nutshell: yes, the data has a strong signal component, and it requires robust parametric models. Non-parametric methods such as SSA (singular spectrum analysis, a very powerful methodology that is becoming increasingly popular) do a great job of extracting rich trend data, but can be thrown out of whack by runs of extreme values (such as a hit storm).
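To make the point about extreme values concrete, here is a minimal sketch of basic SSA trend extraction in Python with NumPy. It is not the code used in the paper: the function name `ssa_trend`, the window length, the number of retained components and the synthetic series (a linear trend plus noise, with an artificial five-point "hit storm") are all assumptions chosen for illustration.

```python
import numpy as np


def ssa_trend(series, window, n_components=1):
    """Basic SSA: embed, decompose with SVD, keep leading components, average back."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory matrix: column j holds the lagged window x[j:j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Low-rank approximation built from the leading singular triples.
    approx = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # Diagonal averaging: each observation is the mean of the matrix
    # entries that correspond to it.
    trend = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        trend[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return trend / counts


# Synthetic example (assumed data, not taken from the paper).
rng = np.random.default_rng(0)
t = np.arange(100)
clean = 0.1 * t + rng.normal(scale=0.5, size=t.size)

spiked = clean.copy()
spiked[50:55] += 50.0  # a short run of extreme values

trend_clean = ssa_trend(clean, window=20)
trend_spiked = ssa_trend(spiked, window=20)

# How far the extracted trend moves because of five extreme observations.
print("max trend shift:", np.max(np.abs(trend_spiked - trend_clean)))
```

Comparing `trend_clean` with `trend_spiked` shows how a handful of extreme observations can pull the leading component, and therefore the extracted trend, away from the underlying signal.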
Next steps are: (1) moving from predictions to forecasts (predictions are made within the range of actual observations, where we try to fit the data as best we can; forecasts are made outside that range); (2) designing an approach that blends parametric models (extremely useful for quickly scanning large numbers of series) and non-parametric ones (extremely useful for drawing rich accounts of trend data).
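The distinction between predictions and forecasts can be sketched with a simple parametric model. The straight-line trend, the synthetic data and the variable names below are assumptions for illustration only; the paper's own models may differ.

```python
import numpy as np

# Synthetic observations (assumed): a linear trend plus noise over 30 periods.
rng = np.random.default_rng(1)
t_obs = np.arange(30)
y_obs = 2.0 + 0.5 * t_obs + rng.normal(scale=1.0, size=t_obs.size)

# Fit a straight line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(t_obs, y_obs, deg=1)

# Predictions: fitted values inside the observed range.
predictions = intercept + slope * t_obs

# Forecasts: the same model evaluated beyond the last observation.
t_future = np.arange(30, 40)
forecasts = intercept + slope * t_future

print("in-sample RMSE:", np.sqrt(np.mean((y_obs - predictions) ** 2)))
print("first forecast:", forecasts[0])
```

The fit quality can only be checked against data for the predictions; the forecasts rest entirely on the assumption that the fitted trend continues to hold.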
A revised version of the paper is available here. This paper was presented at ECIG 2007.
(I would like to extend my gratitude to Stephane Muller, Jerome Coutard and Isabelle Dornic, who gave me access to excellent data.)