The choice of model class is fundamental in statistical learning and system identification, whether the class is derived from physical principles or is a generic black box. We develop a method to evaluate a specified model class by assessing its capability to reproduce data similar to the observed data record. This model check is based on the information-theoretic properties of models viewed as data generators and is applicable to, e.g., sequential data and nonlinear dynamical models. The method can be understood as a specific two-sided posterior predictive test. We apply the information-theoretic model check to both synthetic and real data and compare it with a classical whiteness test.
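To make the idea of a two-sided posterior predictive test concrete, here is a minimal sketch. It is not the information-theoretic criterion described above; it is the generic recipe that criterion specializes: draw replicated data sets from the fitted model, compute a test statistic on each, and ask how extreme the observed statistic is among the replicates. The model (i.i.d. standard Gaussian), the statistic (average power), and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_sided_ppp(y, draw_replicate, statistic, n_rep=1000):
    """Two-sided posterior predictive p-value for a test statistic T.

    draw_replicate() returns one replicated data set of the same length
    as y, drawn from the fitted model (a stand-in for posterior
    predictive simulation).
    """
    t_obs = statistic(y)
    t_rep = np.array([statistic(draw_replicate()) for _ in range(n_rep)])
    # Two-sided: small p-values mean t_obs is extreme in either tail.
    p_right = np.mean(t_rep >= t_obs)
    return 2.0 * min(p_right, 1.0 - p_right)

# Toy check: observed data against a hypothetical fitted i.i.d. Gaussian model.
y_obs = rng.normal(size=200)
p = two_sided_ppp(
    y_obs,
    draw_replicate=lambda: rng.normal(size=len(y_obs)),
    statistic=lambda y: np.mean(y**2),  # average power of the series
)
```

A p-value near 0 would indicate that the model class cannot generate data resembling the observed record, which is exactly the kind of inconsistency the check is meant to flag.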
We describe here a methodology that applies to any statistical test, illustrated in the context of assessing independence between successive observations in a data set. After reviewing a few standard approaches, we discuss our methodology, its benefits, and its drawbacks. The data used here for illustration has known theoretical autocorrelations and can therefore serve as a benchmark for various statistical tests. Our methodology also applies to data with high volatility, in particular to time series models with undefined autocorrelations.
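A standard approach of the kind reviewed here is the classical whiteness test on sample autocorrelations: under independence, each sample autocorrelation r_k is approximately N(0, 1/n), so values beyond about 1.96/sqrt(n) are suspicious at the 5% level. The sketch below, with illustrative function names, implements this check with numpy only.

```python
import numpy as np

def sample_autocorr(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series y."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = np.sum(y**2)
    return np.array([np.sum(y[k:] * y[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def whiteness_test(y, max_lag=10):
    """Classical whiteness check: flag lags where |r_k| exceeds the
    approximate 95% bound 1.96/sqrt(n) valid under independence."""
    n = len(y)
    r = sample_autocorr(y, max_lag)
    threshold = 1.96 / np.sqrt(n)
    return r, np.abs(r) > threshold

rng = np.random.default_rng(1)
white = rng.normal(size=500)  # independent draws: "white" by construction
r, exceed = whiteness_test(white)
```

For truly independent data only about 1 in 20 lags should exceed the bound; many exceedances, or a few very large ones, are evidence against independence. Note that this bound relies on autocorrelations existing, which is precisely what fails for the heavy-tailed, high-volatility models mentioned above.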
It is worth noting that the observed data is ordered by the time of observation, but it doesn't have to depend on time, i.e. time (the index of the observations) doesn't have to be one of the independent variables. Stationarity: a stationary process is a stochastic process whose mean, variance, and autocorrelation structure do not change over time. It can also be defined formally in mathematical terms, but that is not necessary in this article. Intuitively, if a time series is stationary and we look at different parts of it, they should be very similar: the time series is flat looking and its shape doesn't depend on a shift in time. (A deterministic signal surely isn't stationary: since it's not stochastic, stationarity is not one of its properties.) Figure 1.1 shows the simplest example of a stationary process -- white noise.
In scientific inference problems, the underlying statistical modelling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The contribution of this paper is a general criterion for evaluating the consistency of a set of statistical models with respect to observed data. This is achieved by automatically gauging the models' ability to generate data that is similar to the observed data. Importantly, the criterion follows from the model class itself and is therefore directly applicable to a broad range of inference problems with varying data types. The proposed data consistency criterion is illustrated and evaluated using three synthetic and two real data sets.
While techniques like linear regression or time series analysis aim at modelling the general trend exhibited by a set or series of data points, data scientists faced another question: these models can capture the overall trend, but how can one model the volatility in the data? In real life, the initial stages of a business or a new market are volatile, changing at high velocity until things calm down and become saturated. Only then can one apply statistical techniques such as time series analysis or regression, as the case may be. ARCH models were developed to venture into the turbulent seas of volatile data and analyze it in a time-changing setting. As already mentioned, ARCH is a statistical model for time series data.
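The defining feature of an ARCH(1) process is that today's variance depends on yesterday's squared shock: y_t = sigma_t * e_t with sigma_t^2 = omega + alpha * y_{t-1}^2 and e_t standard normal. The sketch below simulates such a series; the parameter values (omega = 0.2, alpha = 0.5) are illustrative choices, picked so the process has finite fourth moments.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_arch1(n, omega=0.2, alpha=0.5):
    """Simulate an ARCH(1) process:
        y_t = sigma_t * e_t,  sigma_t^2 = omega + alpha * y_{t-1}^2,
    with e_t ~ N(0, 1). Requires 0 <= alpha < 1 for stationarity."""
    y = np.zeros(n)
    sigma2 = omega / (1.0 - alpha)  # start at the unconditional variance
    for t in range(n):
        y[t] = np.sqrt(sigma2) * rng.normal()
        sigma2 = omega + alpha * y[t] ** 2
    return y

y = simulate_arch1(5000)
# Signature ARCH effect: the series itself is (nearly) uncorrelated,
# but its squares are autocorrelated -- volatility arrives in clusters.
```

This is why the whiteness test alone is insufficient for volatile data: an ARCH series passes an autocorrelation check on the levels while its squared values reveal strong temporal dependence.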