Optimal Data Prediction under Abrupt and Smooth Dynamics

In this work, I aim to circumvent some challenges in online change point analysis, and prediction under time-varying parametric or nonparametric models. Sequential analysis cannot be easily applied to multiple change points that are potentially transient. It motivated my thinking about the possibility of “prediction without (reliable) inference”. An insufficient justification of change detection (i.e. number and locations of changes) may not be a hindrance if an analyst is only interested in predictive performance. After all, a change that tends to be overlooked somehow indicates its insignificance or transience.  

We developed a novel methodology (called kinetic prediction) for predicting time series under unknown abrupt changes and smooth variations in data generating distributions. Based on Kolmogorov’s $\epsilon$ entropy, we proposed a concept called $\epsilon$-predictability that quantifies the size of a model class and the maximal number of structural changes that allows the achievability of optimal prediction. To predict under abrupt changes, the basic idea is to apply $\epsilon$-covering to discretize a nonparametric or parametric model class with $\epsilon$ being at an appropriate vanishing rate with sample size, and then apply a kinetic model averaging over the quantizers. For smooth variations, based on the $\epsilon$-covering, we approximate the evolution path of underlying distributions with piecewise structured variations, and then apply a kinetic model averaging over what we call as “function flows“. Under reasonable assumptions, we prove that the average predictive performance converges almost surely to the oracle, meaning that all the data generating distributions are known in advance. We show that the assumptions hold for a rather wide class of time variations.

The results are to some extent surprising, since the predictions are made sequentially and unknown smooth/abrupt changes of data-generating models can “arbitrarily” occur.
Moreover, for parametric distribution families, we extend the aforementioned kinetic prediction with discretized function spaces to its counterpart with continuous function spaces. The extension naturally leads to a Monte-Carlo sampling-based algorithm for efficient implementation. It can be regarded as the limiting case when $\epsilon$ goes to zero.

Our results also address some philosophical puzzles related to the so called “prediction-inference dilemma” [1]. Various examples and numerical experiments are provided to support the wide applicability of our proposal.

J. Ding, J. Zhou, and V. Tarokh, “Asymptotically Optimal Prediction for Time-Varying Data Generating Processes”. 

[1] Note that the “prediction-inference dilemma” is also ubiquitous in the literature of model selection. It is in terms of the reconciliation of consistent model selection (in well-specified model classes) and optimal rate of convergence (in mis-specified model classes). For a comprehensive literature review and some new advances in that topic, we refer to the page of information criterion, or variable selection

Image souce: https://goo.gl/RcaxfZ