In the era of “big data”, data analysts have the freedom to apply/propose numerous parametric models to various datasets of interest.
What is the optimal model for any specific dataset?
To select the most appropriate model from a class of two or more candidates, the Akaike information criterion (AIC), proposed by Hirotugu Akaike, and the Bayesian information criterion (BIC), proposed by Gideon E. Schwarz, have been the “golden rules” of statistical model selection for the past four decades.
When should we use AIC/BIC? What are their (dis)advantages? Is there any chance to have a better information criterion that achieves the benefits of both?
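For reference, the two criteria can be written in their standard textbook forms. The symbols here are notation assumed for illustration and do not come from the text above: \hat{L} is the maximized likelihood of a candidate model, k is its number of free parameters, and n is the sample size. In both cases the candidate with the smallest value is selected.

```latex
\begin{align}
\mathrm{AIC} &= -2 \log \hat{L} + 2k, \\
\mathrm{BIC} &= -2 \log \hat{L} + k \log n.
\end{align}
```

The two differ only in the penalty term: once n exceeds about 7, \log n > 2, so BIC penalizes each additional parameter more heavily than AIC.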
In this work, we revisit the philosophy and applicability of AIC and BIC. We introduce a new information criterion that attains the benefits of both AIC and BIC when applied to statistical model selection. The new criterion, called the Bridge criterion (BC), is designed to “let the data decide” the most appropriate model, which is especially helpful in the (usual) case where a data analyst is not fully confident about the postulated model class.
Historically, many model selection techniques grew out of the tasks of variable selection in regression models and order selection in autoregressive models. Take the autoregressive time series model as an example: under reasonable assumptions, BC achieves optimal predictive performance whether or not the model class is well specified. In contrast, AIC achieves optimal predictive performance only in mis-specified model classes, and BIC only in well-specified ones. We hope that the new ideas will shed light on the reconciliation of the two distinct (types of) classical information criteria in general, which has been a notoriously difficult challenge in statistical inference.
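To make the order selection task concrete, here is a minimal sketch of choosing an autoregressive order with the two classical criteria. It is an illustration under stated assumptions, not the authors' implementation: the function names, the least-squares fit, the Gaussian approximation of the likelihood, and the simulated AR(2) series are all hypothetical choices made for this example. The Bridge criterion itself is not implemented here; its penalty is defined in the papers cited below.

```python
import numpy as np


def fit_ar_least_squares(x, order, max_order):
    """Fit AR(order) by least squares and return the mean squared residual.

    The first max_order samples serve only as initial conditions, so every
    candidate order is scored on the same set of target values.
    """
    n = len(x)
    y = x[max_order:]
    if order == 0:
        resid = y  # order 0: predict zero (series assumed zero-mean)
    else:
        # Column j holds the series lagged by j + 1 steps.
        lags = [x[max_order - j - 1 : n - j - 1] for j in range(order)]
        X = np.column_stack(lags)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
    return float(np.mean(resid ** 2))


def select_ar_order(x, max_order):
    """Return the orders chosen by AIC and BIC, plus all criterion values."""
    x = np.asarray(x, dtype=float)
    m = len(x) - max_order  # effective sample size
    scores = {"aic": [], "bic": []}
    for k in range(max_order + 1):
        mse = fit_ar_least_squares(x, k, max_order)
        # Gaussian -2 * log-likelihood, up to an additive constant.
        fit_term = m * np.log(mse)
        scores["aic"].append(fit_term + 2 * k)
        scores["bic"].append(fit_term + k * np.log(m))
    chosen = {name: int(np.argmin(vals)) for name, vals in scores.items()}
    return chosen, scores


def simulate_ar2(n, seed=0):
    """Simulate a stationary zero-mean AR(2) series (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    burn_in = 200
    e = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + e[t]
    return x[burn_in:]


if __name__ == "__main__":
    series = simulate_ar2(2000)
    chosen, _ = select_ar_order(series, max_order=8)
    print("order chosen by AIC:", chosen["aic"])
    print("order chosen by BIC:", chosen["bic"])
```

Because BIC's per-parameter penalty is larger than AIC's whenever the effective sample size exceeds about 7, the order chosen by BIC can never exceed the order chosen by AIC on the same data. That gap is the tension between the two criteria described above.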
For the statistical foundations behind the new criterion and a rigorous asymptotic analysis for linear processes, we refer to the following paper (full manuscript available from the authors).
Jie Ding, Vahid Tarokh, Yuhong Yang, “Bridging AIC and BIC: a new criterion for autoregression”.
An analysis of the Bridge criterion for regression variable/subset selection is included in the following paper.
Jie Ding, Vahid Tarokh, Yuhong Yang, “Optimal variable selection in regression models”.
Source of the left image: https://goo.gl/TfqXnx