Motivated by sequential hypothesis tests for identifying the order of autoregressive models, we proposed a new information criterion called the Bridge criterion (BC). As we explained in the paper, the idea can in principle be applied to a wide variety of statistical inference and machine learning tasks, so that the benefits of both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) can be obtained simultaneously.
In this work, we introduce the Bridge criterion for variable selection in regression models, and show its optimality in terms of both loss and risk under appropriate assumptions. The key idea is to impose a penalty that is heavy for models with small dimensions and lighter for those with larger dimensions. In contrast to state-of-the-art model selection criteria such as the Cp method, cross-validation, AIC, and BIC, the proposed method achieves asymptotic loss and risk efficiency in both parametric and nonparametric regression settings, giving new insight into the reconciliation of two types of classical criteria with different asymptotic behaviors.
Jie Ding, Vahid Tarokh, Yuhong Yang, “Optimal variable selection in regression models”. pdf
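To make the contrast between the two classical criteria concrete, here is a minimal sketch that selects the dimension of a nested linear regression model with AIC and with BIC. It does not implement BC, whose exact penalty is defined in the paper above. The Gaussian-likelihood forms n log(RSS/n) + 2k and n log(RSS/n) + k log n, the helper name `select_dimension`, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def select_dimension(X, y, max_dim):
    """Pick the number of leading columns of X by AIC and by BIC.

    Candidate model k uses the first k columns of X (nested models).
    Scores use the Gaussian log-likelihood up to additive constants.
    """
    n = len(y)
    scores = {"AIC": [], "BIC": []}
    for k in range(1, max_dim + 1):
        coef, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
        rss = float(np.sum((y - X[:, :k] @ coef) ** 2))
        fit = n * np.log(rss / n)
        scores["AIC"].append(fit + 2 * k)          # constant penalty per parameter
        scores["BIC"].append(fit + k * np.log(n))  # penalty grows with sample size
    return {name: int(np.argmin(vals)) + 1 for name, vals in scores.items()}

# Simulated example (hypothetical data): only the first 3 of 10 variables matter.
rng = np.random.default_rng(0)
n, max_dim = 200, 10
X = rng.standard_normal((n, max_dim))
y = X[:, :3] @ np.array([2.0, 2.0, 2.0]) + rng.standard_normal(n)

chosen = select_dimension(X, y, max_dim)
print(chosen)
```

Because log n exceeds 2 once n is at least 8, BIC penalizes each added variable more heavily than AIC, so on nested candidates it never selects a larger model than AIC does.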
One may also ask:
What is the relation of AIC/BIC/BC to other criteria such as minimum description length, Hannan and Quinn criterion, Mallows’s Cp, k-fold (or delete-k) cross-validation, delete-1 cross-validation (or leave-one-out), etc.?
The above paper also reviews these relations. In particular, it clarifies misunderstandings about folklore such as “70% for training and 30% for testing in cross-validation” (which is always sub-optimal given sufficiently large data).
Can the criterion be applied to high-dimensional problems, where the sample size is much smaller than the model dimension?
To answer this question, the above paper also proposes an adaptation of BC for high dimensions, called BC with variable ordering in weight (BC-VO). It turns out to be a promising competitor to state-of-the-art penalized regression methods such as the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and the minimax concave penalty (MCP).
In earlier work (during my undergraduate studies), we also studied a step-wise variable selection method for sparse approximation. We found that the support set of the best sparse approximation of the original signal can be sequentially recovered using a greedy algorithm called “Orthogonal Matching Pursuit” under reasonable conditions. We provided sufficient conditions for this recovery and showed their tightness. Moreover, we showed that the entries of strongly decaying signals can be recovered in the order of their magnitudes under reasonable conditions.
Jie Ding, Laming Chen, and Yuantao Gu, “Perturbation Analysis of Orthogonal Matching Pursuit”. pdf
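For readers unfamiliar with the greedy procedure referred to above, the following is a minimal NumPy sketch of standard Orthogonal Matching Pursuit. It is a textbook version, not the perturbation analysis of the paper. The function name `omp` and the demo are illustrative assumptions; the demo uses a dictionary with orthonormal columns and no noise, an idealized case in which the entries are picked in decreasing order of magnitude.

```python
import numpy as np

def omp(A, y, k):
    """Run k greedy steps of Orthogonal Matching Pursuit.

    Returns the selected column indices (in selection order) and the
    coefficient vector of the least-squares fit on those columns.
    """
    support = []
    coef = np.zeros(0)
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never select an index twice
        support.append(int(np.argmax(correlations)))
        # Re-fit on all selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return support, x

# Idealized demo: orthonormal dictionary, noiseless 3-sparse signal.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
x_true = np.zeros(8)
x_true[[1, 3, 5]] = [3.0, -2.0, 1.0]
y = Q @ x_true

support, x_hat = omp(Q, y, k=3)
print(support)
```

With a general (non-orthogonal) dictionary or noisy measurements, correct recovery is no longer automatic and depends on conditions of the kind studied in the paper.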