information criterion
What is in the model? A comparison of variable selection criteria and model search approaches
Xu, Shuangshuang, Ferreira, Marco A. R., Tegge, Allison N.
For many scientific questions, understanding the underlying mechanism is the goal. To help investigators better understand the underlying mechanism, variable selection is a crucial step that permits the identification of the regression variables most strongly associated with the outcome of interest. A variable selection method consists of model evaluation using an information criterion and a search of the model space. Here, we provide a comprehensive comparison of variable selection methods using the performance measures of correct identification rate (CIR), recall, and false discovery rate (FDR). We consider the BIC and AIC for evaluating models, and exhaustive, greedy, LASSO path, and stochastic search approaches for searching the model space; we also consider LASSO with cross-validation. We perform simulation studies for linear and generalized linear models that parametrically explore a wide range of realistic sample sizes, effect sizes, and correlations among regression variables. We consider model spaces with small and large numbers of potential regressors. The results show that exhaustive search with BIC and stochastic search with BIC outperform the other methods on small and large model spaces, respectively. These approaches yield the highest CIR and lowest FDR, which collectively may support long-term efforts toward increasing replicability in research.
- North America > United States > Virginia > Roanoke (0.04)
- North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
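Since the paper's recommendation for small model spaces is exhaustive search scored by BIC, a minimal sketch may help fix ideas. It assumes statsmodels; the synthetic data and the hypothetical true subset {0, 2} are illustrations, not material from the paper.

```python
# Minimal sketch of exhaustive-search variable selection scored by BIC for a
# linear model; data and true subset are synthetic illustrations.
import itertools

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)  # true model: {0, 2}

best_bic, best_subset = np.inf, ()
for k in range(p + 1):                     # every model size, 0..p
    for subset in itertools.combinations(range(p), k):
        design = sm.add_constant(X[:, subset]) if subset else np.ones((n, 1))
        bic = sm.OLS(y, design).fit().bic  # BIC of this candidate model
        if bic < best_bic:
            best_bic, best_subset = bic, subset

print(best_subset)  # ideally recovers (0, 2)
```

With p regressors the search visits 2^p models, which is why exhaustive BIC is practical only on small model spaces, with stochastic search taking over on larger ones.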
An Asymptotic Equation Linking WAIC and WBIC in Singular Models
Hayashi, Naoki, Kutsuna, Takuro, Takamuku, Sawa
In statistical learning, models are classified as regular or singular depending on whether the mapping from parameters to probability distributions is injective. Most models with hierarchical structures or latent variables are singular, for which conventional criteria such as the Akaike Information Criterion and the Bayesian Information Criterion are inapplicable due to the breakdown of normal approximations for the likelihood and posterior. To address this, the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC) have been proposed. Since WAIC and WBIC are computed using posterior distributions at different temperature settings, separate posterior sampling is generally required. In this paper, we theoretically derive an asymptotic equation that links WAIC and WBIC, despite their dependence on different posteriors. This equation yields an asymptotically unbiased expression of WAIC in terms of the posterior distribution used for WBIC. The result clarifies the structural relationship between these criteria within the framework of singular learning theory, and deepens understanding of their asymptotic behavior. This theoretical contribution provides a foundation for future developments in the computational efficiency of model selection in singular models.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
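For readers comparing the two criteria, the standard definitions in one common convention (following Watanabe) make the "different temperature settings" explicit; here $\mathbb{E}_\theta^{\beta}[\cdot]$ and $\mathbb{V}_\theta^{\beta}[\cdot]$ denote expectation and variance under the posterior at inverse temperature $\beta$:

```latex
\mathrm{WAIC} = -\frac{1}{n}\sum_{i=1}^{n}\log \mathbb{E}_\theta^{1}\!\left[p(x_i \mid \theta)\right]
              + \frac{1}{n}\sum_{i=1}^{n}\mathbb{V}_\theta^{1}\!\left[\log p(x_i \mid \theta)\right],
\qquad
\mathrm{WBIC} = \mathbb{E}_\theta^{\beta}\!\left[-\sum_{i=1}^{n}\log p(x_i \mid \theta)\right],
\quad \beta = \frac{1}{\log n}.
```

WAIC uses the ordinary posterior ($\beta = 1$), while WBIC uses the tempered posterior at $\beta = 1/\log n$, which is why separate sampling is normally required and why an equation linking the two is computationally useful.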
Estimation of the Learning Coefficient Using Empirical Loss
Takio, Tatsuyoshi, Suzuki, Joe
The learning coefficient plays a crucial role in analyzing the performance of information criteria, such as the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC), which Sumio Watanabe developed to assess model generalization ability. In regular statistical models, the learning coefficient is given by d/2, where d is the dimension of the parameter space. More generally, it is defined as the absolute value of the largest pole of a zeta function derived from the Kullback-Leibler divergence and the prior distribution. However, except for specific cases such as reduced-rank regression, the learning coefficient cannot be derived in closed form. Watanabe proposed a numerical method to estimate the learning coefficient, which Imai further refined to enhance its convergence properties. These methods utilize the asymptotic behavior of WBIC and have been shown to be statistically consistent as the sample size grows. In this paper, we propose a novel numerical estimation method that fundamentally differs from previous approaches and leverages a new quantity, "Empirical Loss," introduced by Watanabe. Through numerical experiments, we demonstrate that our proposed method exhibits both lower bias and lower variance than the methods of Watanabe and Imai. Additionally, we provide a theoretical analysis that elucidates why our method outperforms existing techniques and present empirical evidence that supports our findings.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
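To make the definition referenced above concrete: with $K(\theta)$ the Kullback-Leibler divergence from the true distribution to the model at $\theta$ and $\varphi(\theta)$ the prior, the zeta function and the learning coefficient $\lambda$ (the real log canonical threshold) in Watanabe's framework are

```latex
\zeta(z) = \int K(\theta)^{z}\,\varphi(\theta)\,d\theta ,
\qquad
\lambda = \bigl|\text{largest pole of } \zeta \bigr| ,
```

and for a regular model this recovers $\lambda = d/2$.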
Singular learning coefficients and efficiency in learning theory
Singular learning models, whose Fisher information matrices are not positive definite, include neural networks, reduced-rank regression, Boltzmann machines, normal mixture models, and others. These models have been widely used in the development of learning machines. However, their theoretical analysis is still in its early stages. In this paper, we examine learning coefficients, which indicate the general learning efficiency, for deep linear learning models and three-layer neural network models with ReLU units. Finally, we extend the results to the case of the Softmax function.
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Principled model selection for stochastic dynamics
Gerardos, Andonis, Ronceray, Pierre
Complex dynamical systems, from macromolecules to ecosystems, are often modeled by stochastic differential equations. To learn such models from data, a common approach involves sparse selection among a large function library. However, we show that overfitting arises not only from the complexity of individual models but also from the combinatorial growth of the number of candidate models. To address this, we introduce Parsimonious Stochastic Inference (PASTIS), a principled method combining likelihood-estimation statistics with extreme value theory to suppress superfluous parameters. PASTIS outperforms existing methods and reliably identifies minimal models, even at low sampling rates or with measurement error. It extends to stochastic partial differential equations and applies to ecological networks and reaction-diffusion dynamics.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
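As a toy illustration of the core idea (penalizing for the size of the candidate library, not only for the terms kept), the sketch below infers a one-dimensional drift from a simulated trajectory. The simulated SDE, the library, and the per-term penalty of log(library size) are illustrative assumptions, not the paper's exact PASTIS criterion.

```python
# Toy library-based drift inference for dX = f(X) dt + s dW, with a selection
# penalty per kept term of log(library size) -- an illustrative stand-in for
# a combinatorial penalty, not the paper's exact criterion.
import itertools
import numpy as np

rng = np.random.default_rng(1)
dt, n = 0.01, 50_000
x = np.empty(n)
x[0] = 0.0
for t in range(n - 1):  # Euler-Maruyama simulation of dX = (x - x^3) dt + 0.5 dW
    x[t + 1] = x[t] + (x[t] - x[t] ** 3) * dt + 0.5 * np.sqrt(dt) * rng.normal()

dxdt = np.diff(x) / dt  # noisy pointwise estimate of the drift
library = {"1": np.ones(n - 1), "x": x[:-1], "x^2": x[:-1] ** 2, "x^3": x[:-1] ** 3}

def score(subset):
    """Gaussian pseudo-negative-log-likelihood plus a library-size penalty."""
    if subset:
        A = np.column_stack([library[k] for k in subset])
        coef, *_ = np.linalg.lstsq(A, dxdt, rcond=None)
        resid = dxdt - A @ coef
    else:
        resid = dxdt
    nll = 0.5 * len(resid) * np.log(np.mean(resid ** 2))
    return nll + len(subset) * np.log(len(library))

best = min((s for k in range(len(library) + 1)
            for s in itertools.combinations(library, k)), key=score)
print(best)  # ideally recovers ('x', 'x^3')
```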
Alpha-Trimming: Locally Adaptive Tree Pruning for Random Forests
Surjanovic, Nikola, Henrey, Andrew, Loughin, Thomas M.
We demonstrate that adaptively controlling the size of individual regression trees in a random forest can improve predictive performance, contrary to the conventional wisdom that trees should be fully grown. A fast pruning algorithm, alpha-trimming, is proposed as an effective approach to pruning trees within a random forest, where more aggressive pruning is performed in regions with a low signal-to-noise ratio. The amount of overall pruning is controlled by adjusting the weight on an information criterion penalty as a tuning parameter, with the standard random forest being a special case of our alpha-trimmed random forest. A remarkable feature of alpha-trimming is that its tuning parameter can be adjusted without refitting the trees in the random forest once they have been fully grown. In a benchmark suite of 46 example data sets, mean squared prediction error is often substantially lowered by our pruning algorithm and is never substantially increased, compared to a random forest with fully grown trees at default parameter settings.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
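Alpha-trimming prunes the fully grown trees in place, so the forest never has to be refit as its tuning parameter moves. scikit-learn does not expose that post-hoc operation, but its global cost-complexity pruning (ccp_alpha, which does refit per value) gives a rough feel for trading a pruning weight against the default fully grown forest; the dataset and grid below are illustrative choices, not the paper's benchmark.

```python
# Illustrative contrast, not the paper's algorithm: scikit-learn's global
# cost-complexity pruning refits each forest per alpha, whereas alpha-trimming
# reuses the fully grown trees and prunes them locally without refitting.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in [0.0, 0.01, 0.05]:  # 0.0 is the standard fully grown forest
    rf = RandomForestRegressor(n_estimators=200, ccp_alpha=alpha, random_state=0)
    rf.fit(X_tr, y_tr)
    print(alpha, mean_squared_error(y_te, rf.predict(X_te)))
```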
Fast leave-one-cluster-out cross-validation by clustered Network Information Criteria (NICc)
Qiu, Jiaxing, Lake, Douglas E., Henry, Teague R.
This paper introduces a clustered estimator of the Network Information Criterion (NICc) to approximate leave-one-cluster-out cross-validated deviance; it can be used as an alternative to cluster-based cross-validation when modeling clustered data. Stone proved that the Akaike Information Criterion (AIC) is asymptotically equivalent to leave-one-observation-out cross-validation if the parametric model is true. Ripley pointed out that the Network Information Criterion (NIC), derived in Stone's proof, is a better approximation to leave-one-observation-out cross-validation when the model is not true. For clustered data, we derive a clustered estimator of NIC, referred to as NICc, by substituting the Fisher information matrix in NIC with an estimator that adjusts for clustering. This adjustment imposes a larger penalty in NICc than the unclustered estimator of NIC when modeling clustered data, thereby preventing overfitting more effectively. In a simulation study and an empirical example, we use linear and logistic regression to model clustered data with Gaussian and binomial responses, respectively. We show that NICc is a better approximation to leave-one-cluster-out deviance and prevents overfitting more effectively than the AIC and the Bayesian Information Criterion (BIC). NICc also leads to more accurate model selection, as determined by cluster-based cross-validation, than AIC and BIC.
- North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
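For orientation, the NIC underlying this construction is, in one common form (equivalent to Takeuchi's TIC, with $\hat H$ the observed information and $\hat J$ the outer product of score contributions),

```latex
\mathrm{NIC} = -2\,\ell(\hat\theta) + 2\,\operatorname{tr}\!\bigl(\hat J \hat H^{-1}\bigr),
```

and, on a plausible reading of the abstract, NICc replaces $\hat J$ with a cluster-robust version that sums scores within clusters before taking outer products, which inflates the penalty when observations within a cluster are correlated.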
On uncertainty-penalized Bayesian information criterion
Thanasutives, Pongpisit, Fukui, Ken-ichi
The uncertainty-penalized Bayesian information criterion (UBIC) has been proposed as a new model-selection criterion for data-driven partial differential equation (PDE) discovery. In this paper, we show that using the UBIC is equivalent to applying the conventional BIC to a set of overparameterized models derived from the potential regression models of different complexity measures. The result indicates that the asymptotic properties of the UBIC and the BIC hold identically.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.59)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.43)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.43)
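For reference, the conventional BIC against which the equivalence is stated is, for a model with $k$ free parameters, maximized log-likelihood $\ell(\hat\theta)$, and $n$ samples,

```latex
\mathrm{BIC} = -2\,\ell(\hat\theta) + k \log n .
```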
Learning under Singularity: An Information Criterion improving WBIC and sBIC
We introduce a novel Information Criterion (IC), termed Learning under Singularity (LS), designed to enhance the functionality of the Widely Applicable Bayesian Information Criterion (WBIC) and the Singular Bayesian Information Criterion (sBIC). LS is effective without regularity constraints and demonstrates stability. Watanabe defined a statistical model or a learning machine as regular if the mapping from a parameter to a probability distribution is one-to-one and its Fisher information matrix is positive definite. In contrast, models not meeting these conditions are termed singular. Over the past decade, several information criteria for singular cases have been proposed, including WBIC and sBIC. WBIC is applicable in non-regular scenarios but faces challenges with large sample sizes and redundant estimation of known learning coefficients. Conversely, sBIC is limited in its broader application due to its dependence on maximum likelihood estimates. LS addresses these limitations by enhancing the utility of both WBIC and sBIC. It incorporates the empirical loss from the Widely Applicable Information Criterion (WAIC) to represent the goodness of fit to the statistical model, along with a penalty term similar to that of sBIC. This approach offers a flexible and robust method for model selection, free from regularity constraints.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Czechia (0.04)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.91)
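The abstract names LS's two ingredients without stating a formula. A plausible schematic form (an inference from the abstract, not the paper's stated definition), assuming the WAIC empirical loss $T_n = -\tfrac{1}{n}\sum_{i=1}^{n}\log \mathbb{E}_\theta[p(x_i\mid\theta)]$ paired with an sBIC-style penalty driven by the learning coefficient $\lambda$, would be

```latex
\mathrm{LS} \;\approx\; n\,T_n + \lambda \log n ,
```

with the exact penalty left to the paper.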
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Liu, Han, Roeder, Kathryn, Wasserman, Larry
A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include K-fold cross-validation (K-CV), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high-dimensional settings. We present StARS, a new stability-based method for choosing the regularization parameter in high-dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions.
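A minimal sketch of the subsampling procedure, using scikit-learn's GraphicalLasso as the graph estimator: the alpha grid, edge threshold, and instability cutoff beta are illustrative choices (the subsample size b = floor(10*sqrt(n)) follows the paper's suggestion), and the ascending scan assumes the monotonized instability the paper uses.

```python
# StARS-style selection sketch: for each regularization level, estimate edge
# frequencies over random subsamples, compute the average edge instability
# 2*f*(1-f), and pick the least regularization whose instability is below beta.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n, p, B, beta = 400, 10, 20, 0.05
X = rng.normal(size=(n, p))          # illustrative data (true graph is empty)
b = int(10 * np.sqrt(n))             # subsample size b(n) = floor(10*sqrt(n))

def edge_freq(alpha):
    freq = np.zeros((p, p))
    for _ in range(B):
        idx = rng.choice(n, size=b, replace=False)
        prec = GraphicalLasso(alpha=alpha, max_iter=200).fit(X[idx]).precision_
        freq += (np.abs(prec) > 1e-6) & ~np.eye(p, dtype=bool)
    return freq / B

for alpha in [0.05, 0.1, 0.2, 0.4]:  # ascending regularization strength
    f = edge_freq(alpha)
    instability = np.sum(2 * f * (1 - f)) / (p * (p - 1))
    print(f"alpha={alpha}: instability={instability:.3f}")
    if instability <= beta:          # least regularization meeting the cutoff
        print("StARS choice:", alpha)
        break
```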