backward selection
Scalable branch-and-bound model selection with non-monotonic criteria including AIC, BIC and Mallows's $\mathit{C_p}$
Vanhoefer, Jakob, Körner, Antonia, Doresic, Domagoj, Hasenauer, Jan, Pathirana, Dilan
Model selection is a pivotal process in the quantitative sciences, where researchers must navigate between numerous candidate models of varying complexity. Traditional information criteria, such as the corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC), and Mallows's $\mathit{C_p}$, are valuable tools for identifying optimal models. However, the number of candidate models grows exponentially with each additional model parameter, which renders the evaluation of these criteria for all models -- a strategy known as exhaustive, or brute-force, search -- computationally prohibitive. Consequently, heuristic approaches like stepwise regression are commonly employed, albeit without guarantees of finding the globally optimal model. In this study, we challenge the prevailing notion that non-monotonicity in information criteria precludes bounds on the search space. We introduce a simple but novel bound that enables the development of branch-and-bound algorithms tailored for these non-monotonic functions. We demonstrate that our approach guarantees identification of the optimal model(s) across diverse model classes, sizes, and applications, often with computational speedups of several orders of magnitude. For instance, in one previously published model selection task involving $2^{32}$ (approximately 4 billion) candidate models, our method achieves a computational speedup exceeding 6,000-fold. These findings have broad implications for the scalability and effectiveness of model selection in complex scientific domains.
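To make the contrast between exhaustive search and a stepwise heuristic concrete, here is a minimal sketch for ordinary least-squares regression scored with BIC. It does not implement the branch-and-bound method or the bound described in the abstract. The function names (`bic`, `exhaustive_search`, `backward_selection`), the Gaussian form of BIC used here, and the synthetic data are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def bic(X, y, subset):
    """BIC of an OLS fit with an intercept plus the columns in `subset`.

    Assumes Gaussian errors, so BIC = n*log(RSS/n) + k*log(n) up to a constant,
    where k counts the fitted coefficients (intercept included).
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def exhaustive_search(X, y):
    """Score all 2**p feature subsets and return the one with the lowest BIC."""
    p = X.shape[1]
    candidates = (s for k in range(p + 1) for s in combinations(range(p), k))
    return min(candidates, key=lambda s: bic(X, y, s))


def backward_selection(X, y):
    """Greedy heuristic: start from the full model and drop one feature at a
    time while doing so lowers BIC. Not guaranteed to find the global optimum."""
    current = tuple(range(X.shape[1]))
    score = bic(X, y, current)
    while current:
        reduced = [tuple(f for f in current if f != j) for j in current]
        best = min(reduced, key=lambda s: bic(X, y, s))
        best_score = bic(X, y, best)
        if best_score >= score:
            break  # no single removal improves the criterion
        current, score = best, best_score
    return current


# Hypothetical synthetic data: only features 0 and 2 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

print("exhaustive:", exhaustive_search(X, y))
print("backward:  ", backward_selection(X, y))
```

With 5 features the exhaustive search fits 32 models, while backward selection fits at most 16. At 32 features the exhaustive count becomes the $2^{32}$ mentioned in the abstract, which is the gap that both heuristics and bounding approaches try to close.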
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Overinterpretation reveals image classification model pathologies
Overinterpretation is related to overfitting, but overfitting can be diagnosed via reduced test accuracy. Overinterpretation can stem from true statistical signals in the underlying dataset distribution that happen to arise from particular properties of the data source (e.g., dermatologists' rulers).
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Overinterpretation reveals image classification model pathologies
Carter, Brandon, Jain, Siddhartha, Mueller, Jonas, Gifford, David
Image classifiers are typically scored on their test set accuracy, but high accuracy can mask a subtle type of model failure. We find that high-scoring convolutional neural networks (CNNs) exhibit troubling pathologies that allow them to display high accuracy even in the absence of semantically salient features. When a model provides a high-confidence decision without salient supporting input features, we say that the classifier has overinterpreted its input, finding too much class-evidence in patterns that appear nonsensical to humans. Here, we demonstrate that state-of-the-art neural networks for CIFAR-10 and ImageNet suffer from overinterpretation, and find that CIFAR-10 trained models make confident predictions even when 95% of an input image has been masked and humans are unable to discern salient features in the remaining pixel subset. Although these patterns portend potential model fragility in real-world deployment, they are in fact valid statistical patterns of the image classification benchmark that alone suffice to attain high test accuracy. We find that ensembling strategies can help mitigate model overinterpretation, and that classifiers which rely on more semantically meaningful features can improve accuracy over both the test set and out-of-distribution images from a different source than the training data.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
An Introduction to Variable and Feature Selection
Feature selection is one of the core topics in machine learning. In statistical science, it is called variable reduction or variable selection. Our scientist published a methodology to automate this process and efficiently handle a large number of features (called variables by statisticians). Here, we mention an article published by Isabelle Guyon and Andre Elisseeff in the Journal of Machine Learning Research. Although it was published in 2003, it is still one of the best ML papers on feature selection.
Efficient Stepwise Selection in Decomposable Models
Deshpande, Amol, Garofalakis, Minos, Jordan, Michael I.
In this paper, we present an efficient way of performing stepwise selection in the class of decomposable models. The main contribution of the paper is a simple characterization of the edges that can be added to a decomposable model while keeping the resulting model decomposable and an efficient algorithm for enumerating all such edges for a given model in essentially O(1) time per edge. We also discuss how backward selection can be performed efficiently using our data structures. We also analyze the complexity of the complete stepwise selection procedure, including the complexity of choosing which of the eligible edges to add to (or delete from) the current model, with the aim of minimizing the Kullback-Leibler distance of the resulting model from the saturated model for the data.
- North America > United States > California > Alameda County > Berkeley (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)