rashomon
REALITrees: Rashomon Ensemble Active Learning for Interpretable Trees
Nguyen, Simon D., McTavish, Hayden, Hoffman, Kentaro, Rudin, Cynthia, McCormick, Tyler H.
Active learning reduces labeling costs by selecting samples that maximize information gain. A dominant framework, Query-by-Committee (QBC), typically relies on perturbation-based diversity by inducing model disagreement through random feature subsetting or data blinding. While this approximates one notion of epistemic uncertainty, it sacrifices direct characterization of the plausible hypothesis space. We propose the complementary approach: Rashomon Ensembled Active Learning (REAL) which constructs a committee by exhaustively enumerating the Rashomon Set of all near-optimal models. To address functional redundancy within this set, we adopt a PAC-Bayesian framework using a Gibbs posterior to weight committee members by their empirical risk. Leveraging recent algorithmic advances, we exactly enumerate this set for the class of sparse decision trees. Across synthetic and established active learning baselines, REAL outperforms randomized ensembles, particularly in moderately noisy environments where it strategically leverages expanded model multiplicity to achieve faster convergence.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.35)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > Strength High (0.46)
- Research Report > New Finding (0.46)
Using Noise to Infer Aspects of Simplicity Without Learning Zachery Boner 1 Harry Chen
Noise in data significantly influences decision-making in the data science process. In fact, it has been shown that noise in data generation processes leads practitioners to find simpler models. However, an open question still remains: what is the degree of model simplification we can expect under different noise levels? In this work, we address this question by investigating the relationship between the amount of noise and model simplicity across various hypothesis spaces, focusing on decision trees and linear models. We formally show that noise acts as an implicit regularizer for several different noise models. Furthermore, we prove that Rashomon sets (sets of near-optimal models) constructed with noisy data tend to contain simpler models than corresponding Rashomon sets with non-noisy data. Additionally, we show that noise expands the set of "good" features and consequently enlarges the set of models that use at least one good feature. Our work offers theoretical guarantees and practical insights for practitioners and policymakers on whether simple-yet-accurate machine learning models are likely to exist, based on knowledge of noise levels in the data generation process.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Wisconsin (0.04)
- North America > United States > Florida > Broward County (0.04)
- North America > Dominican Republic (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Government (1.00)
- Health & Medicine (0.93)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- North America > United States (0.28)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- (3 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Banking & Finance (1.00)
- Health & Medicine > Therapeutic Area (0.93)
- Law (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- North America > United States (0.04)
- North America > Canada > British Columbia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- South America > Brazil (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
- North America > United States > Wisconsin (0.04)
- North America > United States > Florida > Broward County (0.04)
- (3 more...)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.30)
- North America > United States > Wisconsin (0.04)
- Europe > Italy (0.04)
- Asia > Middle East > Israel (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Data Science (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)