dtrain
0f0c4f3d83c58df58380af3b0729354c-Paper-Conference.pdf
Uncertainty Quantification (UQ) is essential for creating trustworthy machine learning models. Recent years have seen a steep rise in UQ methods that can flag suspicious examples, however, it is often unclear what exactly these methods identify. In this work, we propose a framework for categorizing uncertain examples flagged by UQ methods in classification tasks. We introduce the confusion density matrix--a kernel-based approximation of the misclassification density--and use this to categorize suspicious examples identified by a given uncertainty method into three classes: out-of-distribution (OOD) examples, boundary (Bnd) examples, and examples in regions of high in-distribution misclassification (IDM). Through extensive experiments, we show that our framework provides a new and distinct perspective for assessing differences between uncertainty quantification methods, thereby forming a valuable assessment benchmark.
Conformal Prediction Assessment: A Framework for Conditional Coverage Evaluation and Selection
Zhou, Zheng, Zhang, Xiangfei, Tao, Chongguang, Yang, Yuhong
Conformal prediction provides rigorous distribution-free finite-sample guarantees for marginal coverage under the assumption of exchangeability, but may exhibit systematic undercoverage or overcoverage for specific subpopulations. Assessing conditional validity is challenging, as standard stratification methods suffer from the curse of dimensionality. We propose Conformal Prediction Assessment (CPA), a framework that reframes the evaluation of conditional coverage as a supervised learning task by training a reliability estimator that predicts instance-level coverage probabilities. Building on this estimator, we introduce the Conditional Validity Index (CVI), which decomposes reliability into safety (undercoverage risk) and efficiency (overcoverage cost). We establish convergence rates for the reliability estimator and prove the consistency of CVI-based model selection. Extensive experiments on synthetic and real-world datasets demonstrate that CPA effectively diagnoses local failure modes and that CC-Select, our CVI-based model selection algorithm, consistently identifies predictors with superior conditional coverage performance.
TrueFew-ShotLearningwithLanguageModels
Here, we evaluate the few-shot ability ofLMs when such held-out examples are unavailable, a setting we calltrue few-shot learning. We test two model selection criteria, cross-validation and minimum description length, for choosing LM prompts and hyperparameters in the true few-shot setting. Onaverage, both marginally outperform random selection and greatlyunderperform selection basedonheld-out examples.
31b3b31a1c2f8a370206f111127c0dbd-Supplemental.pdf
Note that we allow multiple estimated quantiles to be identical to eachother,to accommodate the possibility of point masses. Furthermore, we assume หq0(x) and หq1(x) are conservative upper and lower bounds for the support ofY | X = x, i.e., หq0(X) = b0 < Y < bm = หq1(X). We will discuss in the next section practical options for estimating หq(x). Now, we leverage any givenหq(x) to compute estimatesหฯj(x) of the unknown bin probabilities ฯj(x) in (6), for allj {1,...,m}. Although there are multiple way of doing this, a principled solution is to convert the information contained inหq into a piece-wise constant density estimate, and then integrate that density within each bin.
MCU: Improving Machine Unlearning through Mode Connectivity
Machine Unlearning (MU) aims to remove the information of specific training data from a trained model, ensuring compliance with privacy regulations and user requests. While one line of existing MU methods relies on linear parameter updates via task arithmetic, they suffer from weight entanglement. In this work, we propose a novel MU framework called Mode Connectivity Unlearning (MCU) that leverages mode connectivity to find an unlearning pathway in a nonlinear manner. To further enhance performance and efficiency, we introduce a parameter mask strategy that not only improves unlearning effectiveness but also reduces computational overhead. Moreover, we propose an adaptive adjustment strategy for our unlearning penalty coefficient to adaptively balance forgetting quality and predictive performance during training, eliminating the need for empirical hyperparameter tuning. Unlike traditional MU methods that identify only a single unlearning model, MCU uncovers a spectrum of unlearning models along the pathway. Overall, MCU serves as a plug-and-play framework that seamlessly integrates with any existing MU methods, consistently improving unlearning efficacy. Extensive experiments on the image classification task demonstrate that MCU achieves superior performance.
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
Ge, Albert, Huang, Tzu-Heng, Cooper, John, Trost, Avi, Chu, Ziyi, GNVV, Satya Sai Srinath Namburi, Cai, Ziyang, Park, Kendall, Roberts, Nicholas, Sala, Frederic
Data mixing strategies have successfully reduced the costs involved in training language models. While promising, such methods suffer from two flaws. First, they rely on predetermined data domains (e.g., data sources, task types), which may fail to capture critical semantic nuances, leaving performance on the table. Second, these methods scale with the number of domains in a computationally prohibitive way. We address these challenges via R&B, a framework that re-partitions training data based on semantic similarity (Regroup) to create finer-grained domains, and efficiently optimizes the data composition (Balance) by leveraging a Gram matrix induced by domain gradients obtained throughout training. Unlike prior works, it removes the need for additional compute to obtain evaluation information such as losses or gradients. We analyze this technique under standard regularity conditions and provide theoretical insights that justify R&B's effectiveness compared to non-adaptive mixing approaches. Empirically, we demonstrate the effectiveness of R&B on five diverse datasets ranging from natural language to reasoning and multimodal tasks. With as little as 0.01% additional compute overhead, R&B matches or exceeds the performance of state-of-the-art data mixing strategies.