Conformal Prediction Assessment: A Framework for Conditional Coverage Evaluation and Selection

Zhou, Zheng, Zhang, Xiangfei, Tao, Chongguang, Yang, Yuhong

arXiv.org Machine Learning

Conformal prediction provides rigorous distribution-free finite-sample guarantees for marginal coverage under the assumption of exchangeability, but may exhibit systematic undercoverage or overcoverage for specific subpopulations. Assessing conditional validity is challenging, as standard stratification methods suffer from the curse of dimensionality. We propose Conformal Prediction Assessment (CPA), a framework that reframes the evaluation of conditional coverage as a supervised learning task by training a reliability estimator that predicts instance-level coverage probabilities. Building on this estimator, we introduce the Conditional Validity Index (CVI), which decomposes reliability into safety (undercoverage risk) and efficiency (overcoverage cost). We establish convergence rates for the reliability estimator and prove the consistency of CVI-based model selection. Extensive experiments on synthetic and real-world datasets demonstrate that CPA effectively diagnoses local failure modes and that CC-Select, our CVI-based model selection algorithm, consistently identifies predictors with superior conditional coverage performance.
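The core idea above, reframing conditional coverage evaluation as supervised learning on instance-level coverage indicators, can be sketched in a few lines. This is a toy illustration under our own assumptions (fixed-width intervals, heteroscedastic noise, a simple binned regressor standing in for the paper's reliability estimator); none of the names below come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: fixed-width "conformal" intervals deliberately
# undercover where the noise is large, creating a local failure mode
# that marginal coverage statistics would hide.
n = 5000
x = rng.normal(size=n)
y = x + rng.normal(scale=1 + np.abs(x), size=n)   # heteroscedastic noise
lo, hi = x - 2.0, x + 2.0                          # fixed-width intervals

covered = ((y >= lo) & (y <= hi)).astype(float)    # instance-level coverage

# Reliability estimator: a simple binned regression of the coverage
# indicator on |x| (the paper trains a proper supervised model).
edges = np.quantile(np.abs(x), np.linspace(0, 1, 11))
bin_id = np.clip(np.searchsorted(edges, np.abs(x)) - 1, 0, 9)
p_hat = np.array([covered[bin_id == b].mean() for b in range(10)])[bin_id]

centre = np.abs(x) < 0.5
print(p_hat[centre].mean(), p_hat[~centre].mean())  # coverage drops in the tails
```

Even this crude estimator exposes the failure mode: predicted coverage is near the nominal level in the centre of the covariate space and visibly lower in the tails, which is exactly the kind of diagnosis CPA automates.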





True Few-Shot Learning with Language Models

Neural Information Processing Systems

Here, we evaluate the few-shot ability of LMs when such held-out examples are unavailable, a setting we call true few-shot learning. We test two model selection criteria, cross-validation and minimum description length, for choosing LM prompts and hyperparameters in the true few-shot setting. On average, both marginally outperform random selection and greatly underperform selection based on held-out examples.
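Why few-shot selection is unreliable can be seen in a toy simulation (illustrative only; the paper selects real LM prompts). Each candidate "prompt" here has a hidden accuracy, and we pick one using only K labelled examples; since a prompt involves no training step, leave-one-out cross-validation with 0/1 loss reduces to mean correctness on those K examples, which is a noisy score.

```python
import numpy as np

rng = np.random.default_rng(1)

true_acc = np.array([0.55, 0.70, 0.60])   # hidden quality of each candidate prompt
k, trials = 8, 2000                        # K few-shot examples, simulation repeats

picked_best = 0
for _ in range(trials):
    # Per-example correctness of each prompt on the K few-shot examples;
    # the CV selection score is just mean correctness over them.
    scores = (rng.random((len(true_acc), k)) < true_acc[:, None]).mean(axis=1)
    picked_best += int(np.argmax(scores) == np.argmax(true_acc))

print(f"few-shot selection finds the best prompt in {picked_best / trials:.0%} of trials")
```

With only eight examples the genuinely best prompt wins well under 100% of the time, better than the 1/3 random baseline but far from what a large held-out set would achieve, mirroring the paper's finding.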



31b3b31a1c2f8a370206f111127c0dbd-Supplemental.pdf

Neural Information Processing Systems

Note that we allow multiple estimated quantiles to be identical to each other, to accommodate the possibility of point masses. Furthermore, we assume q̂_0(x) and q̂_1(x) are conservative lower and upper bounds for the support of Y | X = x, i.e., q̂_0(X) = b_0 < Y < b_m = q̂_1(X). We will discuss in the next section practical options for estimating q̂(x). Now, we leverage any given q̂(x) to compute estimates π̂_j(x) of the unknown bin probabilities π_j(x) in (6), for all j ∈ {1, ..., m}. Although there are multiple ways of doing this, a principled solution is to convert the information contained in q̂ into a piece-wise constant density estimate, and then integrate that density within each bin.
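The quantile-to-density construction above can be sketched directly (a minimal version, assuming strictly increasing quantiles at equally spaced levels; the paper's setup also allows ties for point masses). The key simplification: a piecewise-constant density whose intervals each carry equal mass has a piecewise-linear CDF, so integrating it over a bin is just a difference of linearly interpolated CDF values.

```python
import numpy as np

def bin_probs(q, bins):
    """Piecewise-constant density from quantiles, integrated over bins.

    q:    quantiles q[0] < ... < q[k] at levels 0, 1/k, ..., 1.
    bins: bin edges b[0] < ... < b[m] at which to accumulate mass.
    """
    q = np.asarray(q, dtype=float)
    k = len(q) - 1
    # Each interval (q[i], q[i+1]) carries mass 1/k, so the implied CDF is
    # linear interpolation of the (quantile, level) pairs; integrating the
    # density within a bin reduces to differencing that CDF at the edges.
    cdf = np.interp(np.asarray(bins, dtype=float), q, np.linspace(0.0, 1.0, k + 1))
    return np.diff(cdf)

q = [0.0, 1.0, 2.0, 4.0]           # quantiles at levels 0, 1/3, 2/3, 1
pi = bin_probs(q, bins=[0.0, 2.0, 4.0])
print(pi)                           # mass splits as [2/3, 1/3]
```

Here two of the three equal-mass quantile intervals lie below 2.0, so the first bin receives two thirds of the probability, as the printed result confirms.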


MCU: Improving Machine Unlearning through Mode Connectivity

Shi, Yingdan, Wang, Ren

arXiv.org Artificial Intelligence

Machine Unlearning (MU) aims to remove the information of specific training data from a trained model, ensuring compliance with privacy regulations and user requests. While one line of existing MU methods relies on linear parameter updates via task arithmetic, they suffer from weight entanglement. In this work, we propose a novel MU framework called Mode Connectivity Unlearning (MCU) that leverages mode connectivity to find an unlearning pathway in a nonlinear manner. To further enhance performance and efficiency, we introduce a parameter mask strategy that not only improves unlearning effectiveness but also reduces computational overhead. Moreover, we propose an adaptive adjustment strategy for our unlearning penalty coefficient that balances forgetting quality and predictive performance during training, eliminating the need for empirical hyperparameter tuning. Unlike traditional MU methods that identify only a single unlearning model, MCU uncovers a spectrum of unlearning models along the pathway. Overall, MCU serves as a plug-and-play framework that seamlessly integrates with any existing MU methods, consistently improving unlearning efficacy. Extensive experiments on the image classification task demonstrate that MCU achieves superior performance.
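The "nonlinear pathway" idea can be made concrete with a standard mode-connectivity parametrization: a quadratic Bezier curve in parameter space with a learnable control point. This is only a sketch of the pathway itself (the control point, MCU's objective, mask, and penalty schedule would be trained as in the paper); all values below are ours.

```python
import numpy as np

def bezier(theta_a, theta_mid, theta_b, t):
    """Quadratic Bezier curve in parameter space, t in [0, 1]."""
    return ((1 - t) ** 2) * theta_a + 2 * t * (1 - t) * theta_mid + (t ** 2) * theta_b

theta_a = np.array([0.0, 0.0])     # original model's parameters
theta_b = np.array([1.0, 1.0])     # reference unlearned model's parameters
theta_mid = np.array([0.5, -1.0])  # control point (learnable in practice)

# The curve interpolates the endpoints and bends through the control
# point, yielding a whole spectrum of candidate unlearning models rather
# than a single linear (task-arithmetic) update.
path = np.stack([bezier(theta_a, theta_mid, theta_b, t)
                 for t in np.linspace(0.0, 1.0, 5)])
print(path)
```

Evaluating models along `path` is what turns unlearning from a single point estimate into a one-parameter family, which is the spectrum the abstract refers to.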


R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

Ge, Albert, Huang, Tzu-Heng, Cooper, John, Trost, Avi, Chu, Ziyi, GNVV, Satya Sai Srinath Namburi, Cai, Ziyang, Park, Kendall, Roberts, Nicholas, Sala, Frederic

arXiv.org Artificial Intelligence

Data mixing strategies have successfully reduced the costs involved in training language models. While promising, such methods suffer from two flaws. First, they rely on predetermined data domains (e.g., data sources, task types), which may fail to capture critical semantic nuances, leaving performance on the table. Second, these methods scale with the number of domains in a computationally prohibitive way. We address these challenges via R&B, a framework that re-partitions training data based on semantic similarity (Regroup) to create finer-grained domains, and efficiently optimizes the data composition (Balance) by leveraging a Gram matrix induced by domain gradients obtained throughout training. Unlike prior works, it removes the need for additional compute to obtain evaluation information such as losses or gradients. We analyze this technique under standard regularity conditions and provide theoretical insights that justify R&B's effectiveness compared to non-adaptive mixing approaches. Empirically, we demonstrate the effectiveness of R&B on five diverse datasets ranging from natural language to reasoning and multimodal tasks. With as little as 0.01% additional compute overhead, R&B matches or exceeds the performance of state-of-the-art data mixing strategies.
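The "Balance" step's use of a Gram matrix of per-domain gradients can be illustrated with a toy reweighting rule (our own simplification, not the paper's exact update): up-weight domains whose gradients align with the average training direction, using only gradient inner products that training already produces.

```python
import numpy as np

rng = np.random.default_rng(2)

g = rng.normal(size=(4, 10))   # one (flattened) gradient vector per domain
G = g @ g.T                    # Gram matrix of domain gradients

# Alignment of each domain with the summed gradient direction is the row
# sum of G; a softmax turns these scores into a data-mixture distribution.
scores = G.sum(axis=1)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
print(weights)
```

Because `G` is built from gradients collected during ordinary training steps, no extra forward or backward passes are needed to update the mixture, which is the efficiency point the abstract emphasizes.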


Conditional Neural Processes for Molecules

Garcia-Ortegon, Miguel, Bender, Andreas, Bacallado, Sergio

arXiv.org Artificial Intelligence

Neural processes (NPs) are models for transfer learning with properties reminiscent of Gaussian Processes (GPs). They are adept at modelling data consisting of few observations of many related functions on the same input space and are trained by minimizing a variational objective, which is computationally much less expensive than the Bayesian updating required by GPs. So far, most studies of NPs have focused on low-dimensional datasets which are not representative of realistic transfer learning tasks. Drug discovery is one application area that is characterized by datasets consisting of many chemical properties or functions which are sparsely observed, yet depend on shared features or representations of the molecular inputs. This paper applies the conditional neural process (CNP) to DOCKSTRING, a dataset of docking scores for benchmarking ML models. CNPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in chemoinformatics, as well as an alternative model for transfer learning based on pre-training and refining neural network regressors. We present a Bayesian optimization experiment which showcases the probabilistic nature of CNPs and discuss shortcomings of the model in uncertainty quantification.
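The CNP architecture itself is compact enough to sketch as a single forward pass: encode each context pair, aggregate by the mean so the summary is permutation-invariant, then decode a Gaussian at each target input. The weights below are random and the layer sizes arbitrary; a real CNP trains them across many related functions.

```python
import numpy as np

rng = np.random.default_rng(3)
d_hidden = 16

W_enc = rng.normal(scale=0.3, size=(2, d_hidden))      # encoder weights
W_dec = rng.normal(scale=0.3, size=(d_hidden + 1, 2))  # decoder weights

def cnp_forward(x_ctx, y_ctx, x_tgt):
    pairs = np.stack([x_ctx, y_ctx], axis=1)           # (n_ctx, 2)
    r = np.tanh(pairs @ W_enc).mean(axis=0)            # permutation-invariant summary
    inp = np.concatenate([np.tile(r, (len(x_tgt), 1)),
                          x_tgt[:, None]], axis=1)     # summary + target input
    out = inp @ W_dec
    mu = out[:, 0]
    sigma = np.log1p(np.exp(out[:, 1]))                # softplus keeps sigma > 0
    return mu, sigma

x_ctx = np.array([0.0, 1.0, 2.0])
y_ctx = np.sin(x_ctx)
mu, sigma = cnp_forward(x_ctx, y_ctx, np.array([0.5, 1.5]))
print(mu, sigma)
```

The mean aggregation is what lets the model condition on any number of sparse observations, and the predicted `sigma` is the uncertainty estimate whose shortcomings the paper discusses.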