


Optimal Decision-Making Based on Prediction Sets

arXiv.org Machine Learning

Prediction sets can wrap around any ML model to cover unknown test outcomes with a guaranteed probability. Yet, it remains unclear how to use them optimally for downstream decision-making. Here, we propose a decision-theoretic framework that seeks to minimize the expected loss (risk) against a worst-case distribution consistent with the prediction set's coverage guarantee. We first characterize the minimax optimal policy for a fixed prediction set, showing that it balances the worst-case loss inside the set with a penalty for potential losses outside the set. Building on this, we derive the optimal prediction set construction that minimizes the resulting robust risk subject to a coverage constraint. Finally, we introduce Risk-Optimal Conformal Prediction (ROCP), a practical algorithm that targets these risk-minimizing sets while maintaining finite-sample distribution-free marginal coverage. Empirical evaluations on medical diagnosis and safety-critical decision-making tasks demonstrate that ROCP reduces critical mistakes compared to baselines, particularly when out-of-set errors are costly.
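The minimax idea in this abstract can be illustrated with a small sketch. It is not the paper's ROCP algorithm. It assumes, purely for illustration, a finite label space, a loss table `losses[action][label]`, and a prediction set that contains the true label with probability at least `1 - alpha` for the input at hand. Under those assumptions an adversary puts mass `1 - alpha` on the worst label inside the set and the remaining `alpha` on the worst label anywhere. The names `robust_risk` and `robust_action` are hypothetical.

```python
def robust_risk(loss_row, pred_set, alpha):
    """Worst-case expected loss of one action, assuming P(label in pred_set) >= 1 - alpha."""
    if not pred_set:
        raise ValueError("pred_set must be non-empty")
    worst_inside = max(loss_row[y] for y in pred_set)
    worst_anywhere = max(loss_row)  # the adversary may also stay inside the set
    return (1 - alpha) * worst_inside + alpha * worst_anywhere


def robust_action(losses, pred_set, alpha):
    """Index of the action that minimizes the worst-case expected loss."""
    return min(range(len(losses)),
               key=lambda a: robust_risk(losses[a], pred_set, alpha))


# Hypothetical example: labels 0 = healthy, 1 = sick.
# Action 0 = discharge (costly if sick), action 1 = run a further test.
losses = [[0, 10],
          [1, 1]]
print(robust_action(losses, {0}, 0.2))   # out-of-set penalty dominates
print(robust_action(losses, {0}, 0.05))  # in-set loss dominates
```

With a looser coverage level the out-of-set term carries more weight, so the cautious action wins even though the set contains only the benign label.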



A Proof of Theorem

Neural Information Processing Systems

Proposition 2. Using the same notation as in Proposition 1, we have the following results. Algorithm 2 gives pseudocode for finding the optimal split for a given feature. Output: Split (f, t) that gives the largest risk reduction. Proposition 5. For the sigmoid loss, we have null R Proposition 4. If a node contains the examples Output: Collection of trained decision trees. Algorithm 5: Find_Split(κ, F, T) Input: κ - node; F - number of attributes; T - number of threshold values per attribute.


Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents

arXiv.org Machine Learning

A fundamental question in data-driven decision making is how to quantify the uncertainty of predictions in ways that can usefully inform downstream action. This interface between prediction uncertainty and decision-making is especially important in risk-sensitive domains, such as medicine. In this paper, we develop decision-theoretic foundations that connect uncertainty quantification using prediction sets with risk-averse decision-making. Specifically, we answer three fundamental questions: (1) What is the correct notion of uncertainty quantification for risk-averse decision makers? We prove that prediction sets are optimal for decision makers who wish to optimize their value at risk. (2) What is the optimal policy that a risk-averse decision maker should use to map prediction sets to actions? We show that a simple max-min decision policy is optimal for risk-averse decision makers. Finally, (3) How can we derive prediction sets that are optimal for such decision makers? We provide an exact characterization in the population regime and a distribution-free finite-sample construction. Answering these questions naturally leads to an algorithm, Risk-Averse Calibration (RAC), which follows a provably optimal design for deriving action policies from predictions. RAC is designed to be both practical (capable of leveraging the quality of predictions in a black-box manner to enhance downstream utility) and safe (adhering to a user-defined risk threshold and optimizing the corresponding risk quantile of the user's downstream utility). Finally, we experimentally demonstrate the significant advantages of RAC in applications such as medical diagnosis and recommendation systems. Specifically, we show that RAC achieves a substantially improved trade-off between safety and utility, offering higher utility compared to existing methods while maintaining the safety guarantee.
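The max-min policy mentioned in point (2) can be sketched in a few lines. This is only an illustration of the generic rule "pick the action whose worst utility over the prediction set is largest", not the RAC algorithm. The utility table `utility[action][label]` and the function name `max_min_action` are assumptions made for the example.

```python
def max_min_action(utility, pred_set):
    """Return the action that maximizes the worst-case utility over pred_set."""
    if not pred_set:
        raise ValueError("pred_set must be non-empty")
    return max(range(len(utility)),
               key=lambda a: min(utility[a][y] for y in pred_set))


# Hypothetical utilities: action 0 is high-reward but dangerous under label 1,
# action 1 is a safe default with modest utility under either label.
utility = [[10, -100],
           [3, 3]]
print(max_min_action(utility, {0}))     # set excludes the dangerous label
print(max_min_action(utility, {0, 1}))  # set includes it, so play safe
```

A smaller prediction set lets the rule commit to the high-utility action, which is why the choice of set matters for downstream utility.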


Review for NeurIPS paper: Optimal Prediction of the Number of Unseen Species with Multiplicity

Neural Information Processing Systems

Additional Feedback: This paper studies a variant of Fisher et al.'s unseen species problem, namely, predicting the number of new symbols that appear at least \mu times in the future (unobserved) sample of size a \times n on the basis of the existing sample of size n. This extends the results of Orlitsky et al. [22], which focus on \mu = 1, the original setting in Fisher et al. The main findings are:
- Theorem 1: an estimator is constructed using the smoothing technique from [22] that achieves a normalized prediction error of n^{-\Omega(1/a)}, provided a = O(\log n / \mu).
- Theorem 2: a minimax lower bound of n^{-O(1/a)} is shown, provided a = \Omega(\log n / \mu).
Both the construction and the analysis closely follow those in [22]. Namely, the upper bound is obtained by following the recipe of the smoothed estimator (modifying the unbiased estimator), and the analysis uses Poisson sampling and relies on Bessel functions to control the bias from cancellation; the lower bound is obtained by a reduction to the support size estimation problem.
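For orientation, the unbiased estimator that the smoothing recipe starts from in the \mu = 1 setting is the classical Good-Toulmin estimator, U = -\sum_{i \ge 1} (-a)^i \Phi_i, where \Phi_i is the number of symbols seen exactly i times. The sketch below implements only that classical, unsmoothed form. It is not the estimator of the paper under review, and the function name `good_toulmin` is chosen for the example. Without smoothing, the alternating powers of a make the estimate unstable once a exceeds 1.

```python
from collections import Counter


def good_toulmin(sample, a):
    """Classical Good-Toulmin estimate of the number of new symbols
    in a future sample of size a * len(sample) (mu = 1, unsmoothed)."""
    multiplicities = Counter(sample)                # symbol -> times seen
    prevalences = Counter(multiplicities.values())  # i -> Phi_i
    return -sum(((-a) ** i) * phi for i, phi in prevalences.items())


# Three symbols seen once, one symbol seen twice.
sample = list("aabcd")
print(good_toulmin(sample, 1))
print(good_toulmin(sample, 0.5))
```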


Optimal prediction of Markov chains with and without spectral gap

Neural Information Processing Systems

We study the following learning problem with dependent data: Given a trajectory of length n from a stationary Markov chain with k states, the goal is to predict the distribution of the next state. These nonparametric rates can be attributed to the memory in the data, as the spectral gap of the Markov chain can be arbitrarily small. To quantify the memory effect, we study irreducible reversible chains with a prescribed spectral gap. In addition to characterizing the optimal prediction risk for two states, we show that, as long as the spectral gap is not excessively small, the prediction risk in the Markov model is O(\frac{k^2}{n}), which coincides with that of an iid model with the same number of parameters.
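To make the prediction task concrete, here is a minimal baseline sketch: count the observed transitions and predict the next-state distribution from the row of the last observed state, with add-one (Laplace) smoothing. This is a simple illustrative baseline, not the estimator analyzed in the paper. States are assumed to be integers in `range(k)`, and the name `predict_next_state` is hypothetical.

```python
def predict_next_state(trajectory, k):
    """Add-one smoothed estimate of the next-state distribution
    given a trajectory of states in range(k)."""
    counts = [[0] * k for _ in range(k)]
    for current, nxt in zip(trajectory, trajectory[1:]):
        counts[current][nxt] += 1
    row = counts[trajectory[-1]]  # transitions out of the last observed state
    total = sum(row) + k          # the k pseudo-counts keep every entry positive
    return [(c + 1) / total for c in row]


# From state 0 the chain moved to 1 twice and to 0 once; it ends in state 0.
probs = predict_next_state([0, 1, 0, 1, 0, 0], k=2)
print(probs)
```

The estimate uses only visits to the final state, which hints at why slow mixing (a small spectral gap) makes the problem harder than the iid case.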


Towards Human-AI Complementarity with Prediction Sets

arXiv.org Artificial Intelligence

In recent years, there has been increasing excitement about the potential of decision support systems based on machine learning to help human experts make more accurate predictions in a variety of application domains, including medicine, education and science [1-3]. In this context, the ultimate goal is human-AI complementarity: the predictions made by the human expert who uses a decision support system are more accurate than the predictions made by the expert on their own and by the classifier used by the decision support system [4-8]. The conventional wisdom is that to achieve human-AI complementarity, decision support systems should help humans understand when and how to use their predictions to update their own. As a result, a flurry of empirical studies has analyzed how factors such as confidence, explanations, or calibration influence when and how humans use the predictions provided by a decision support system [9-12]. Unfortunately, these studies have so far been inconclusive and it is still unclear how to design decision support systems that achieve human-AI complementarity [13-17]. In this context, Straitouri et al. [18, 19] have recently argued, both theoretically and empirically, that an alternative type of decision support system may achieve human-AI complementarity by design. Rather than providing a single label prediction and letting a human expert decide when and how to use the predicted label to update their own prediction, these systems provide a set of label predictions, namely a prediction set, and ask the expert to predict a label value from the set.


Stochastic Online Conformal Prediction with Semi-Bandit Feedback

arXiv.org Artificial Intelligence

Conformal prediction has emerged as an effective strategy for uncertainty quantification by modifying a model to output sets of labels instead of a single label. These prediction sets come with the guarantee that they contain the true label with high probability. However, conformal prediction typically requires a large calibration dataset of i.i.d. examples. We consider the online learning setting, where examples arrive over time, and the goal is to construct prediction sets dynamically. Departing from existing work, we assume semi-bandit feedback, where we only observe the true label if it is contained in the prediction set. For instance, consider calibrating a document retrieval model to a new domain; in this setting, a user would only be able to provide the true label if the target document is in the prediction set of retrieved documents. We propose a novel conformal prediction algorithm targeted at this setting, and prove that it obtains sublinear regret compared to the optimal conformal predictor. We evaluate our algorithm on a retrieval task and an image classification task, and demonstrate that it empirically achieves good performance.
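As background for the offline setting that this abstract departs from, the following is a minimal sketch of standard split conformal prediction with an i.i.d. calibration set. It is not the semi-bandit algorithm proposed in the paper. The nonconformity score `1 - p(true label)` and the function names are assumptions made for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when that rank exceeds n."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, threshold):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    return {y for y, p in enumerate(probs) if 1 - p <= threshold}


# Hypothetical calibration scores, each 1 - p(true label) on a held-out example.
cal_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
tau = conformal_threshold(cal_scores, alpha=0.5)
print(tau, prediction_set([0.7, 0.25, 0.05], tau))
```

Computing the threshold requires the true label of every calibration example. Under semi-bandit feedback that label is observed only when it falls inside the set, so this recipe cannot be applied directly.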


A lattice filter model of the visual pathway

Neural Information Processing Systems

Early stages of visual processing are thought to decorrelate, or whiten, the incoming temporally varying signals. Motivated by the cascade structure of the visual pathway (retina → lateral geniculate nucleus (LGN) → primary visual cortex, V1), we propose to model its function using lattice filters: signal processing devices for stage-wise decorrelation of temporal signals. Lattice filter models predict neuronal responses consistent with physiological recordings in cats and primates. In particular, they predict temporal receptive fields of two different types resembling so-called lagged and non-lagged cells in the LGN. Moreover, connection weights in the lattice filter can be learned using Hebbian rules in a stage-wise sequential manner reminiscent of the neuro-developmental sequence in mammals.
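The stage-wise structure of a lattice filter can be sketched as follows. Each stage takes a forward and a backward prediction error, estimates one reflection coefficient, and passes reduced errors to the next stage. The coefficient here is estimated with a Burg-style batch formula, which is an assumption made for the sketch and differs from the Hebbian learning rule described in the paper. The function names are hypothetical.

```python
def lattice_stage(f, b):
    """One lattice stage. f[i] is the forward error at time n and
    b[i] the backward error at time n - 1 (already delayed)."""
    num = 2.0 * sum(fi * bi for fi, bi in zip(f, b))
    den = sum(fi * fi + bi * bi for fi, bi in zip(f, b))
    k = num / den if den > 0 else 0.0  # reflection coefficient
    f_new = [fi - k * bi for fi, bi in zip(f, b)]
    b_new = [bi - k * fi for fi, bi in zip(f, b)]
    return k, f_new, b_new


def lattice_filter(x, n_stages):
    """Cascade n_stages lattice stages over the signal x."""
    f, b = list(x), list(x)
    ks = []
    for _ in range(n_stages):
        # Align f[n] with the one-sample-delayed backward error b[n - 1].
        k, f, b = lattice_stage(f[1:], b[:-1])
        ks.append(k)
    return ks, f, b


# A strongly correlated toy signal: each sample is half the previous one.
x = [0.5 ** n for n in range(20)]
ks, f, b = lattice_filter(x, 1)
print(ks[0], sum(v * v for v in f))
```

After one stage the forward error carries less energy than the input, which is the sense in which each stage removes part of the temporal correlation.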