Goto

Collaborating Authors

 Bayesian Inference


Online scalable Gaussian processes with conformal prediction for guaranteed coverage

arXiv.org Machine Learning

The Gaussian process (GP) is a Bayesian nonparametric paradigm that is widely adopted for uncertainty quantification (UQ) in a number of safety-critical applications, including robotics, healthcare, as well as surveillance. The consistency of the resulting uncertainty values however, hinges on the premise that the learning function conforms to the properties specified by the GP model, such as smoothness, periodicity and more, which may not be satisfied in practice, especially with data arriving on the fly. To combat against such model mis-specification, we propose to wed the GP with the prevailing conformal prediction (CP), a distribution-free post-processing framework that produces it prediction sets with a provably valid coverage under the sole assumption of data exchangeability. However, this assumption is usually violated in the online setting, where a prediction set is sought before revealing the true label. To ensure long-term coverage guarantee, we will adaptively set the key threshold parameter based on the feedback whether the true label falls inside the prediction set. Numerical results demonstrate the merits of the online GP-CP approach relative to existing alternatives in the long-term coverage performance.


Are causal effect estimations enough for optimal recommendations under multitreatment scenarios?

arXiv.org Machine Learning

When making treatment selection decisions, it is essential to include a causal effect estimation analysis to compare potential outcomes under different treatments or controls, assisting in optimal selection. However, merely estimating individual treatment effects may not suffice for truly optimal decisions. Our study addressed this issue by incorporating additional criteria, such as the estimations' uncertainty, measured by the conditional value-at-risk, commonly used in portfolio and insurance management. For continuous outcomes observable before and after treatment, we incorporated a specific prediction condition. We prioritized treatments that could yield optimal treatment effect results and lead to post-treatment outcomes more desirable than pretreatment levels, with the latter condition being called the prediction criterion. With these considerations, we propose a comprehensive methodology for multitreatment selection. Our approach ensures satisfaction of the overlap assumption, crucial for comparing outcomes for treated and control groups, by training propensity score models as a preliminary step before employing traditional causal models. To illustrate a practical application of our methodology, we applied it to the credit card limit adjustment problem. Analyzing a fintech company's historical data, we found that relying solely on counterfactual predictions was inadequate for appropriate credit line modifications. Incorporating our proposed additional criteria significantly enhanced policy performance.






Combining Structural and Unstructured Data: A Topic-based Finite Mixture Model for Insurance Claim Prediction

arXiv.org Artificial Intelligence

Modeling insurance claim amounts and classifying claims into different risk levels are critical yet challenging tasks. Traditional predictive models for insurance claims often overlook the valuable information embedded in claim descriptions. This paper introduces a novel approach by developing a joint mixture model that integrates both claim descriptions and claim amounts. Our method establishes a probabilistic link between textual descriptions and loss amounts, enhancing the accuracy of claims clustering and prediction. In our proposed model, the latent topic/component indicator serves as a proxy for both the thematic content of the claim description and the component of loss distributions. Specifically, conditioned on the topic/component indicator, the claim description follows a multinomial distribution, while the claim amount follows a component loss distribution. We propose two methods for model calibration: an EM algorithm for maximum a posteriori estimates, and an MH-within-Gibbs sampler algorithm for the posterior distribution. The empirical study demonstrates that the proposed methods work effectively, providing interpretable claims clustering and prediction.


Ranking Policy Learning via Marketplace Expected Value Estimation From Observational Data

arXiv.org Machine Learning

We develop a decision making framework to cast the problem of learning a ranking policy for search or recommendation engines in a two-sided e-commerce marketplace as an expected reward optimization problem using observational data. As a value allocation mechanism, the ranking policy allocates retrieved items to the designated slots so as to maximize the user utility from the slotted items, at any given stage of the shopping journey. The objective of this allocation can in turn be defined with respect to the underlying probabilistic user browsing model as the expected number of interaction events on presented items matching the user intent, given the ranking context. Through recognizing the effect of ranking as an intervention action to inform users' interactions with slotted items and the corresponding economic value of the interaction events for the marketplace, we formulate the expected reward of the marketplace as the collective value from all presented ranking actions. The key element in this formulation is a notion of context value distribution, which signifies not only the attribution of value to ranking interventions within a session but also the distribution of marketplace reward across user sessions. We build empirical estimates for the expected reward of the marketplace from observational data that account for the heterogeneity of economic value across session contexts as well as the distribution shifts in learning from observational user activity data. The ranking policy can then be trained by optimizing the empirical expected reward estimates via standard Bayesian inference techniques. We report empirical results for a product search ranking task in a major e-commerce platform demonstrating the fundamental trade-offs governed by ranking polices trained on empirical reward estimates with respect to extreme choices of the context value distribution.


Approximate Maximum Likelihood Inference for Acoustic Spatial Capture-Recapture with Unknown Identities, Using Monte Carlo Expectation Maximization

arXiv.org Machine Learning

Acoustic spatial capture-recapture (ASCR) surveys with an array of synchronized acoustic detectors can be an effective way of estimating animal density or call density. However, constructing the capture histories required for ASCR analysis is challenging, as recognizing which detections at different detectors are of which calls is not a trivial task. Because calls from different distances take different times to arrive at detectors, the order in which calls are detected is not necessarily the same as the order in which they are made, and without knowing which detections are of the same call, we do not know how many different calls are detected. We propose a Monte Carlo expectation-maximization (MCEM) estimation method to resolve this unknown call identity problem. To implement the MCEM method in this context, we sample the latent variables from a complete-data likelihood model in the expectation step and use a semi-complete-data likelihood or conditional likelihood in the maximization step. We use a parametric bootstrap to obtain confidence intervals. When we apply our method to a survey of moss frogs, it gives an estimate within 15% of the estimate obtained using data with call capture histories constructed by experts, and unlike this latter estimate, our confidence interval incorporates the uncertainty about call identities. Simulations show it to have a low bias (6%) and coverage probabilities close to the nominal 95% value.


On the Sample Complexity of a Policy Gradient Algorithm with Occupancy Approximation for General Utility Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement learning with general utilities has recently gained attention thanks to its ability to unify several problems, including imitation learning, pure exploration, and safe RL. However, prior work for solving this general problem in a unified way has mainly focused on the tabular setting. This is restrictive when considering larger state-action spaces because of the need to estimate occupancy measures during policy optimization. In this work, we address this issue and propose to approximate occupancy measures within a function approximation class using maximum likelihood estimation (MLE). We propose a simple policy gradient algorithm (PG-OMA) where an actor updates the policy parameters to maximize the general utility objective whereas a critic approximates the occupancy measure using MLE. We provide a sample complexity analysis of PG-OMA showing that our occupancy measure estimation error only scales with the dimension of our function approximation class rather than the size of the state action space. Under suitable assumptions, we establish first order stationarity and global optimality performance bounds for the proposed PG-OMA algorithm for nonconcave and concave general utilities respectively. We complement our methodological and theoretical findings with promising empirical results showing the scalability potential of our approach compared to existing tabular count-based approaches.