optimality gap
- North America > United States > Texas > Brazos County > College Station (0.14)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
- Information Technology > Modeling & Simulation (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
We study sequential decision-making problems in which each agent aims to maximize the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon Constrained Markov Decision Processes (CMDPs) problem. Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method for CMDPs which updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent.
- North America > United States > California (0.14)
- North America > United States > Illinois (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
- Health & Medicine (0.93)
- Government (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)
Learning Optimal Predictive Checklists
Checklists are simple decision aids that are often used to promote safety and reliability in clinical applications. In this paper, we present a method to learn checklists for clinical decision support. We represent predictive checklists as discrete linear classifiers with binary features and unit weights.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Hawaii (0.04)
- (7 more...)
Learning Optimal Predictive Checklists
Checklists are simple decision aids that are often used to promote safety and reliability in clinical applications. In this paper, we present a method to learn checklists for clinical decision support. We represent predictive checklists as discrete linear classifiers with binary features and unit weights. We then learn globally optimal predictive checklists from data by solving an integer programming problem. Our method allows users to customize checklists to obey complex constraints, including constraints to enforce group fairness and to binarize real-valued features at training time. In addition, it pairs models with an optimality gap that can inform model development and determine the feasibility of learning sufficiently accurate checklists on a given dataset. We pair our method with specialized techniques that speed up its ability to train a predictive checklist that performs well and has a small optimality gap. We benchmark the performance of our method on seven clinical classification problems, and demonstrate its practical benefits by training a short-form checklist for PTSD screening. Our results show that our method can fit simple predictive checklists that perform well and that can easily be customized to obey a rich class of custom constraints.
Decoupled Q-Chunking
Li, Qiyang, Park, Seohong, Levine, Sergey
Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to bootstrapping bias, where the errors in the value targets accumulate across steps and result in biased value estimates. Recent work has proposed to use chunked critics, which estimate the value of short action sequences ("chunks") rather than individual actions, speeding up value backup. However, extracting policies from chunked critics is challenging: policies must output the entire action chunk open-loop, which can be sub-optimal for environments that require policy reactivity and also challenging to model especially when the chunk length grows. Our key insight is to decouple the chunk length of the critic from that of the policy, allowing the policy to operate over shorter action chunks. We propose a novel algorithm that achieves this by optimizing the policy against a distilled critic for partial action chunks, constructed by optimistically backing up from the original chunked critic to approximate the maximum value achievable when a partial action chunk is extended to a complete one. This design retains the benefits of multi-step value propagation while sidestepping both the open-loop sub-optimality and the difficulty of learning action chunking policies for long action chunks. We evaluate our method on challenging, long-horizon offline goal-conditioned tasks and show that it reliably outperforms prior methods. Code: github.com/ColinQiyangLi/dqc.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Asia > South Korea > Daegu > Daegu (0.04)
- Research Report (1.00)
- Workflow (0.88)
Contextual Strongly Convex Simulation Optimization: Optimize then Predict with Inexact Solutions
Lin, Nifei, Luo, Heng, Hong, L. Jeff
In this work, we study contextual strongly convex simulation optimization and adopt an "optimize then predict" (OTP) approach for real-time decision making. In the offline stage, simulation optimization is conducted across a set of covariates to approximate the optimal-solution function; in the online stage, decisions are obtained by evaluating this approximation at the observed covariate. The central theoretical challenge is to understand how the inexactness of solutions generated by simulation-optimization algorithms affects the optimality gap, which is overlooked in existing studies. To address this, we develop a unified analysis framework that explicitly accounts for both solution bias and variance. Using Polyak-Ruppert averaging SGD as an illustrative simulation-optimization algorithm, we analyze the optimality gap of OTP under four representative smoothing techniques: $k$ nearest neighbor, kernel smoothing, linear regression, and kernel ridge regression. We establish convergence rates, derive the optimal allocation of the computational budget $Γ$ between the number of design covariates and the per-covariate simulation effort, and demonstrate the convergence rate can approximately achieve $Γ^{-1}$ under appropriate smoothing technique and sample-allocation rule. Finally, through a numerical study, we validate the theoretical findings and demonstrate the effectiveness and practical value of the proposed approach.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.27)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- (2 more...)
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
Marchant, Neil G., Cullen, Andrew C., Liu, Feng, Erfani, Sarah M.
Deep neural networks often fail when deployed in real-world contexts due to distribution shift, a critical barrier to building safe and reliable systems. An emerging approach to address this problem relies on \emph{disagreement discrepancy} -- a measure of how the disagreement between two models changes under a shifting distribution. The process of maximizing this measure has seen applications in bounding error under shifts, testing for harmful shifts, and training more robust models. However, this optimization involves the non-differentiable zero-one loss, necessitating the use of practical surrogate losses. We prove that existing surrogates for disagreement discrepancy are not Bayes consistent, revealing a fundamental flaw: maximizing these surrogates can fail to maximize the true disagreement discrepancy. To address this, we introduce new theoretical results providing both upper and lower bounds on the optimality gap for such surrogates. Guided by this theory, we propose a novel disagreement loss that, when paired with cross-entropy, yields a provably consistent surrogate for disagreement discrepancy. Empirical evaluations across diverse benchmarks demonstrate that our method provides more accurate and robust estimates of disagreement discrepancy than existing approaches, particularly under challenging adversarial conditions.
- North America > Canada > Ontario > Toronto (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
Robust Tabular Foundation Models
Peroni, Matthew, Le, Franck, Sheinin, Vadim
The development of tabular foundation models (TFMs) has accelerated in recent years, showing strong potential to outperform traditional ML methods for structured data. A key finding is that TFMs can be pretrained entirely on synthetic datasets, opening opportunities to design data generators that encourage desirable model properties. Prior work has mainly focused on crafting high-quality priors over generators to improve overall pretraining performance. Our insight is that parameterizing the generator distribution enables an adversarial robustness perspective: during training, we can adapt the generator to emphasize datasets that are particularly challenging for the model. We formalize this by introducing an optimality gap measure, given by the difference between TFM performance and the best achievable performance as estimated by strong baselines such as XGBoost, CatBoost, and Random Forests. Building on this idea, we propose Robust Tabular Foundation Models (RTFM), a model-agnostic adversarial training framework. Applied to the TabPFN V2 classifier, RTFM improves benchmark performance, with up to a 6% increase in mean normalized AUC over the original TabPFN and other baseline algorithms, while requiring less than 100k additional synthetic datasets. These results highlight a promising new direction for targeted adversarial training and fine-tuning of TFMs using synthetic data alone.
- North America > United States > Wisconsin (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)