
Collaborating Authors

 Plaut, Benjamin


Asking for Help Enables Safety Guarantees Without Sacrificing Effectiveness

arXiv.org Artificial Intelligence

Most reinforcement learning algorithms with regret guarantees rely on a critical assumption: that all errors are recoverable. Recent work by Plaut et al. discarded this assumption and presented algorithms that avoid "catastrophe" (i.e., irreparable errors) by asking for help. However, they provided only safety guarantees and did not consider reward maximization. We prove that any algorithm that avoids catastrophe in their setting also guarantees high reward (i.e., sublinear regret) in any Markov Decision Process (MDP), including MDPs with irreversible costs. This constitutes the first no-regret guarantee for general MDPs. More broadly, our result may be the first formal proof that it is possible for an agent to obtain high reward while becoming self-sufficient in an unknown, unbounded, and high-stakes environment without causing catastrophe or requiring resets.
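
For readers less familiar with the terminology, the following is a standard, textbook-style way to formalize the "sublinear regret" claim above. It is an illustrative sketch only; the paper's exact definitions (which involve mentor queries and catastrophe probabilities) may differ.

```latex
% Illustrative, textbook-style definitions (not necessarily the paper's exact ones).
% Cumulative regret over horizon T compares the agent's total reward to that of
% an optimal policy \pi^*:
\[
  \mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} r_t(\pi^*) \;-\; \sum_{t=1}^{T} r_t(\pi_{\mathrm{agent}}).
\]
% "Sublinear regret" (the no-regret property) means the average per-round gap vanishes:
\[
  \lim_{T \to \infty} \frac{\mathrm{Regret}(T)}{T} \;=\; 0.
\]
```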


Learning to Coordinate with Experts

arXiv.org Machine Learning

When deployed in dynamic environments, AI agents will inevitably encounter challenges that exceed their individual capabilities. Leveraging assistance from expert agents, whether human or AI, can significantly enhance safety and performance in such situations. However, querying experts is often costly, necessitating the development of agents that can efficiently request and utilize expert guidance. In this paper, we introduce a fundamental coordination problem called Learning to Yield and Request Control (YRC), where the objective is to learn a strategy that determines when to act autonomously and when to seek expert assistance. We consider a challenging practical setting in which an agent does not interact with experts during training but must adapt to novel environmental changes and expert interventions at test time. To facilitate empirical research, we introduce YRC-Bench, an open-source benchmark featuring diverse domains. YRC-Bench provides a standardized Gym-like API, simulated experts, an evaluation pipeline, and implementations of competitive baselines. Towards tackling the YRC problem, we propose a novel validation approach and investigate the performance of various learning methods across diverse environments, yielding insights that can guide future research.
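
To make the problem concrete, here is a minimal sketch of a yield-or-request-control loop. The names (Coordinator, run_episode), the fixed confidence threshold, and the query cost are illustrative assumptions, not the actual YRC-Bench API.

```python
# Illustrative sketch of the yield-or-request-control idea (hypothetical API,
# not the actual YRC-Bench interface). A coordinator wraps a novice policy and
# an expert; at each step it either acts autonomously or pays a cost to query
# the expert, based on a confidence threshold.

class Coordinator:
    def __init__(self, novice, expert, threshold=0.6, query_cost=0.1):
        self.novice = novice          # callable: obs -> (action, confidence)
        self.expert = expert          # callable: obs -> action (costly to query)
        self.threshold = threshold    # yield control below this confidence
        self.query_cost = query_cost  # penalty charged per expert query
        self.num_queries = 0

    def act(self, obs):
        action, confidence = self.novice(obs)
        if confidence < self.threshold:
            self.num_queries += 1
            return self.expert(obs), -self.query_cost
        return action, 0.0

def run_episode(env, coordinator):
    """Roll out one episode in a Gym-style env, tracking reward net of query costs."""
    obs, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action, penalty = coordinator.act(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward + penalty
        done = terminated or truncated
    return total_reward, coordinator.num_queries
```

A learned YRC policy would decide when to yield rather than rely on a fixed threshold; the sketch only shows where that decision sits in the interaction loop.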


Getting By Goal Misgeneralization With a Little Help From a Mentor

arXiv.org Artificial Intelligence

While reinforcement learning (RL) agents often perform well during training, they can struggle with distribution shift in real-world deployments. One particularly severe risk of distribution shift is goal misgeneralization, where the agent learns a proxy goal that coincides with the true goal during training but not during deployment. In this paper, we explore whether allowing an agent to ask for help from a supervisor in unfamiliar situations can mitigate this issue. We focus on agents trained with PPO in the CoinRun environment, a setting known to exhibit goal misgeneralization. We evaluate multiple methods for determining when the agent should request help and find that asking for help consistently improves performance. However, we also find that methods based on the agent's internal state fail to proactively request help, instead waiting until mistakes have already occurred. Further investigation suggests that the agent's internal state does not represent the coin at all, highlighting the importance of learning nuanced representations, the risks of ignoring everything not immediately relevant to reward, and the necessity of developing ask-for-help strategies tailored to the agent's training algorithm.
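
As one concrete example of an ask-for-help trigger of the kind evaluated in this line of work, the sketch below gates on the policy's maximum action probability. The helper name and threshold are hypothetical, and this is not the paper's implementation.

```python
# Illustrative ask-for-help gate (not the paper's implementation): request the
# supervisor's action whenever the PPO policy's own action distribution looks
# uncertain, as measured by its maximum softmax probability.
import torch
import torch.nn.functional as F

def choose_action(policy_logits: torch.Tensor,
                  supervisor_action: int,
                  msp_threshold: float = 0.8):
    """policy_logits: unnormalized action scores from the agent's policy head.

    Returns (action, asked_for_help).
    """
    probs = F.softmax(policy_logits, dim=-1)
    msp = probs.max().item()                 # max softmax probability
    if msp < msp_threshold:                  # low confidence -> ask for help
        return supervisor_action, True
    return int(probs.argmax().item()), False
```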


Softmax Probabilities (Mostly) Predict Large Language Model Correctness on Multiple-Choice Q&A

arXiv.org Artificial Intelligence

Although large language models (LLMs) perform impressively on many tasks, overconfidence remains a problem. We hypothesized that on multiple-choice Q&A tasks, wrong answers would be associated with smaller maximum softmax probabilities (MSPs) compared to correct answers. We comprehensively evaluate this hypothesis on ten open-source LLMs and five datasets, and find strong evidence for it among models that perform well on the original Q&A task. For the six LLMs with the best Q&A performance, the AUROC derived from the MSP was better than random chance with p < 10^{-4} in 59/60 instances. Among those six LLMs, the average AUROC ranged from 60% to 69%. Leveraging these findings, we propose a multiple-choice Q&A task with an option to abstain and show that performance can be improved by selectively abstaining based on the MSP of the initial model response. We also run the same experiments with pre-softmax logits instead of softmax probabilities and find similar (but not identical) results.
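
A minimal sketch of the MSP-based abstention idea follows; the function names and threshold are illustrative assumptions, not the paper's evaluation code.

```python
# Illustrative sketch of MSP-based selective abstention on multiple-choice Q&A
# (hypothetical variable names; not the paper's evaluation code).
import numpy as np
from sklearn.metrics import roc_auc_score

def max_softmax_probability(option_logits: np.ndarray) -> np.ndarray:
    """option_logits: (n_questions, n_options) logits over the answer options."""
    z = option_logits - option_logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def evaluate(option_logits: np.ndarray, correct: np.ndarray, abstain_threshold: float = 0.5):
    """correct: boolean array, whether the argmax option matches the gold answer."""
    msp = max_softmax_probability(option_logits)
    auroc = roc_auc_score(correct, msp)        # how well does MSP separate right from wrong?
    answered = msp >= abstain_threshold        # abstain when MSP is low
    accuracy_when_answering = correct[answered].mean() if answered.any() else float("nan")
    return auroc, answered.mean(), accuracy_when_answering
```

Here roc_auc_score measures exactly the question in the abstract: how well the MSP separates correct from incorrect responses.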


Avoiding Catastrophe in Continuous Spaces by Asking for Help

arXiv.org Artificial Intelligence

Most reinforcement learning algorithms with formal regret guarantees assume all mistakes are reversible and rely on essentially trying all possible options. This approach leads to poor outcomes when some mistakes are irreparable or even catastrophic. We propose a variant of the contextual bandit problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff each round represents the chance of avoiding catastrophe that round, and we try to maximize the product of payoffs (the overall chance of avoiding catastrophe). To give the agent some chance of success, we allow a limited number of queries to a mentor and assume a Lipschitz continuous payoff function. We present an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows, assuming a continuous 1D state space and a relatively "simple" payoff function. We also provide a matching lower bound: without the simplicity assumption, any algorithm either constantly asks for help or is nearly guaranteed to cause catastrophe. Finally, we identify the key obstacle to generalizing our algorithm to a multi-dimensional state space.
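
In notation that may differ from the paper's, the setup described above can be written as follows, where mu(s_t, a_t) in [0,1] is the chance of avoiding catastrophe in round t and pi^m is the mentor's policy.

```latex
% Illustrative formalization (the paper's exact definitions may differ).
% The agent tries to maximize the overall chance of avoiding catastrophe,
\[
  \prod_{t=1}^{T} \mu(s_t, a_t),
\]
% and its regret is measured against following the mentor at every step:
\[
  \mathrm{Regret}(T) \;=\; \prod_{t=1}^{T} \mu\!\left(s_t, \pi^m(s_t)\right) \;-\; \prod_{t=1}^{T} \mu(s_t, a_t).
\]
% The guarantee in the abstract is that both Regret(T) and the fraction of rounds
% with a mentor query tend to 0 as T grows, under the 1D and simplicity assumptions.
```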


Fast Optimal Clearing of Capped-Chain Barter Exchanges

AAAI Conferences

Kidney exchange is a type of barter market where patients exchange willing but incompatible donors. These exchanges are conducted via cycles, in which each incompatible patient-donor pair both gives and receives a kidney, and via chains, which are started by an altruistic donor who does not need a kidney in return. Finding the best combination of cycles and chains is hard. The leading algorithms for this optimization problem use either branch and price (a combination of branch and bound and column generation) or constraint generation. We show a correctness error in the leading prior branch-and-price-based approach [Glorie et al. 2014]. We develop a provably correct fix for it, which also necessarily changes the algorithm's complexity, as well as other improvements to the search algorithm. Next, we compare our solver to the leading constraint-generation-based solver and to the best prior correct branch-and-price-based solver. We focus on the setting where chains have a length cap. A cap is desirable in practice because if even one edge in a chain fails, the rest of the chain fails: the cap precludes very long chains that are extremely unlikely to execute and instead causes the solution to have more parallel chains and cycles that are more likely to succeed. We work with the UNOS nationwide kidney exchange, which uses a chain cap. Algorithms from our group autonomously make the transplant plans for that exchange. On that real data and on demographically accurate generated data, our new solver scales significantly better than the prior leading approaches.
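
For intuition only, here is a toy sketch of the clearing problem's combinatorial structure: enumerate cycles up to a length cap in the directed compatibility graph and pick a vertex-disjoint set covering the most pairs. This brute-force code is not the paper's branch-and-price solver (and it ignores chains and edge weights), which is precisely why scalable solvers are needed.

```python
# Toy illustration of barter-exchange clearing (NOT the branch-and-price solver
# from the paper): enumerate cycles up to a length cap in the directed
# compatibility graph, then choose a vertex-disjoint set of cycles covering as
# many patient-donor pairs as possible by brute force.
from itertools import combinations

def cycles_up_to(graph, cap):
    """graph: dict node -> set of nodes it can donate to. Returns cycles as tuples."""
    cycles = set()
    def dfs(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start and len(path) >= 2:
                cycles.add(tuple(path))        # canonical: each cycle starts at its smallest node
            elif nxt not in path and nxt > start and len(path) < cap:
                dfs(start, nxt, path + [nxt])
    for start in graph:
        dfs(start, start, [start])
    return sorted(cycles)

def best_disjoint_packing(cycles):
    """Brute-force maximum-coverage packing; fine for toy instances only."""
    best, best_covered = [], 0
    for k in range(len(cycles), 0, -1):
        for combo in combinations(cycles, k):
            nodes = [v for c in combo for v in c]
            if len(nodes) == len(set(nodes)) and len(nodes) > best_covered:
                best, best_covered = list(combo), len(nodes)
    return best, best_covered

# Example: pair 0 can donate to 1, 1 to 2, 2 to 0 and 3, 3 to 2.
compat = {0: {1}, 1: {2}, 2: {0, 3}, 3: {2}}
print(best_disjoint_packing(cycles_up_to(compat, cap=3)))
# -> ([(0, 1, 2)], 3): the 3-cycle beats the 2-cycle it overlaps with.
```

Chains started by altruistic donors can be folded into the same picture by a standard modeling trick (adding a dummy edge from each pair back to the altruist); the paper's solver handles capped chains directly and at far larger scale.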