decision-making
- North America > Bermuda (0.05)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Banking & Finance (0.93)
- Information Technology (0.67)
- Health & Medicine > Health Care Providers & Services (0.67)
- North America > Canada > Quebec > Montreal (0.04)
- Europe (0.04)
Interview with Anindya Das Antar: Evaluating effectiveness of moderation guardrails in aligning LLM outputs
In their paper presented at AIES 2025, "Do Your Guardrails Even Guard?" Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs with Expert User Expectations, Anindya Das Antar Xun Huan and Nikola Banovic propose a method to evaluate and select guardrails that best align LLM outputs with domain knowledge from subject-matter experts. Here, Anindya tells us more about their method, some case studies, and plans for future developments. Could you give us some background to your work - why are guardrails such an important area for study? Ensuring that large language models (LLMs) produce desirable outputs without harmful side effects and align with user expectations, organizational goals, and existing domain knowledge is crucial for their adoption in high-stakes decision-making. However, despite training on vast amounts of data, LLMs can still produce incorrect, misleading, or otherwise unexpected and undesirable outputs.
- North America > United States > Michigan (0.05)
- Europe (0.05)
- Health & Medicine (0.35)
- Leisure & Entertainment > Sports > Soccer (0.30)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (3 more...)
Large Language Models Play StarCraft II:Benchmarks and A Chain of Summarization Approach
With the continued advancement of Large Language Models (LLMs) Agents in reasoning, planning, and decision-making, benchmarks have become crucial in evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. Addressing the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, enhancing LLMs' capabilities in rapid and effective decision-making.
One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning
Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is not feasible. In such domains, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be . An additional challenge of offline RL is avoiding, i.e. ensuring that state-action pairs visited by the policy remain near those in the dataset. Previous offline RL algorithms that consider risk combine offline RL techniques (to avoid distributional shift), with risk-sensitive RL algorithms (to achieve risk-aversion).
Adversarial Counterfactual Environment Model Learning
An accurate environment dynamics model is crucial for various downstream tasks in sequential decision-making, such as counterfactual prediction, off-policy evaluation, and offline reinforcement learning. Currently, these models were learned through empirical risk minimization (ERM) by step-wise fitting of historical transition data. This way was previously believed unreliable over long-horizon rollouts because of the compounding errors, which can lead to uncontrollable inaccuracies in predictions. In this paper, we find that the challenge extends beyond just long-term prediction errors: we reveal that even when planning with one step, learned dynamics models can also perform poorly due to the selection bias of behavior policies during data collection. This issue will significantly mislead the policy optimization process even in identifying single-step optimal actions, further leading to a greater risk in sequential decision-making scenarios.To tackle this problem, we introduce a novel model-learning objective called adversarial weighted empirical risk minimization (AWRM). AWRM incorporates an adversarial policy that exploits the model to generate a data distribution that weakens the model's prediction accuracy, and subsequently, the model is learned under this adversarial data distribution.We implement a practical algorithm, GALILEO, for AWRM and evaluate it on two synthetic tasks, three continuous-control tasks, and \textit{a real-world application}. The experiments demonstrate that GALILEO can accurately predict counterfactual actions and improve various downstream tasks, including offline policy evaluation and improvement, as well as online decision-making.
AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games
Decision-making in large-scale games is an essential research area in artificial intelligence (AI) with significant real-world impact. However, the limited access to realistic large-scale game environments has hindered research progress in this area. In this paper, we present AuctionNet, a benchmark for bid decision-making in large-scale ad auctions derived from a real-world online advertising platform. AuctionNet is composed of three parts: an ad auction environment, a pre-generated dataset based on the environment, and performance evaluations of several baseline bid decision-making algorithms. More specifically, the environment effectively replicates the integrity and complexity of real-world ad auctions through the interaction of several modules: the ad opportunity generation module employs deep generative networks to bridge the gap between simulated and real-world data while mitigating the risk of sensitive data exposure; the bidding module implements diverse auto-bidding agents trained with different decision-making algorithms; and the auction module is anchored in the classic Generalized Second Price (GSP) auction but also allows for customization of auction mechanisms as needed.
Outlier-Robust Wasserstein DRO
Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty. Geometric uncertainty due to~sampling or localized perturbations of data points is captured by Wasserstein DRO (WDRO), which seeks to learn a model that performs uniformly well over a Wasserstein ball centered around the observed data distribution. However, WDRO fails to account for non-geometric perturbations such as adversarial outliers, which can greatly distort the Wasserstein distance measurement and impede the learned model. We address this gap by proposing a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (total variation (TV)) contamination that allows an $\varepsilon$-fraction of data to be arbitrarily corrupted. We design an uncertainty set using a certain robust Wasserstein ball that accounts for both perturbation types and derive minimax optimal excess risk bounds for this procedure that explicitly capture the Wasserstein and TV risks. We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem. When the loss function depends only on low-dimensional features of the data, we eliminate certain dimension dependencies from the risk bounds that are unavoidable in the general setting.