
Collaborating Authors

 Wierman, Adam


Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

arXiv.org Artificial Intelligence

Driven by inherent uncertainty and the sim-to-real gap, robust reinforcement learning (RL) seeks to improve resilience against the complexity and variability in agent-environment sequential interactions. Despite the existence of a large number of RL benchmarks, there is a lack of standardized benchmarks for robust RL. Current robust RL policies often focus on a specific type of uncertainty and are evaluated in distinct, one-off environments. In this work, we introduce Robust-Gymnasium, a unified modular benchmark designed for robust RL that supports a wide variety of disruptions across all key RL components: agents' observed states and rewards, agents' actions, and the environment. Offering over sixty diverse task environments spanning control and robotics, safe RL, and multi-agent RL, it provides an open-source and user-friendly tool for the community to assess current methods and foster the development of robust RL algorithms. In addition, we benchmark existing standard and robust RL algorithms within this framework, uncovering significant deficiencies in each and offering new insights.
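A minimal sketch of the kind of disturbance wrapper such a benchmark standardizes, written against the ordinary Gymnasium Wrapper API rather than the Robust-Gymnasium interface itself (the class name and noise model here are illustrative assumptions):

    import gymnasium as gym
    import numpy as np

    class NoisyObservationWrapper(gym.ObservationWrapper):
        """Perturbs the agent's observed state with zero-mean Gaussian noise."""
        def __init__(self, env, noise_std=0.05):
            super().__init__(env)
            self.noise_std = noise_std

        def observation(self, obs):
            # The agent acts on a corrupted observation; the true state is untouched.
            return obs + self.noise_std * np.random.standard_normal(obs.shape)

    # Hypothetical usage:
    # env = NoisyObservationWrapper(gym.make("HalfCheetah-v4"), noise_std=0.1)

Analogous wrappers can perturb actions, rewards, or the environment dynamics, which is the axis along which the benchmark organizes its disruption types.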


Towards Environmentally Equitable AI

arXiv.org Artificial Intelligence

Nonetheless, the technological advancement of AI relies on computationally intensive calculations and thus has led to a surge in resource usage and energy consumption. Even putting aside the environmental toll of server manufacturing and supply chains, AI systems can create a huge environmental cost to communities and regions where they are deployed, including air/thermal pollution due to fossil fuel-based electricity generation and further stressed water resources due to AI's staggering water footprint [12, 25]. To make AI more environmentally friendly and ensure that its overall impacts on climate change are positive, recent studies have pursued multi-faceted approaches, including efficient training and inference [5], energy-efficient GPU and accelerator designs [19], carbon forecasting [14], carbon-aware task scheduling [1, 21], green cloud infrastructures [2], sustainable AI policies [10, 18], and more. Additionally, data center operators have also increasingly adopted carbon-free energy (such as solar and wind power) and climate-conscious cooling systems, lowering carbon footprint and direct water consumption [8]. Although these initiatives are encouraging, unfortunately, a worrisome outcome -- environmental inequity -- has emerged [3]. That is, minimizing the total environmental cost of a globally deployed AI system across multiple regions does not necessarily mean that each region is treated equitably. In fact, the environmental cost of AI is often disproportionately higher in certain disadvantaged regions than in others. Even worse, AI's environmental inequity can be amplified by existing environmental-equity-agnostic resource allocation, load balancing, and scheduling algorithms, and compounded by enduring socioeconomic disparities between regions.


Safe Exploitative Play with Untrusted Type Beliefs

arXiv.org Artificial Intelligence

The combination of Bayesian games and learning has a rich history, built on the idea of controlling a single agent in a system of multiple agents whose unknown behaviors are captured by a set of types, each specifying a possible behavior of the other agents. The agent plans its own actions with respect to the types it believes are most likely to maximize its payoff. However, type beliefs are often learned from past actions and are likely to be incorrect. With this perspective in mind, we consider an agent in a game equipped with type predictions of the other agents, and investigate the impact of incorrect beliefs on the agent's payoff. In particular, we formally define a tradeoff between risk and opportunity by comparing the payoff obtained against the optimal payoff, represented as a gap caused by trusting or distrusting the learned beliefs. Our main results characterize the tradeoff by establishing upper and lower bounds on the Pareto front for both normal-form and stochastic Bayesian games, with numerical results provided.
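To make the gap concrete (the notation below is assumed for illustration and may differ from the paper's formal definitions), write $V(\pi, \theta^*)$ for the payoff of policy $\pi$ when the other agents' true type is $\theta^*$, $V^*(\theta^*)$ for the optimal payoff against that type, and $\pi_b$ for the best response to a learned belief $b$. The quantity

$$ \mathrm{gap}(b, \theta^*) = V^*(\theta^*) - V(\pi_b, \theta^*) $$

is small when the belief is nearly correct -- the opportunity from trusting it -- and can be large when the belief is wrong -- the risk; the Pareto-front bounds trade off these two regimes.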


Communication Efficient Decentralization for Smoothed Online Convex Optimization

arXiv.org Artificial Intelligence

We study the multi-agent Smoothed Online Convex Optimization (SOCO) problem, where $N$ agents interact through a communication graph. In each round, each agent $i$ receives a strongly convex hitting cost function $f^i_t$ in an online fashion and selects an action $x^i_t \in \mathbb{R}^d$. The objective is to minimize the global cumulative cost, which includes the sum of individual hitting costs $f^i_t(x^i_t)$, a temporal "switching cost" for changing decisions, and a spatial "dissimilarity cost" that penalizes deviations in decisions among neighboring agents. We propose the first decentralized algorithm for multi-agent SOCO and prove its asymptotic optimality. Our approach allows each agent to operate using only local information from its immediate neighbors in the graph. For finite-time performance, we establish that the optimality gap in competitive ratio decreases with the time horizon $T$ and can be conveniently tuned based on the per-round computation available to each agent. Moreover, our results hold even when the communication graph changes arbitrarily and adaptively over time. Finally, we establish that the computational complexity per round depends only logarithmically on the number of agents and almost linearly on their degree within the graph, ensuring scalability for large-system implementations.
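One common instantiation of this objective (the squared-norm penalties and weights below are illustrative assumptions, not necessarily the paper's exact form) is

$$ \min_{\{x^i_t\}} \; \sum_{t=1}^{T} \sum_{i=1}^{N} f^i_t(x^i_t) \;+\; \frac{\lambda_1}{2} \sum_{t=1}^{T} \sum_{i=1}^{N} \lVert x^i_t - x^i_{t-1} \rVert^2 \;+\; \frac{\lambda_2}{2} \sum_{t=1}^{T} \sum_{(i,j) \in E_t} \lVert x^i_t - x^j_t \rVert^2, $$

where $E_t$ is the edge set of the (possibly time-varying) communication graph at round $t$, and $\lambda_1, \lambda_2$ weight the temporal switching and spatial dissimilarity penalties.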


Online Budgeted Matching with General Bids

arXiv.org Artificial Intelligence

Online Budgeted Matching (OBM) is a classic problem with important applications in online advertising, online service matching, revenue management, and beyond. Traditional online algorithms typically assume a small bid setting, where the maximum bid-to-budget ratio $\kappa$ is infinitesimally small. While recent algorithms have tried to address scenarios with non-small or general bids, they often rely on the Fractional Last Matching (FLM) assumption, which allows for accepting partial bids when the remaining budget is insufficient. This assumption, however, does not hold for many applications with indivisible bids. In this paper, we remove the FLM assumption and tackle the open problem of OBM with general bids. We first establish an upper bound of $1-\kappa$ on the competitive ratio for any deterministic online algorithm. We then propose a novel meta algorithm, called MetaAd, which reduces to different algorithms with the first known provable competitive ratios parameterized by the maximum bid-to-budget ratio $\kappa \in [0, 1]$. As a by-product, we extend MetaAd to the FLM setting and obtain provably competitive algorithms. Finally, we apply our competitive analysis to the design of learning-augmented algorithms.
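As a concrete reading of the bound (a generic illustration, not an example from the paper): since $\kappa$ is the largest bid-to-budget ratio, a setting in which some bid can equal half of a budget has $\kappa = 1/2$, so no deterministic online algorithm can guarantee more than $1 - 1/2 = 1/2$ of the offline optimum; as $\kappa \to 1$, where a single bid can exhaust a budget, the deterministic guarantee vanishes, while $\kappa \to 0$ recovers the classic small-bid regime.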


Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization

arXiv.org Artificial Intelligence

In recent years, reinforcement learning (RL) (Sutton and Barto, 2018) has become a popular framework for solving sequential decision-making problems in unknown environments, with applications across different domains such as robotics (Kober et al., 2013), transportation (Haydari and Yılmaz, 2020), power systems (Chen et al., 2022), and financial markets (Charpentier et al., 2021). Despite significant progress, the curse of dimensionality remains a major bottleneck in RL tasks (Sutton and Barto, 2018). Specifically, the sample complexity grows geometrically with the dimensionality of the state-action space of the environment, posing challenges for large-scale applications. For example, in robotic control, even adding one more degree of freedom to a single robot can significantly increase the complexity of the control problem (Spong et al., 2020). To overcome the curse of dimensionality in sample complexity, a common approach is incorporating function approximation to approximate either the value function or the policy using a prespecified function class (e.g., neural networks) (Sutton and Barto, 2018). While this approach works in certain applications, these methods heavily rely on the design of the function approximation class, tailored parameter tuning, and other empirical insights. Moreover, they often lack theoretical guarantees. To the best of our knowledge, most existing results are limited to basic settings with linear function approximation (Tsitsiklis and Van Roy, 1996; Bhandari et al., 2018; Srikant and Ying, 2019; Chen et al., 2023).
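The blow-up is easiest to see in a factored system (a generic illustration, not the paper's specific setting): if the state decomposes into $n$ components, each taking $k$ values, then the joint state space has size

$$ |\mathcal{S}| = k^{n}, $$

so any tabular method whose sample complexity scales with $|\mathcal{S} \times \mathcal{A}|$ pays a cost exponential in $n$, and each added degree of freedom multiplies the table by another factor of $k$.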


Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data

arXiv.org Machine Learning

Online reinforcement learning (RL) typically requires high-stakes online interaction data to learn a policy for a target task. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that -- without information on the dynamics shift -- general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that achieves problem-dependent sample complexity and outperforms pure online RL. Finally, our experimental results demonstrate that HySRL surpasses state-of-the-art online RL baselines.


Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

Standard multi-agent reinforcement learning (MARL) algorithms are vulnerable to sim-to-real gaps. To address this, distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL by optimizing the worst-case performance when game dynamics shift within a prescribed uncertainty set. Solving RMGs remains under-explored, from problem formulation to the development of sample-efficient algorithms. A notorious yet open challenge is whether RMGs can escape the curse of multiagency, where the sample complexity scales exponentially with the number of agents. In this work, we propose a natural class of RMGs where the uncertainty set of each agent is shaped by both the environment and other agents' strategies in a best-response manner. We first establish the well-posedness of these RMGs by proving the existence of game-theoretic solutions such as robust Nash equilibria and coarse correlated equilibria (CCE). Assuming access to a generative model, we then introduce a sample-efficient algorithm for learning the CCE whose sample complexity scales polynomially with all relevant parameters. To the best of our knowledge, this is the first algorithm to break the curse of multiagency for RMGs.
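For scale (standard counting, not a result specific to this paper): with $n$ agents, each choosing among $A$ actions, the joint action space has $A^{n}$ elements, so any guarantee that scales with the size of the joint action space is exponential in the number of agents; breaking the curse of multiagency means keeping the sample complexity polynomial in $n$, $A$, and the remaining problem parameters.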


End-to-End Conformal Calibration for Optimization Under Uncertainty

arXiv.org Artificial Intelligence

Machine learning can significantly improve performance for decision-making under uncertainty in a wide range of domains. However, ensuring robustness guarantees requires well-calibrated uncertainty estimates, which can be difficult to achieve in high-capacity prediction models such as deep neural networks. Moreover, in high-dimensional settings, there may be many valid uncertainty estimates, each with its own performance profile -- i.e., not all uncertainty is equally valuable for downstream decision-making. To address this problem, this paper develops an end-to-end framework to learn the uncertainty estimates for conditional robust optimization, with robustness and calibration guarantees provided by conformal prediction. In addition, we propose to represent arbitrary convex uncertainty sets with partially input-convex neural networks, which are learned as part of our framework. Our approach consistently improves upon two-stage estimate-then-optimize baselines on concrete applications in energy storage arbitrage and portfolio optimization.
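A minimal sketch of the calibration step that conformal prediction contributes, shown here as standard split conformal calibration rather than the paper's end-to-end framework (the function name and score choice are illustrative assumptions):

    import numpy as np

    def conformal_radius(scores, alpha):
        """Return the finite-sample (1 - alpha) conformal quantile of nonconformity scores."""
        n = len(scores)
        # Finite-sample correction: the ceil((n + 1)(1 - alpha)) / n empirical quantile.
        level = np.ceil((n + 1) * (1 - alpha)) / n
        return np.quantile(scores, min(level, 1.0), method="higher")

    # Example: with scores = |y_i - f(x_i)| on a held-out calibration set, the set
    # {y : |y - f(x)| <= conformal_radius(scores, 0.1)} covers the true outcome with
    # probability at least 90% under exchangeability.

The paper's framework instead learns the uncertainty-set representation end to end for the downstream optimization, with conformal prediction supplying the calibration guarantee.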


Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

arXiv.org Artificial Intelligence

Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization and constraint satisfaction. Moreover, ESPO achieves substantial gains in sample efficiency, requiring 25--29% fewer samples than baselines, and reduces training time by 21--38%.
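A schematic sketch of the gradient-conflict test that motivates the three modes, not the authors' implementation (the thresholding and mode names are illustrative assumptions):

    import numpy as np

    def select_mode(reward_grad, cost_grad, cost_value, cost_limit, tol=0.0):
        """Pick an update mode from the observed conflict between reward and safety gradients."""
        if cost_value > cost_limit:
            return "minimize_cost"       # constraint violated: prioritize safety
        conflict = float(np.dot(reward_grad, cost_grad))  # negative => objectives conflict
        if conflict >= tol:
            return "maximize_reward"     # gradients roughly aligned: pursue reward
        return "balance"                 # conflicting gradients: trade off the two

In the paper, the observed conflict also drives how the sampling process is adjusted in each round, which is the sample-manipulation step described above.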