Shi, Laixi
Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning
Gu, Shangding, Shi, Laixi, Wen, Muning, Jin, Ming, Mazumdar, Eric, Chi, Yuejie, Wierman, Adam, Spanos, Costas
Driven by inherent uncertainty and the sim-to-real gap, robust reinforcement learning (RL) seeks to improve resilience against the complexity and variability in agent-environment sequential interactions. Despite the existence of a large number of RL benchmarks, there is a lack of standardized benchmarks for robust RL. Current robust RL policies often focus on a specific type of uncertainty and are evaluated in distinct, one-off environments. In this work, we introduce Robust-Gymnasium, a unified modular benchmark designed for robust RL that supports a wide variety of disruptions across all key RL components: agents' observed state and reward, agents' actions, and the environment. Offering over sixty diverse task environments spanning control and robotics, safe RL, and multi-agent RL, it provides an open-source and user-friendly tool for the community to assess current methods and foster the development of robust RL algorithms. In addition, we benchmark existing standard and robust RL algorithms within this framework, uncovering significant deficiencies in each and offering new insights.
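To make the kind of disruption concrete, here is a minimal sketch of how such perturbations could be injected through a standard Gymnasium wrapper; the wrapper class, noise model, and parameters below are illustrative assumptions and are not Robust-Gymnasium's actual API.

```python
# Illustrative sketch only: a Gymnasium-style wrapper that injects disruptions
# into observations, actions, and rewards. The noise scales and the wrapper
# itself are hypothetical, not Robust-Gymnasium's interface.
import gymnasium as gym
import numpy as np


class DisruptionWrapper(gym.Wrapper):
    def __init__(self, env, obs_noise=0.05, act_noise=0.05, rew_noise=0.1, seed=0):
        super().__init__(env)
        self.obs_noise, self.act_noise, self.rew_noise = obs_noise, act_noise, rew_noise
        self.rng = np.random.default_rng(seed)

    def step(self, action):
        # Disrupt the action before it reaches the environment.
        noisy_action = action + self.rng.normal(0.0, self.act_noise, size=np.shape(action))
        obs, reward, terminated, truncated, info = self.env.step(noisy_action)
        # Disrupt what the agent observes and the reward it receives.
        obs = obs + self.rng.normal(0.0, self.obs_noise, size=obs.shape)
        reward = reward + self.rng.normal(0.0, self.rew_noise)
        return obs, reward, terminated, truncated, info


env = DisruptionWrapper(gym.make("Pendulum-v1"))
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```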
Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization
Lu, Chenbei, Shi, Laixi, Chen, Zaiwei, Wu, Chenye, Wierman, Adam
In recent years, reinforcement learning (RL) (Sutton and Barto, 2018) has become a popular framework for solving sequential decision-making problems in unknown environments, with applications across different domains such as robotics (Kober et al., 2013), transportation (Haydari and Yılmaz, 2020), power systems (Chen et al., 2022), and financial markets (Charpentier et al., 2021). Despite significant progress, the curse of dimensionality remains a major bottleneck in RL tasks (Sutton and Barto, 2018). Specifically, the sample complexity grows geometrically with the dimensionality of the state-action space of the environment, posing challenges for large-scale applications. For example, in robotic control, even adding one more degree of freedom to a single robot can significantly increase the complexity of the control problem (Spong et al., 2020). To overcome the curse of dimensionality in sample complexity, a common approach is incorporating function approximation to approximate either the value function or the policy using a prespecified function class (e.g., neural networks) (Sutton and Barto, 2018). While this approach works in certain applications, these methods heavily rely on the design of the function approximation class, tailored parameter tuning, and other empirical insights. Moreover, they often lack theoretical guarantees. To the best of our knowledge, most existing results are limited to basic settings with linear function approximation (Tsitsiklis and Van Roy, 1996; Bhandari et al., 2018; Srikant and Ying, 2019; Chen et al., 2023).
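As a schematic illustration of why factorization helps (the notation below is ours, not the paper's): if the transition kernel approximately decomposes into $d$ lower-dimensional components, the number of quantities to estimate drops from a product of the component sizes to a sum.

$$
P(s' \mid s, a) \;\approx\; \prod_{i=1}^{d} P_i\!\left(s'_i \,\middle|\, s_{\mathcal{Z}_i},\, a_{\mathcal{Z}_i}\right),
\qquad
\underbrace{|\mathcal{S}|\,|\mathcal{A}|\,|\mathcal{S}|}_{\text{full model}}
\;\longrightarrow\;
\sum_{i=1}^{d} |\mathcal{S}_{\mathcal{Z}_i}|\,|\mathcal{A}_{\mathcal{Z}_i}|\,|\mathcal{S}_i|,
$$

where each factor $P_i$ depends only on a small subset $\mathcal{Z}_i$ of the state-action variables, so the sample complexity scales with the largest component rather than with the full joint space.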
Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data
Qu, Chengrui, Shi, Laixi, Panaganti, Kishan, You, Pengcheng, Wierman, Adam
Online reinforcement learning (RL) typically requires high-stakes online interaction data to learn a policy for a target task. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that -- without information on the dynamics shift -- general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that achieves problem-dependent sample complexity and outperforms pure online RL. Finally, our experimental results demonstrate that HySRL surpasses state-of-the-art online RL baselines.
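One common way to formalize the "degree of dynamics shift" (our notation, used here only for illustration) is a uniform bound on the total-variation distance between the source and target transition kernels:

$$
\max_{s,a}\; \big\| P^{\mathrm{src}}(\cdot \mid s, a) - P^{\mathrm{tar}}(\cdot \mid s, a) \big\|_1 \;\le\; \beta,
$$

so that prior knowledge of $\beta$ (or an upper bound on it) is the extra information that lets a transfer algorithm decide how much to trust the shifted-dynamics offline data.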
Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning
Shi, Laixi, Gai, Jingchu, Mazumdar, Eric, Chi, Yuejie, Wierman, Adam
Standard multi-agent reinforcement learning (MARL) algorithms are vulnerable to sim-to-real gaps. To address this, distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL by optimizing the worst-case performance when game dynamics shift within a prescribed uncertainty set. Solving RMGs remains under-explored, from problem formulation to the development of sample-efficient algorithms. A notorious yet open challenge is whether RMGs can escape the curse of multiagency, where the sample complexity scales exponentially with the number of agents. In this work, we propose a natural class of RMGs where the uncertainty set of each agent is shaped by both the environment and other agents' strategies in a best-response manner. We first establish the well-posedness of these RMGs by proving the existence of game-theoretic solutions such as robust Nash equilibria and coarse correlated equilibria (CCE). Assuming access to a generative model, we then introduce a sample-efficient algorithm for learning the CCE whose sample complexity scales polynomially with all relevant parameters. To the best of our knowledge, this is the first algorithm to break the curse of multiagency for RMGs.
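Schematically (our notation, written in the infinite-horizon discounted form for brevity; the paper's exact construction may differ), agent $i$'s robust value under a joint policy $\pi$ takes a worst case over an uncertainty set shaped by the nominal dynamics and the other agents' strategies:

$$
V_i^{\pi, \sigma_i}(s) \;=\; \min_{P \,\in\, \mathcal{U}^{\sigma_i}\!\left(P^0,\, \pi_{-i}\right)} \; \mathbb{E}^{\pi, P}\!\left[\, \sum_{t \ge 0} \gamma^{t}\, r_i(s_t, a_t) \,\middle|\, s_0 = s \right],
$$

where $\sigma_i$ is the radius of agent $i$'s uncertainty set, $P^0$ is the nominal transition kernel, and $\pi_{-i}$ denotes the other agents' policies.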
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
Lin, Haohong, Ding, Wenhao, Chen, Jian, Shi, Laixi, Zhu, Jiacheng, Li, Bo, Zhao, Ding
Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL. Subsequently, we introduce BilinEar CAUSal rEpresentation (BECAUSE), an algorithm that captures causal representations of both states and actions to reduce the influence of distribution shift, thus mitigating the objective mismatch problem. Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We further show the generalizability and robustness of BECAUSE under limited samples or large numbers of confounders. Additionally, we offer a theoretical analysis of BECAUSE, proving its error bound and sample efficiency when integrating causal representation into offline MBRL.
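A rough picture of the bilinear structure (the symbols here are illustrative, not the paper's exact formulation): the dynamics are modeled through low-dimensional causal representations of the current state-action pair and of the next state, coupled by a matrix that encodes the (sparse) causal dependencies,

$$
P(s' \mid s, a) \;\approx\; \phi(s, a)^{\top} M\, \psi(s'),
$$

so that model learning and policy optimization operate on the representations $\phi, \psi$ rather than on the raw, confounded observations.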
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Gu, Shangding, Shi, Laixi, Ding, Yuhao, Knoll, Alois, Spanos, Costas, Wierman, Adam, Jin, Ming
Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization and constraint satisfaction. Moreover, ESPO achieves substantial gains in sample efficiency, requiring 25--29% fewer samples than baselines, and reduces training time by 21--38%.
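The mode-switching idea can be sketched as follows; the cosine-similarity test, thresholds, and batch-size rule below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): select one of three update modes
# from the observed conflict between reward and cost gradients, then adjust how
# many samples to collect. Thresholds and scaling factors are hypothetical.
import numpy as np


def choose_mode(reward_grad, cost_grad, cost_value, cost_limit, conflict_tol=0.0):
    g_r = reward_grad / (np.linalg.norm(reward_grad) + 1e-8)
    g_c = cost_grad / (np.linalg.norm(cost_grad) + 1e-8)
    conflict = float(np.dot(g_r, g_c))  # negative => the two objectives disagree

    if cost_value > cost_limit:
        mode = "minimize_cost"       # constraint violated: prioritize safety
    elif conflict < conflict_tol:
        mode = "balance_tradeoff"    # gradients conflict: trade off reward and cost
    else:
        mode = "maximize_reward"     # no conflict: pure reward maximization
    return mode, conflict


def adjust_batch_size(base_size, conflict):
    # Heuristic: gather more samples when gradients conflict, fewer when they agree.
    return int(base_size * (1.5 if conflict < 0 else 0.75))
```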
Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
Shi, Laixi, Mazumdar, Eric, Chi, Yuejie, Wierman, Adam
To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied -- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DRNVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length.
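For concreteness, a standard way to state the solution concept (our notation, in the discounted form for simplicity): each agent's value is evaluated at its own worst case, and a robust Nash equilibrium $\pi^\star$ requires that no agent can gain by a unilateral deviation under that worst-case evaluation,

$$
V_i^{\pi_i^\star, \pi_{-i}^\star, \sigma_i}(s) \;\ge\; V_i^{\pi_i, \pi_{-i}^\star, \sigma_i}(s)
\qquad \text{for all } \pi_i,\ \text{all agents } i,\ \text{and all } s,
$$

where $V_i^{\pi, \sigma_i}(s) = \min_{P \in \mathcal{U}^{\sigma_i}(P^0)} \mathbb{E}^{\pi, P}\big[\sum_{t \ge 0} \gamma^t r_i(s_t, a_t) \mid s_0 = s\big]$ is agent $i$'s robust value over its uncertainty set of radius $\sigma_i$ around the nominal kernel $P^0$.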
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Woo, Jiin, Shi, Laixi, Joshi, Gauri, Chi, Yuejie
Offline RL (Levine et al., 2020), also known as batch RL, addresses the challenge of learning a near-optimal policy using offline datasets collected a priori, without further interactions with an environment. Fueled by the cost-effectiveness of utilizing pre-collected datasets compared to real-time exploration, offline RL has received increasing attention. However, the performance of offline RL crucially depends on the quality of the offline datasets due to the lack of additional interactions with the environment, where quality is determined by how thoroughly the state-action space is explored during data collection. Encouragingly, recent research (Li et al., 2022; Rashidinejad et al., 2021; Shi et al., 2022; Xie et al., 2021b) indicates that being more conservative on unseen state-action pairs, known as the principle of pessimism, enables learning of a near-optimal policy even with partial coverage of the state-action space, as long as the distribution of the datasets encompasses the trajectory of the optimal policy. However, acquiring high-quality datasets with good coverage of the optimal policy is challenging, because it requires the state-action visitation distribution induced by the behavior policy used for data collection to be very close to that of the optimal policy. Alternatively, multiple datasets can be merged into one to supplement one another's insufficient coverage, but this may be impractical when offline datasets are scattered and cannot be easily shared due to privacy and communication constraints.
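As a minimal sketch of the general recipe in the tabular case (pessimism via count-based lower-confidence bounds plus periodic server averaging); the update rule and constants below are illustrative, not the paper's algorithm.

```python
# Illustrative sketch only: each client runs pessimistic offline Q-updates with
# an LCB-style penalty based on its local visit counts, and a server
# periodically averages the clients' Q-tables.
import numpy as np


def local_pessimistic_update(Q, dataset, counts, gamma=0.99, lr=0.1, beta=1.0):
    for (s, a, r, s_next) in dataset:        # offline transitions, integer-indexed
        counts[s, a] += 1
        bonus = beta / np.sqrt(counts[s, a])  # larger penalty where data is scarce
        target = r + gamma * np.max(Q[s_next]) - bonus
        Q[s, a] += lr * (target - Q[s, a])
    return Q


def federated_average(local_Qs):
    # Server step: average the clients' Q-tables without sharing raw data.
    return np.mean(np.stack(local_Qs, axis=0), axis=0)
```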
Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation
Ding, Wenhao, Shi, Laixi, Chi, Yuejie, Zhao, Ding
Robustness has been extensively studied in reinforcement learning (RL) to handle various forms of uncertainty such as random perturbations, rare events, and malicious attacks. In this work, we consider one critical type of robustness: robustness against spurious correlation, where different portions of the state are not causally related yet exhibit correlations induced by unobserved confounders. These spurious correlations are ubiquitous in real-world tasks; for instance, a self-driving car usually observes heavy traffic in the daytime and light traffic at night due to unobservable human activity. A model that learns such useless or even harmful correlations could catastrophically fail when the confounder in the test case deviates from that seen in training. Although well-motivated, enabling robustness against spurious correlation poses significant challenges, since the uncertainty set, shaped by the unobserved confounder and causal structure, is difficult to characterize and identify. Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate for this challenge. To solve this issue, we propose Robust State-Confounded Markov Decision Processes (RSC-MDPs) and theoretically demonstrate their superiority in avoiding spurious correlations compared with other robust RL counterparts. We also design an empirical algorithm to learn the robust optimal policy for RSC-MDPs, which outperforms all baselines on eight realistic self-driving and manipulation tasks.
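Schematically, the objective can be thought of as a max-min problem over shifts of the unobserved confounder's distribution (the notation below is illustrative, not the paper's exact definition):

$$
\max_{\pi}\; \min_{\, p_c \,\in\, \mathcal{U}(p_c^{0})} \; \mathbb{E}_{c \sim p_c}\, \mathbb{E}^{\pi}\!\left[\, \sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t) \,\middle|\, c \right],
$$

where $c$ is the unobserved confounder (e.g., time of day in the driving example), $p_c^{0}$ is its training distribution, and $\mathcal{U}(p_c^{0})$ is the set of test-time shifts the policy should withstand.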
Offline Reinforcement Learning with On-Policy Q-Function Regularization
Shi, Laixi, Dadashi, Robert, Chi, Yuejie, Castro, Pablo Samuel, Geist, Matthieu
The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily via a SARSA-style estimate, and that it handles the extrapolation error more directly. We propose two algorithms that take advantage of the estimated Q-function through regularization, and demonstrate that they exhibit strong performance on the D4RL benchmarks.
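A minimal sketch of the two ingredients (a SARSA-style estimate of the behavior Q-function, and a critic loss regularized toward it); the loss form, network interfaces, and coefficient below are illustrative assumptions, not the paper's exact algorithms.

```python
# Illustrative sketch (not the paper's implementation): (1) a SARSA-style TD
# target estimates the behavior policy's Q-function from logged
# (s, a, r, s', a') tuples; (2) the learned Q is pulled toward that estimate.
import torch
import torch.nn.functional as F


def sarsa_behavior_q_loss(q_beh, batch, gamma=0.99):
    s, a, r, s_next, a_next, done = batch  # a_next is the logged next action
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_beh(s_next, a_next)
    return F.mse_loss(q_beh(s, a), target)


def regularized_critic_loss(q, q_beh, batch, policy, gamma=0.99, alpha=0.1):
    s, a, r, s_next, _, done = batch
    with torch.no_grad():
        a_pi = policy(s_next)
        target = r + gamma * (1.0 - done) * q(s_next, a_pi)
    td_loss = F.mse_loss(q(s, a), target)
    # Regularize the learned Q toward the (frozen) behavior Q-function.
    reg = F.mse_loss(q(s, a), q_beh(s, a).detach())
    return td_loss + alpha * reg
```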