AITopics

2503.10949

Country:

Europe (1.00)
North America > United States > California (0.28)
Asia > Japan > Honshū > Kansai (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial Intelligence, Feb-26-2025

Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

Gu, Shangding, Shi, Laixi, Wen, Muning, Jin, Ming, Mazumdar, Eric, Chi, Yuejie, Wierman, Adam, Spanos, Costas

Driven by inherent uncertainty and the sim-to-real gap, robust reinforcement learning (RL) seeks to improve resilience against the complexity and variability in agent-environment sequential interactions. Despite the existence of a large number of RL benchmarks, there is a lack of standardized benchmarks for robust RL. Current robust RL policies often focus on a specific type of uncertainty and are evaluated in distinct, one-off environments. In this work, we introduce Robust-Gymnasium, a unified modular benchmark designed for robust RL that supports a wide variety of disruptions across all key RL components: agents' observed state and reward, agents' actions, and the environment. Offering over sixty diverse task environments spanning control and robotics, safe RL, and multi-agent RL, it provides an open-source and user-friendly tool for the community to assess current methods and foster the development of robust RL algorithms. In addition, we benchmark existing standard and robust RL algorithms within this framework, uncovering significant deficiencies in each and offering new insights.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2502.19652

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (0.46)
Media > Television (0.46)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

arXiv.org Artificial Intelligence, Feb-17-2025

Reward-Safety Balance in Offline Safe RL via Diffusion Regularization

Guo, Junyu, Zheng, Zhi, Ying, Donghao, Jin, Ming, Gu, Shangding, Spanos, Costas, Lavaei, Javad

Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent has only a fixed dataset, which is common in realistic tasks to prevent unsafe exploration. To address this, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference. We further apply gradient manipulation for safety adaptation, balancing the reward objective and constraint satisfaction. This approach leverages high-quality offline data while incorporating safety requirements. Empirical results show that DRCORL achieves reliable safety performance, fast inference, and strong reward outcomes across robot learning tasks. Compared to existing safe offline RL methods, it consistently meets cost limits and performs well with the same hyperparameters, indicating practical applicability in real-world scenarios.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2502.12391

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, May-31-2024

Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

Gu, Shangding, Shi, Laixi, Ding, Yuhao, Knoll, Alois, Spanos, Costas, Wierman, Adam, Jin, Ming

Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization and constraint satisfaction. Moreover, ESPO achieves substantial gains in sample efficiency, requiring 25–29% fewer samples than baselines, and reduces training time by 21–38%.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2405.20860

Country: North America > United States > California (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Dec-15-2023

Active Reinforcement Learning for Robust Building Control

Jang, Doseok, Yan, Larry, Spangher, Lucas, Spanos, Costas

Reinforcement learning (RL) is a powerful tool for optimal control that has found great success in Atari games, the game of Go, robotic control, and building optimization. RL is also very brittle; agents often overfit to their training environment and fail to generalize to new settings. Unsupervised environment design (UED) has been proposed as a solution to this problem, in which the agent trains in environments that have been specially selected to help it learn. Previous UED algorithms focus on trying to train an RL agent that generalizes across a large distribution of environments. This is not necessarily desirable when we wish to prioritize performance in one environment over others. In this work, we will be examining the setting of robust RL building control, where we wish to train an RL agent that prioritizes performing well in normal weather while still being robust to extreme weather conditions. We demonstrate a novel UED algorithm, ActivePLR, that uses uncertainty-aware neural network architectures to generate new training environments at the limit of the RL agent's ability while being able to prioritize performance in a desired base environment. We show that ActivePLR is able to outperform state-of-the-art UED algorithms in minimizing energy usage while maximizing occupant comfort in the setting of building control.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2312.10289

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry:

Energy (1.00)
Construction & Engineering (1.00)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Aug-14-2021

Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs

Jang, Doseok, Spangher, Lucas, Khattar, Manan, Agwan, Utkarsha, Nadarajah, Selvaprabuh, Spanos, Costas

Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor which will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we examine how offline training can be leveraged to minimize data costs (accelerate convergence) and program implementation costs. We present two approaches to doing so: pretraining our model to warm start the experiment with simulated tasks, and using a planning model trained to simulate the real world's rewards for the agent. We present results that demonstrate the utility of offline reinforcement learning for efficient price-setting in the energy demand response problem.

artificial intelligence, controller, reinforcement learning, (20 more...)

2108.06594

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report (0.64)
Instructional Material > Online (0.41)

Industry:

Energy > Power Industry (1.00)
Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning, Feb-26-2019

Towards Efficient Data Valuation Based on the Shapley Value

Jia, Ruoxi, Dao, David, Wang, Boxin, Hubis, Frances Ann, Hynes, Nick, Gurel, Nezihe Merve, Li, Bo, Zhang, Ce, Song, Dawn, Spanos, Costas

"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires exponential time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.

data valuation, game theory, health & medicine, (20 more...)

1902.10275

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report (0.65)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

arXiv.org Machine Learning, Oct-23-2018

One Bit Matters: Understanding Adversarial Examples as the Abuse of Redundancy

Wang, Jingkang, Jia, Ruoxi, Friedland, Gerald, Li, Bo, Spanos, Costas

Despite the great success achieved in machine learning (ML), adversarial examples have caused concerns with regard to its trustworthiness: a small perturbation of an input results in an arbitrary failure of an otherwise seemingly well-trained ML model. While studies are being conducted to discover the intrinsic properties of adversarial examples, such as their transferability and universality, there is insufficient theoretical analysis to help understand the phenomenon in a way that can influence the design process of ML experiments. In this paper, we deduce an information-theoretic model which explains adversarial attacks as the abuse of feature redundancies in ML algorithms. We prove that feature redundancy is a necessary condition for the existence of adversarial examples. Our model helps to explain some major questions raised in many anecdotal studies on adversarial examples. Our theory is backed up by empirical measurements of the information content of benign and adversarial examples on both image and text datasets. Our measurements show that typical adversarial examples introduce just enough redundancy to overflow the decision making of an ML model trained on corresponding benign examples. We conclude with actionable recommendations to improve the robustness of machine learners against adversarial examples.

adversarial example, artificial intelligence, neural network, (17 more...)

1810.09650

Country: North America > United States > California > Alameda County > Berkeley (0.15)

Genre: Research Report (1.00)

Industry:

Government (1.00)
Information Technology > Security & Privacy (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, Sep-13-2018

A Deep Learning and Gamification Approach to Energy Conservation at Nanyang Technological University

Konstantakopoulos, Ioannis C., Barkan, Andrew R., He, Shiying, Veeravalli, Tanya, Liu, Huihan, Spanos, Costas

The implementation of smart building technology in the form of smart infrastructure applications has great potential to improve sustainability and energy efficiency by leveraging a human-in-the-loop strategy. However, human preference in regard to living conditions is usually unknown and heterogeneous in its manifestation as control inputs to a building. Furthermore, the occupants of a building typically lack the independent motivation necessary to contribute to and play a key role in the control of smart building infrastructure. Moreover, true human actions and their integration with sensing/actuation platforms remain unknown to the decision maker tasked with improving operational efficiency. By modeling user interaction as a sequential discrete game between non-cooperative players, we introduce a gamification approach for supporting user engagement and integration in a human-centric cyber-physical system. We propose the design and implementation of a large-scale network game with the goal of improving the energy efficiency of a building through the utilization of cutting-edge Internet of Things (IoT) sensors and cyber-physical systems sensing/actuation platforms. A benchmark utility learning framework that employs robust estimations for classical discrete choice models is provided for the derived high-dimensional imbalanced data. To improve forecasting performance, we extend the benchmark utility learning scheme by leveraging Deep Learning end-to-end training with Deep bi-directional Recurrent Neural Networks. We apply the proposed methods to high-dimensional data from a social game experiment designed to encourage energy-efficient behavior among smart building occupants in Nanyang Technological University (NTU) residential housing. Using occupant-retrieved actions for resources such as lighting and A/C, we simulate the game defined by the estimated utility functions.

computer game, internet of things, occupant, (22 more...)

1809.05142

Country: North America > United States > California (0.46)

Genre:

Research Report > Experimental Study (0.94)
Research Report > New Finding (0.94)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Energy > Power Industry (1.00)
Construction & Engineering (1.00)
Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, May-4-2017

Inverse Reinforcement Learning via Deep Gaussian Process

Jin, Ming, Damianou, Andreas, Abbeel, Pieter, Spanos, Costas

We propose a new approach to inverse reinforcement learning (IRL) based on the deep Gaussian process (deep GP) model, which is capable of learning complicated reward structures with few demonstrations. Our model stacks multiple latent GP layers to learn abstract representations of the state feature space, which is linked to the demonstrations through the Maximum Entropy learning framework. Incorporating the IRL engine into the nonlinear latent structure renders existing deep GP inference approaches intractable. To tackle this, we develop a non-standard variational approximation framework which extends previous inference schemes. This allows for approximate Bayesian treatment of the feature space and guards against overfitting. Carrying out representation and inverse reinforcement learning simultaneously within our model outperforms state-of-the-art approaches, as we demonstrate with experiments on standard benchmarks ("object world","highway driving") and a new benchmark ("binary world").

deep learning, demonstration, neural network, (17 more...)

1512.08065

Country: North America > United States > California (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry:

Transportation > Ground > Road (0.48)
Automobiles & Trucks (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)