 gym environment




SafeRL-Lite: A Lightweight, Explainable, and Constrained Reinforcement Learning Library

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has achieved remarkable success across a wide range of domains, from game playing to robotic control and autonomous decision-making. However, the deployment of RL agents in real-world safety-critical applications remains a significant challenge due to two key limitations: (1) the lack of safety guarantees during exploration and policy execution, and (2) the opaqueness of learned policies, which hinders human understanding and trust. In practical domains such as autonomous driving, industrial automation, and clinical decision support, agents are often required to operate under hard constraints: for example, to avoid collisions, respect velocity limits, or obey medical safety protocols. Standard RL algorithms, such as Deep Q-Networks (DQN), are typically designed to maximize cumulative reward without any explicit notion of constraint satisfaction. Violations of such constraints can lead to catastrophic outcomes, rendering these agents unusable in safety-sensitive contexts.
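To make the idea of a hard constraint concrete, here is a minimal sketch of an action "shield" that filters out any action leading into a forbidden state. It is not SafeRL-Lite's API; the toy environment `Corridor1D`, the `cost` entry in the info dictionary, and the helpers `safe_actions` and `run_episode` are all hypothetical names chosen for illustration, and the environment only loosely imitates a Gym-style `reset`/`step` interface.

```python
import random


class Corridor1D:
    """Hypothetical toy environment: walk from cell 0 to cell `length`.

    Entering the `hazard` cell is a constraint violation (cost 1).
    """

    def __init__(self, length=10, hazard=3):
        self.length = length
        self.hazard = hazard
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is 1 (single step) or 2 (jump two cells), capped at the goal
        self.pos = min(self.pos + action, self.length)
        cost = 1 if self.pos == self.hazard else 0
        done = self.pos == self.length
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {"cost": cost}


def safe_actions(env, actions=(1, 2)):
    """Keep only the actions whose next cell is not the hazard."""
    return [a for a in actions if min(env.pos + a, env.length) != env.hazard]


def run_episode(env, shielded, rng):
    """Run one random-policy episode and return the accumulated cost."""
    env.reset()
    total_cost, done = 0, False
    while not done:
        candidates = safe_actions(env) if shielded else [1, 2]
        _, _, done, info = env.step(rng.choice(candidates))
        total_cost += info["cost"]
    return total_cost


if __name__ == "__main__":
    rng = random.Random(0)
    print("shielded cost:  ", run_episode(Corridor1D(), True, rng))
    print("unshielded cost:", run_episode(Corridor1D(), False, rng))
```

With the shield enabled, the random policy can never enter the hazard cell, whereas the unshielded policy may. A reward-maximizing agent without such a filter has no reason to avoid the cell, which is the gap the abstract describes.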


HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym

arXiv.org Artificial Intelligence

In recent years, reinforcement learning (RL) methods have been widely tested using tools like OpenAI Gym, though many tasks in these environments could also benefit from hierarchical planning. However, no existing tool enables seamless integration of hierarchical planning with RL. Hierarchical Domain Definition Language (HDDL), used in classical planning, introduces a structured approach well-suited for model-based RL to address this gap. To enable this integration, we introduce HDDLGym, a Python-based tool that automatically generates OpenAI Gym environments from HDDL domains and problems. HDDLGym serves as a link between RL and hierarchical planning, supporting multi-agent scenarios and enabling collaborative planning among agents. This paper provides an overview of HDDLGym's design and implementation, highlighting the challenges and design choices involved in integrating HDDL with the Gym interface, and applying RL policies to support hierarchical planning. We also provide detailed instructions and demonstrations for using the HDDLGym framework, including how to work with existing HDDL domains and problems from International Planning Competitions, exemplified by the Transport domain. Additionally, we offer guidance on creating new HDDL domains for multi-agent scenarios and demonstrate the practical use of HDDLGym in the Overcooked domain. By leveraging the advantages of HDDL and Gym, HDDLGym aims to be a valuable tool for studying RL in hierarchical planning, particularly in multi-agent contexts.


A Multi-Agent Reinforcement Learning Testbed for Cognitive Radio Applications

arXiv.org Artificial Intelligence

Technological trends show that Radio Frequency Reinforcement Learning (RFRL) will play a prominent role in the wireless communication systems of the future. Applications of RFRL range from military communications jamming to enhancing WiFi networks. Before deploying algorithms for these purposes, they must be trained in a simulation environment to ensure adequate performance. For this reason, we previously created the RFRL Gym: a standardized, accessible tool for the development and testing of reinforcement learning (RL) algorithms in the wireless communications space. This environment leveraged the OpenAI Gym framework and featured customizable simulation scenarios within the RF spectrum. However, the RFRL Gym was limited to training a single RL agent per simulation; this is not ideal, as most real-world RF scenarios will contain multiple intelligent agents in cooperative, competitive, or mixed settings, which is a natural consequence of spectrum congestion. Therefore, through integration with Ray RLlib, multi-agent reinforcement learning (MARL) functionality for training and assessment has been added to the RFRL Gym, making it a more robust tool for RF spectrum simulation. This paper provides an overview of the updated RFRL Gym environment. In this work, the general framework of the tool is described relative to comparable existing resources, highlighting the significant additions and refactoring we have applied to the Gym. Afterward, results from testing various RF scenarios in the MARL environment and future additions are discussed.


RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models

arXiv.org Artificial Intelligence

This study explores integrating reinforcement learning (RL) with idealised climate models to address key parameterisation challenges in climate science. Current climate models rely on complex mathematical parameterisations to represent sub-grid scale processes, which can introduce substantial uncertainties. RL offers capabilities to enhance these parameterisation schemes, including direct interaction, handling sparse or delayed feedback, continuous online learning, and long-term optimisation. We evaluate the performance of eight RL algorithms on two idealised environments: one for temperature bias correction, another for radiative-convective equilibrium (RCE) imitating real-world computational constraints. Results show that different RL approaches excel in different climate scenarios, with exploration algorithms performing better in bias correction and exploitation algorithms proving more effective for RCE. These findings support the potential of RL-based parameterisation schemes to be integrated into global climate models, improving accuracy and efficiency in capturing complex climate dynamics. Overall, this work represents an important first step towards leveraging RL to enhance climate model accuracy, critical for improving climate understanding and predictions. Code accessible at https://github.com/p3jitnath/climate-rl.


Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)

arXiv.org Artificial Intelligence

The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of operation of a ferry on the west coast of Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, subsequently serving as an evaluative tool to assess the proficiency of the ferry's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at https://github.com/pagand/model_optimze_vessel/tree/AAAI


Testing Spacecraft Formation Flying with Crazyflie Drones as Satellite Surrogates

arXiv.org Artificial Intelligence

As the space domain becomes increasingly congested, autonomy is proposed as one approach to enable small numbers of human ground operators to manage large constellations of satellites and tackle more complex missions such as on-orbit or in-space servicing, assembly, and manufacturing. One of the biggest challenges in developing novel spacecraft autonomy is the lack of mechanisms to test and evaluate its performance. Testing spacecraft autonomy on-orbit can be high risk and prohibitively expensive. An alternative method is to test autonomy terrestrially using satellite surrogates such as attitude test beds on air bearings or drones for translational motion visualization. Against this background, this work develops an approach to evaluate autonomous spacecraft behavior using a surrogate platform, namely a micro-quadcopter drone developed by the Bitcraze team, the Crazyflie 2.1. The Crazyflie drones are becoming increasingly ubiquitous in flight testing labs because they are affordable, open source, readily available, and include expansion decks which allow for features such as positioning systems, distance and/or motion sensors, wireless charging, and AI capabilities. In this paper, models of Crazyflie drones are used to simulate the relative motion dynamics of spacecraft under linearized Clohessy-Wiltshire dynamics in elliptical natural motion trajectories, in pre-generated docking trajectories, and via trajectories output by neural network control systems.
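For readers unfamiliar with the Clohessy-Wiltshire model mentioned above, the following sketch propagates its standard in-plane closed-form solution and shows why the initial condition vy0 = -2 n x0 yields a closed, elliptical natural motion trajectory. It is a generic textbook illustration, not the paper's simulation code; the function names `cw_propagate` and `natural_motion_state`, the axis convention (x radial, y along-track), and the numeric value of the mean motion `n` are assumptions made here.

```python
import math


def cw_propagate(state, n, t):
    """Closed-form in-plane Clohessy-Wiltshire solution.

    state = (x, y, vx, vy) relative to a chief on a circular orbit,
    with x radial and y along-track; n is the chief's mean motion [rad/s].
    Solves x'' = 3 n^2 x + 2 n y',  y'' = -2 n x'.
    """
    x0, y0, vx0, vy0 = state
    s, c = math.sin(n * t), math.cos(n * t)
    x = (4 - 3 * c) * x0 + (s / n) * vx0 + (2 / n) * (1 - c) * vy0
    y = (6 * (s - n * t) * x0 + y0
         - (2 / n) * (1 - c) * vx0
         + ((4 * s - 3 * n * t) / n) * vy0)
    vx = 3 * n * s * x0 + c * vx0 + 2 * s * vy0
    vy = -6 * n * (1 - c) * x0 - 2 * s * vx0 + (4 * c - 3) * vy0
    return (x, y, vx, vy)


def natural_motion_state(x0, y0, n):
    """Initial state with the drift-free condition vy0 = -2 n x0."""
    return (x0, y0, 0.0, -2.0 * n * x0)


if __name__ == "__main__":
    n = 0.001  # hypothetical mean motion [rad/s]
    period = 2 * math.pi / n
    start = natural_motion_state(100.0, 0.0, n)
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        x, y, _, _ = cw_propagate(start, n, frac * period)
        print(f"t = {frac:4.2f} T   x = {x:9.3f} m   y = {y:9.3f} m")
```

Under this initial condition the secular drift terms cancel, so the deputy traces a 2:1 ellipse around the chief (here 100 m radially and 200 m along-track) and returns to its starting state after one orbital period.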


A Novel Variational Lower Bound for Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

Inverse reinforcement learning (IRL) seeks to learn the reward function from expert trajectories, to understand the task for imitation or collaboration, thereby removing the need for manual reward engineering. However, IRL in the context of large, high-dimensional problems with unknown dynamics has been particularly challenging. In this paper, we present a new Variational Lower Bound for IRL (VLB-IRL), which is derived under the framework of a probabilistic graphical model with an optimality node. Our method simultaneously learns the reward function and policy under the learned reward function by maximizing the lower bound, which is equivalent to minimizing the reverse Kullback-Leibler divergence between an approximated distribution of optimality given the reward function and the true distribution of optimality given trajectories. This leads to a new IRL method that learns a valid reward function such that the policy under the learned reward achieves expert-level performance on several known domains. Importantly, the method outperforms the existing state-of-the-art IRL algorithms on these domains by demonstrating better reward from the learned policy. Reinforcement learning (RL) is a popular method for automating decision making and control. However, to achieve practical effectiveness, significant engineering of reward features and reward functions has traditionally been necessary.
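As a reminder of what "reverse" means here, the general definition of the two directions of the Kullback-Leibler divergence is given below. The symbols are generic and not the paper's notation: q stands for an approximating distribution and p for the target distribution, both over a discrete variable o.

```latex
% Reverse KL: the expectation is taken under the approximation q.
D_{\mathrm{KL}}(q \,\|\, p) = \sum_{o} q(o) \, \log \frac{q(o)}{p(o)}

% Forward KL, for comparison: the expectation is taken under the target p.
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{o} p(o) \, \log \frac{p(o)}{q(o)}
```

Both quantities are non-negative and vanish only when q = p, but they are not symmetric, so which direction is minimized affects the approximation that results.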


DiSProD: Differentiable Symbolic Propagation of Distributions for Planning

arXiv.org Artificial Intelligence

The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.