AITopics | gym environment

gym environment

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

SafeRL-Lite: A Lightweight, Explainable, and Constrained Reinforcement Learning Library

Mishra, Satyam, Vi, Phung Thao, Mishra, Shivam, Bijalwan, Vishwanath, Semwal, Vijay Bhaskar, Khan, Abdul Manan

arXiv.org Artificial Intelligence, Jun-24-2025

Reinforcement Learning (RL) has achieved remarkable success across a wide range of domains, from game playing to robotic control and autonomous decision-making. However, the deployment of RL agents in real-world safety-critical applications remains a significant challenge due to two key limitations: (1) the lack of safety guarantees during exploration and policy execution, and (2) the opaqueness of learned policies, which hinders human understanding and trust. In practical domains such as autonomous driving, industrial automation, and clinical decision support, agents are often required to operate under hard constraints: for example, to avoid collisions, respect velocity limits, or obey medical safety protocols. Standard RL algorithms, such as Deep Q-Networks (DQN), are typically designed to maximize cumulative reward without any explicit notion of constraint satisfaction. Violations of such constraints can lead to catastrophic outcomes, rendering these agents unusable in safety-sensitive contexts.

machine learning, reinforcement learning, saferl-lite, (16 more...)

arXiv.org Artificial Intelligence

2506.17297

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
Asia > Vietnam > Hanoi > Hanoi (0.05)
Asia > India > Madhya Pradesh > Bhopal (0.04)

Genre: Research Report (0.50)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
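
The SafeRL-Lite abstract contrasts reward-maximizing agents such as DQN with agents that must respect hard constraints, for example velocity limits. One common way to picture that difference is to filter the greedy action choice through a constraint check. The sketch below is a minimal illustration under stated assumptions and is not SafeRL-Lite's API: the function names, the acceleration table, and the velocity limit are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical action set: each discrete action is an acceleration command.
ACCELERATIONS: List[float] = [-1.0, 0.0, 1.0, 2.0]
VELOCITY_LIMIT = 5.0  # assumed hard constraint, chosen for illustration


def safe_greedy_action(
    q_values: Sequence[float],
    is_safe: Callable[[int], bool],
    fallback: int,
) -> int:
    """Return the highest-valued action among those the constraint allows.

    If no action passes the check, return the caller-supplied fallback.
    """
    safe_actions = [a for a in range(len(q_values)) if is_safe(a)]
    if not safe_actions:
        return fallback
    return max(safe_actions, key=lambda a: q_values[a])


def make_velocity_check(velocity: float) -> Callable[[int], bool]:
    """Build a predicate that rejects actions pushing velocity past the limit."""
    return lambda a: velocity + ACCELERATIONS[a] <= VELOCITY_LIMIT


# Q-values that favor accelerating hard (action 3) regardless of safety.
q = [0.1, 0.4, 0.7, 0.9]

unconstrained = max(range(len(q)), key=lambda a: q[a])
constrained = safe_greedy_action(q, make_velocity_check(4.5), fallback=0)
print(unconstrained, constrained)
```

At a velocity of 4.5 the two accelerating actions would exceed the limit, so the filtered choice falls back to the best remaining action while the unconstrained greedy choice does not.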

HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym

La, Ngoc, Mon-Williams, Ruaridh, Shah, Julie A.

arXiv.org Artificial Intelligence, May-29-2025

In recent years, reinforcement learning (RL) methods have been widely tested using tools like OpenAI Gym, though many tasks in these environments could also benefit from hierarchical planning. However, no existing tool enables seamless integration of hierarchical planning with RL. Hierarchical Domain Definition Language (HDDL), used in classical planning, introduces a structured approach well-suited for model-based RL to address this gap. To bridge this integration, we introduce HDDLGym, a Python-based tool that automatically generates OpenAI Gym environments from HDDL domains and problems. HDDLGym serves as a link between RL and hierarchical planning, supporting multi-agent scenarios and enabling collaborative planning among agents. This paper provides an overview of HDDLGym's design and implementation, highlighting the challenges and design choices involved in integrating HDDL with the Gym interface, and applying RL policies to support hierarchical planning. We also provide detailed instructions and demonstrations for using the HDDLGym framework, including how to work with existing HDDL domains and problems from International Planning Competitions, exemplified by the Transport domain. Additionally, we offer guidance on creating new HDDL domains for multi-agent scenarios and demonstrate the practical use of HDDLGym in the Overcooked domain. By leveraging the advantages of HDDL and Gym, HDDLGym aims to be a valuable tool for studying RL in hierarchical planning, particularly in multi-agent contexts.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.22597

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.82)
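
Several entries on this page, HDDLGym among them, target the Gym interface. For readers unfamiliar with it, the sketch below mirrors the reset/step convention used by Gym 0.26+ and Gymnasium (step returns observation, reward, terminated, truncated, info) without importing the library. It is not output from HDDLGym: the class `LineDeliveryEnv`, its toy delivery task, and the reward values are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Tuple


class LineDeliveryEnv:
    """Hypothetical toy task: drive a truck along a line of locations to a goal."""

    def __init__(self, n_locations: int = 5, goal: int = 4, max_steps: int = 20):
        self.n_locations = n_locations
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> Tuple[int, Dict]:
        """Start a new episode and return (observation, info)."""
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict]:
        """Apply an action: 0 moves left, 1 moves right (clamped to the line)."""
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.n_locations - 1)
        self.steps += 1
        terminated = self.position == self.goal  # task solved
        truncated = not terminated and self.steps >= self.max_steps  # time limit
        reward = 1.0 if terminated else -0.1  # small step cost, bonus at goal
        return self.position, reward, terminated, truncated, {}


def run_episode(env: LineDeliveryEnv, policy: Callable[[int], int]) -> Tuple[float, int]:
    """Roll out one episode and return (total reward, final observation)."""
    obs, _ = env.reset()
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total, obs


always_right_return, always_right_final = run_episode(LineDeliveryEnv(), lambda obs: 1)
always_left_return, always_left_final = run_episode(LineDeliveryEnv(), lambda obs: 0)
```

Separating `terminated` from `truncated` lets a training loop tell a solved task apart from an episode cut off by the step limit, which matters when bootstrapping value estimates.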

A Multi-Agent Reinforcement Learning Testbed for Cognitive Radio Applications

Vangaru, Sriniketh, Rosen, Daniel, Green, Dylan, Rodriguez, Raphael, Wiecek, Maxwell, Johnson, Amos, Jones, Alyse M., Headley, William C.

arXiv.org Artificial Intelligence, Dec-2-2024

Technological trends show that Radio Frequency Reinforcement Learning (RFRL) will play a prominent role in the wireless communication systems of the future. Applications of RFRL range from military communications jamming to enhancing WiFi networks. Before algorithms are deployed for these purposes, they must be trained in a simulation environment to ensure adequate performance. For this reason, we previously created the RFRL Gym: a standardized, accessible tool for the development and testing of reinforcement learning (RL) algorithms in the wireless communications space. This environment leveraged the OpenAI Gym framework and featured customizable simulation scenarios within the RF spectrum. However, the RFRL Gym was limited to training a single RL agent per simulation; this is not ideal, as most real-world RF scenarios will contain multiple intelligent agents in cooperative, competitive, or mixed settings, which is a natural consequence of spectrum congestion. Therefore, through integration with Ray RLlib, multi-agent reinforcement learning (MARL) functionality for training and assessment has been added to the RFRL Gym, making it a more robust tool for RF spectrum simulation. This paper provides an overview of the updated RFRL Gym environment. In this work, the general framework of the tool is described relative to comparable existing resources, highlighting the significant additions and refactoring we have applied to the Gym. Afterward, results from testing various RF scenarios in the MARL environment and future additions are discussed.

agent, algorithm, scenario, (15 more...)

arXiv.org Artificial Intelligence

2410.21521

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Virginia (0.04)
Asia > China (0.04)

Genre:

Overview (0.68)
Research Report (0.50)

Industry:

Education (0.66)
Leisure & Entertainment (0.55)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models

Nath, Pritthijit, Moss, Henry, Shuckburgh, Emily, Webb, Mark

arXiv.org Artificial Intelligence, Aug-28-2024

This study explores integrating reinforcement learning (RL) with idealised climate models to address key parameterisation challenges in climate science. Current climate models rely on complex mathematical parameterisations to represent sub-grid scale processes, which can introduce substantial uncertainties. RL offers capabilities to enhance these parameterisation schemes, including direct interaction, handling sparse or delayed feedback, continuous online learning, and long-term optimisation. We evaluate the performance of eight RL algorithms on two idealised environments: one for temperature bias correction, another for radiative-convective equilibrium (RCE) imitating real-world computational constraints. Results show that different RL approaches excel in different climate scenarios, with exploration algorithms performing better in bias correction and exploitation algorithms proving more effective for RCE. These findings support the potential of RL-based parameterisation schemes to be integrated into global climate models, improving accuracy and efficiency in capturing complex climate dynamics. Overall, this work represents an important first step towards leveraging RL to enhance climate model accuracy, critical for improving climate understanding and predictions. Code accessible at https://github.com/p3jitnath/climate-rl.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2408.16118

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)

Fan, Yimeng, Agand, Pedram, Chen, Mo, Park, Edward J., Kennedy, Allison, Bae, Chanwoo

arXiv.org Artificial Intelligence, Mar-20-2024

The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of operation of a ferry on the west coast of Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, subsequently serving as an evaluative tool to assess the proficiency of the ferry's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at https://github.com/pagand/model_optimze_vessel/tree/AAAI

dataset, domain knowledge, gym environment, (14 more...)

arXiv.org Artificial Intelligence

2403.13909

Country: North America > Canada > Ontario (0.04)

Genre: Research Report (0.40)

Industry: Transportation > Marine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Testing Spacecraft Formation Flying with Crazyflie Drones as Satellite Surrogates

de la Barcena, Arturo, Rhodes, Collin, McCarroll, John, Cescon, Marzia, Hobbs, Kerianne L.

arXiv.org Artificial Intelligence, Feb-22-2024

As the space domain becomes increasingly congested, autonomy is proposed as one approach to enable small numbers of human ground operators to manage large constellations of satellites and tackle more complex missions such as on-orbit or in-space servicing, assembly, and manufacturing. One of the biggest challenges in developing novel spacecraft autonomy is finding mechanisms to test and evaluate its performance. Testing spacecraft autonomy on-orbit can be high risk and prohibitively expensive. An alternative method is to test autonomy terrestrially using satellite surrogates such as attitude test beds on air bearings or drones for translational motion visualization. Against this background, this work develops an approach to evaluate autonomous spacecraft behavior using a surrogate platform, namely a micro-quadcopter drone developed by the Bitcraze team, the Crazyflie 2.1. The Crazyflie drones are becoming increasingly ubiquitous in flight testing labs because they are affordable, open source, readily available, and include expansion decks which allow for features such as positioning systems, distance and/or motion sensors, wireless charging, and AI capabilities. In this paper, models of Crazyflie drones are used to simulate the relative motion dynamics of spacecraft under linearized Clohessy-Wiltshire dynamics in elliptical natural motion trajectories, in pre-generated docking trajectories, and via trajectories output by neural network control systems.

spacecraft, trajectory, university, (16 more...)

arXiv.org Artificial Intelligence

2402.1475

Country:

North America > United States > Texas > Harris County > Houston (0.15)
North America > United States > Ohio > Montgomery County > Dayton (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Transportation > Air (1.00)
Government > Military (1.00)
Aerospace & Defense (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Networks (0.87)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.66)
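
The Crazyflie abstract refers to linearized Clohessy-Wiltshire dynamics and elliptical natural motion trajectories. As a rough illustration of what those terms mean, the sketch below evaluates the standard closed-form in-plane Clohessy-Wiltshire solution and shows that choosing the initial along-track velocity as -2·n·x0 yields a closed 2:1 ellipse. This is a textbook sketch and not the paper's code: the function names, the axis convention (x radial, y along-track), and the mean motion value are assumptions.

```python
import math
from typing import Tuple

# In-plane Clohessy-Wiltshire equations (x radial, y along-track, n = mean motion):
#   x'' = 3 n^2 x + 2 n y'
#   y'' = -2 n x'


def cw_in_plane(
    x0: float, y0: float, vx0: float, vy0: float, n: float, t: float
) -> Tuple[float, float]:
    """Closed-form relative position (x, y) at time t for the equations above."""
    s, c = math.sin(n * t), math.cos(n * t)
    x = (4 - 3 * c) * x0 + (s / n) * vx0 + (2 / n) * (1 - c) * vy0
    y = (
        6 * (s - n * t) * x0
        + y0
        - (2 / n) * (1 - c) * vx0
        + ((4 * s - 3 * n * t) / n) * vy0
    )
    return x, y


def natural_motion_state(x0: float, n: float, t: float) -> Tuple[float, float]:
    """Position on the drift-free ellipse started at (x0, 0) with vy0 = -2 n x0."""
    return cw_in_plane(x0, 0.0, 0.0, -2.0 * n * x0, n, t)


MEAN_MOTION = 0.0011  # rad/s, assumed value roughly typical of low Earth orbit
PERIOD = 2 * math.pi / MEAN_MOTION

quarter = natural_motion_state(10.0, MEAN_MOTION, PERIOD / 4)
full = natural_motion_state(10.0, MEAN_MOTION, PERIOD)
print(quarter, full)
```

With this initial velocity the secular drift terms cancel, so the relative position traces x = x0·cos(nt), y = -2·x0·sin(nt) and returns to its starting point after one orbital period.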

A Novel Variational Lower Bound for Inverse Reinforcement Learning

Gui, Yikang, Doshi, Prashant

arXiv.org Artificial Intelligence, Nov-10-2023

Inverse reinforcement learning (IRL) seeks to learn the reward function from expert trajectories, to understand the task for imitation or collaboration, thereby removing the need for manual reward engineering. However, IRL in the context of large, high-dimensional problems with unknown dynamics has been particularly challenging. In this paper, we present a new Variational Lower Bound for IRL (VLB-IRL), which is derived under the framework of a probabilistic graphical model with an optimality node. Our method simultaneously learns the reward function and policy under the learned reward function by maximizing the lower bound, which is equivalent to minimizing the reverse Kullback-Leibler divergence between an approximated distribution of optimality given the reward function and the true distribution of optimality given trajectories. This leads to a new IRL method that learns a valid reward function such that the policy under the learned reward achieves expert-level performance on several known domains. Importantly, the method outperforms the existing state-of-the-art IRL algorithms on these domains by demonstrating better reward from the learned policy. Reinforcement learning (RL) is a popular method for automating decision making and control. However, to achieve practical effectiveness, significant engineering of reward features and reward functions has traditionally been necessary.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2311.03698

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

DiSProD: Differentiable Symbolic Propagation of Distributions for Planning

Chatterjee, Palash, Chapagain, Ashutosh, Chen, Weizhe, Khardon, Roni

arXiv.org Artificial Intelligence, Aug-4-2023

The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.

artificial intelligence, disprod, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2302.01491

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Indiana > Monroe County > Bloomington (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
