AITopics | cartpole environment

cartpole environment

Reviews: A Composable Specification Language for Reinforcement Learning Tasks

Neural Information Processing Systems, Jun-1-2025, 23:48:00 GMT

The specification language appears similar to past work, being a restricted form of temporal logic. The atomic predicates come in two flavours: achieving a certain state ("eventually") or avoiding certain states ("always"). These atomic predicates can be composed in various ways (A then B, A or B, etc.). The paper's proposed finite state machine "task monitor" resembles the FSM "reward machines" proposed by Icarte et al. [1], which were not cited or discussed. I would therefore be interested in how the authors clarify the differences from reward machines.
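The composition operators mentioned in the review (achieve a state, avoid states, A then B, A or B) can be illustrated with a small sketch. This is not the paper's language or its task monitor: the names `achieve`, `ensuring`, `then`, and `either` are hypothetical, states are plain floats, and the greedy earliest-match semantics is an assumption made for brevity.

```python
from typing import Callable, Optional, Sequence

State = float
Predicate = Callable[[State], bool]
# A spec takes a trajectory and a start index and returns the earliest index
# at which it is satisfied, or None if it is never satisfied.
Spec = Callable[[Sequence[State], int], Optional[int]]


def achieve(p: Predicate) -> Spec:
    """'Eventually' reach a state where p holds."""
    def check(traj, start):
        for i in range(start, len(traj)):
            if p(traj[i]):
                return i
        return None
    return check


def ensuring(spec: Spec, safe: Predicate) -> Spec:
    """Satisfy spec while 'always' keeping safe true up to that point."""
    def check(traj, start):
        end = spec(traj, start)
        if end is None:
            return None
        return end if all(safe(s) for s in traj[start:end + 1]) else None
    return check


def then(a: Spec, b: Spec) -> Spec:
    """A then B: satisfy a first, then b from where a finished (greedy)."""
    def check(traj, start):
        mid = a(traj, start)
        return None if mid is None else b(traj, mid)
    return check


def either(a: Spec, b: Spec) -> Spec:
    """A or B: satisfied as soon as one of the two is satisfied."""
    def check(traj, start):
        ends = [e for e in (a(traj, start), b(traj, start)) if e is not None]
        return min(ends) if ends else None
    return check


trajectory = [0.0, 1.0, 2.0, 3.0]
task = then(achieve(lambda s: s >= 1.0), achieve(lambda s: s >= 3.0))
print(task(trajectory, 0))                                   # index where the task completes
print(ensuring(task, lambda s: s < 2.5)(trajectory, 0))      # None: safety is violated
```

Returning the index at which a sub-task completes is what makes sequencing straightforward here; a monitor that tracks progress online, as the review describes, would carry the same information as automaton state instead.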

composable specification language, reinforcement learning task, spectrl, (10 more...)

Country: North America > Canada > Ontario > Toronto (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.43)

NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes

Keplinger, Nathaniel S., Luo, Baiting, Bektas, Iliyas, Zhang, Yunuo, Wray, Kyle Hollins, Laszka, Aron, Dubey, Abhishek, Mukhopadhyay, Ayan

arXiv.org Artificial Intelligence, Jan-16-2025

In many real-world applications, agents must make sequential decisions in environments where conditions are subject to change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and advance in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we segregate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.

agent, experiment, pamct, (15 more...)

2501.09646

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Workflow (0.93)

Industry: Leisure & Entertainment > Games > Computer Games (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Robustness and Generalization in Quantum Reinforcement Learning via Lipschitz Regularization

Meyer, Nico, Berberich, Julian, Mutschler, Christopher, Scherer, Daniel D.

arXiv.org Artificial Intelligence, Oct-28-2024

Quantum machine learning leverages quantum computing to enhance accuracy and reduce model complexity compared to classical approaches, promising significant advancements in various fields. Within this domain, quantum reinforcement learning has garnered attention, often realized using variational quantum circuits to approximate the policy function. This paper addresses the robustness and generalization of quantum reinforcement learning by combining principles from quantum computing and control theory. Leveraging recent results on robust quantum machine learning, we utilize Lipschitz bounds to propose a regularized version of a quantum policy gradient approach, named the RegQPG algorithm. We show that training with RegQPG improves the robustness and generalization of the resulting policies. Furthermore, we introduce an algorithmic variant that incorporates curriculum learning, which minimizes failures during training. Our findings are validated through numerical experiments, demonstrating the practical benefits of our approach.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2410.21117

Country: Europe > Germany (0.29)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zero-Shot Transfer in Imitation Learning

Cauderan, Alvaro, Boeshertz, Gauthier, Schwarb, Florian, Zhang, Calvin

arXiv.org Artificial Intelligence, Oct-10-2023

We present an algorithm that learns to imitate expert behavior and can transfer to previously unseen domains without retraining. Such an algorithm is extremely relevant in real-world applications such as robotic learning because 1) reward functions are difficult to design, 2) learned policies from one domain are difficult to deploy in another domain and 3) learning directly in the real world is either expensive or unfeasible due to security concerns. To overcome these constraints, we combine recent advances in Deep RL by using an AnnealedVAE to learn a disentangled state representation and imitate an expert by learning a single Q-function which avoids adversarial training. We demonstrate the effectiveness of our method in 3 environments ranging in difficulty and the type of transfer knowledge required.

agent, representation, zero-shot transfer, (15 more...)

2310.0671

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Italy > Sardinia (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Neural Laplace Control for Continuous-time Delayed Systems

Holt, Samuel, Hüyük, Alihan, Qian, Zhaozhi, Sun, Hao, van der Schaar, Mihaela

arXiv.org Artificial Intelligence, Apr-10-2023

Many real-world offline reinforcement learning (RL) problems involve continuous-time environments with delays. Such environments are characterized by two distinctive features: firstly, the state x(t) is observed at irregular time intervals, and secondly, the current action a(t) only affects the future state x(t + g) with an unknown delay g > 0. A prime example of such an environment is satellite control, where the communication link between Earth and a satellite causes irregular observations and delays. Existing offline RL algorithms have achieved success in environments with irregularly observed states in time or known delays. However, environments involving both irregular observations in time and unknown delays remain an open and challenging problem. To this end, we propose Neural Laplace Control, a continuous-time model-based offline RL method that combines a Neural Laplace dynamics model with a model predictive control (MPC) planner, and is able to learn from an offline dataset sampled with irregular time intervals from an environment that has an inherent unknown constant delay. We show experimentally on continuous-time delayed environments that it is able to achieve near-expert policy performance.
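To make the delay concrete, the sketch below simulates a toy system in which an action only reaches the dynamics after a fixed number of steps. It is a deliberately simplified illustration and not the Neural Laplace Control method: time is discretized on a regular grid, the delay is known, and the dynamics dx/dt = a(t - g) as well as the names `DelayedActionBuffer` and `simulate` are assumptions.

```python
from collections import deque


class DelayedActionBuffer:
    """FIFO queue that releases each action `delay_steps` steps after it is issued."""

    def __init__(self, delay_steps, default_action=0.0):
        # Until real actions arrive, the system receives the default action.
        self.queue = deque([default_action] * delay_steps)

    def push(self, action):
        self.queue.append(action)
        return self.queue.popleft()


def simulate(actions, delay_steps, dt=0.1):
    """Euler-integrate the toy dynamics dx/dt = a(t - g), with g = delay_steps * dt."""
    buffer = DelayedActionBuffer(delay_steps)
    x, states = 0.0, []
    for action in actions:
        applied = buffer.push(action)  # the action issued delay_steps ago
        x += dt * applied
        states.append(x)
    return states


print(simulate([1.0, 1.0, 1.0, 1.0], delay_steps=0))
print(simulate([1.0, 1.0, 1.0, 1.0], delay_steps=2))
```

With a delay of two steps the state does not move until the third step, which is why a planner that ignores the delay would attribute state changes to the wrong actions.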

dynamic model, machine learning, reinforcement learning, (17 more...)

2302.12604

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.63)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Deep Q-learning: a robust control approach

Varga, Balazs, Kulcsar, Balazs, Chehreghani, Morteza Haghir

arXiv.org Artificial Intelligence, Nov-7-2022

In this paper, we place deep Q-learning into a control-oriented perspective and study its learning dynamics with well-established techniques from robust control. We formulate an uncertain linear time-invariant model by means of the neural tangent kernel to describe learning. We show the instability of learning and analyze the agent's behavior in the frequency domain. Then, we ensure convergence via robust controllers acting as dynamical rewards in the loss function. We synthesize three controllers: state-feedback gain scheduling H2, dynamic Hinf, and constant gain Hinf controllers. Setting up the learning agent with a control-oriented tuning methodology is more transparent and has well-established literature compared to the heuristics in reinforcement learning. In addition, our approach does not use a target network or a randomized replay memory. The role of the target network is taken over by the control input, which also exploits the temporal dependency of samples (as opposed to a randomized memory buffer). Numerical simulations in different OpenAI Gym environments suggest that the Hinf controlled learning performs slightly better than Double deep Q-learning.

controller, machine learning, reinforcement learning, (16 more...)

doi: 10.1002/rnc.6457

2201.0861

Country:

North America > United States > New York (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
North America > United States > Maryland > Baltimore (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Nested Policy Reinforcement Learning

Mandyam, Aishwarya, Jones, Andrew, Laudanski, Krzysztof, Engelhardt, Barbara

arXiv.org Artificial Intelligence, Oct-6-2021

Off-policy reinforcement learning (RL) has proven to be a powerful framework for guiding agents' actions in environments with stochastic rewards and unknown or noisy state dynamics. In many real-world settings, these agents must operate in multiple environments, each with slightly different dynamics. For example, we may be interested in developing policies to guide medical treatment for patients with and without a given disease, or policies to navigate curriculum design for students with and without a learning disability. Here, we introduce nested policy fitted Q-iteration (NFQI), an RL framework that finds optimal policies in environments that exhibit such a structure. Our approach develops a nested $Q$-value function that takes advantage of the shared structure between two groups of observations from two separate environments while allowing their policies to be distinct from one another. We find that NFQI yields policies that rely on relevant features and perform at least as well as a policy that does not consider group structure. We demonstrate NFQI's performance using an OpenAI Gym environment and a clinical decision making RL task. Our results suggest that NFQI can develop policies that are better suited to many real-world clinical environments.

cartpole environment, dataset, nfqi, (16 more...)

2110.02879

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Materials (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Nephrology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

An Introduction to Reinforcement Learning with OpenAI Gym, RLlib, and Google Colab

#artificialintelligence, Sep-2-2021, 07:10:16 GMT

One possible definition of reinforcement learning (RL) is a computational approach to learning how to maximize the total sum of rewards when interacting with an environment. While a definition is useful, this tutorial aims to illustrate what reinforcement learning is through images, code, and video examples and along the way introduce reinforcement learning terms like agents and environments. As a previous post noted, machine learning (ML), a sub-field of AI, uses neural networks or other types of mathematical models to learn how to interpret complex patterns. Two areas of ML that have recently become very popular due to their high level of maturity are supervised learning (SL), in which neural networks learn to make predictions based on large amounts of data, and reinforcement learning (RL), where the networks learn to make good action decisions in a trial-and-error fashion, using a simulator. RL is the tech behind mind-boggling successes such as DeepMind's AlphaGo Zero and the StarCraft II AI (AlphaStar) or OpenAI's DOTA 2 AI ("OpenAI Five").
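The definition above (maximizing the total sum of rewards while interacting with an environment) can be made concrete with a minimal agent-environment loop. The sketch below does not use OpenAI Gym or RLlib: `CorridorEnv` is a hypothetical toy environment with a simplified `reset`/`step` interface, and the reward values are assumptions chosen for illustration.

```python
class CorridorEnv:
    """Toy environment: walk from cell 0 to the goal cell at the right end."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Action 1 moves right; any other action moves left (clipped at cell 0).
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        self.steps += 1
        at_goal = self.position == self.length - 1
        reward = 10.0 if at_goal else -1.0
        done = at_goal or self.steps >= self.max_steps
        return self.position, reward, done


def run_episode(env, policy):
    """Run one episode and return the total sum of rewards collected."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


env = CorridorEnv()
print(run_episode(env, lambda obs: 1))  # always move right
print(run_episode(env, lambda obs: 0))  # always move left
```

The two fixed policies give very different episode totals, and learning amounts to searching for the policy with the larger one by trial and error.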

reinforcement, reinforcement learning, tutorial, (13 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.87)

Industry: Leisure & Entertainment > Games > Computer Games (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.82)

Reinforcement Learning: Deep Q-Learning with Atari games

#artificialintelligence, Jul-8-2021, 19:00:15 GMT

In my previous post A First Look at Reinforcement Learning, I attempted to use Deep Q learning to solve the CartPole problem. In this post, I will be further exploring Deep Q learning but in the…

algorithm, atari game, neural network, (15 more...)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Exploring TD error as a heuristic for $\sigma$ selection in Q($\sigma$, $\lambda$)

Nan, Abhishek

arXiv.org Machine Learning, Dec-21-2019

In the landscape of TD algorithms, Q(σ, λ) is an algorithm that can perform a multi-step backup in an online manner while also unifying sampling with taking the expectation across all actions for a state. The value of σ can be selected based on characteristics of the current state rather than being constant or time-based. This project explores the viability of such a TD-error-based scheme.

Introduction

While having different dimensions of generalizability in an algorithm can serve as a powerful tool, in most cases it comes with the associated burden of having to manually select values along these dimensions, commonly referred to as hyper-parameter selection. In the case of learning algorithms, an ideal algorithm would be completely general, even to the point that it does not need a fixed set of hyper-parameters for which it performs optimally on a given problem. In the context of Q(σ, λ), the introduction of the σ parameter gives us flexibility in adjusting the proportion of sampling and expectation we want in our updates. But at the same time, while σ does serve as a hyper-parameter, atypically a constant value of σ was found not to have the best performance by De Asis, Hernandez-Garcia, Holland and Sutton (2018). They used a Dynamic Decay σ scheme for n-step Q(σ) in which they reduced the value of σ after every episode by a factor of 0.95.
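As a rough illustration of how σ blends sampling and expectation, the sketch below computes a one-step Q(σ) backup target and applies the per-episode decay by a factor of 0.95 mentioned above. It covers only the one-step case (no λ and no eligibility traces), it does not implement the TD-error-based selection this project studies, and the function names and example values are assumptions.

```python
def q_sigma_target(reward, gamma, sigma, q_next, pi_next, a_next):
    """One-step Q(sigma) target.

    sigma = 1 gives the sampled (Sarsa-style) target,
    sigma = 0 gives the expected (Expected-Sarsa-style) target.
    q_next[a]  : current estimate of Q(s', a)
    pi_next[a] : probability of action a in s' under the policy
    a_next     : action actually sampled in s'
    """
    sampled = q_next[a_next]
    expected = sum(p * q for p, q in zip(pi_next, q_next))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def decay_sigma(sigma, factor=0.95):
    """Dynamic decay: shrink sigma by a constant factor after each episode."""
    return sigma * factor


q_next = [1.0, 3.0]
pi_next = [0.5, 0.5]
sigma = 1.0
for episode in range(3):
    target = q_sigma_target(1.0, 0.9, sigma, q_next, pi_next, a_next=1)
    print(episode, round(sigma, 4), round(target, 4))
    sigma = decay_sigma(sigma)
```

As σ decays, the target moves from the sampled value towards the expectation over actions, which is the trade-off a state-dependent selection rule would adjust per update instead of per episode.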

algorithm, experiment, td error, (14 more...)

1912.10316

Country: North America > Canada > Alberta (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
