Goto

Collaborating Authors

 reinforcement learning algorithm



Budgeted Reinforcement Learning in Continuous State Space

Neural Information Processing Systems

So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to continuous spaces environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs.


Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping

arXiv.org Artificial Intelligence

Recently, there has been growing interest in autonomous shipping due to its potential to improve maritime efficiency and safety. The use of advanced technologies, such as artificial intelligence, can address the current navigational and operational challenges in autonomous shipping. In particular, inland waterway transport (IWT) presents a unique set of challenges, such as crowded waterways and variable environmental conditions. In such dynamic settings, the reliability and robustness of autonomous shipping solutions are critical factors for ensuring safe operations. This paper examines the robustness of benchmark deep reinforcement learning (RL) algorithms, implemented for IWT within an autonomous shipping simulator, and their ability to generate effective motion planning policies. We demonstrate that a model-free approach can achieve an adequate policy in the simulator, successfully navigating port environments never encountered during training. We focus particularly on Soft-Actor Critic (SAC), which we show to be inherently more robust to environmental disturbances compared to MuZero, a state-of-the-art model-based RL algorithm. In this paper, we take a significant step towards developing robust, applied RL frameworks that can be generalized to various vessel types and navigate complex port- and inland environments and scenarios.


Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems

Neural Information Processing Systems

Increasing attention has been paid to reinforcement learning algo(cid:173) rithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither generally the algorithms nor the analyses continue to be usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo pol(cid:173) icy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum.


A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

Neural Information Processing Systems

We describe a Reinforcement Learning algorithm for partially observ(cid:173) able environments using short-term memory, which we call BLHT. Since BLHT learns a stochastic model based on Bayesian Learning, the over(cid:173) fitting problem is reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT con(cid:173) verges to one which provides the most accurate predictions of percepts and rewards, given short-term memory.


Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

Neural Information Processing Systems

There are several reinforcement learning algorithms that yield ap(cid:173) proximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function. The results presented here will be useful for comparing the algorithms in terms of the error they achieve relative to the error of the optimal approximate solution.


Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization

arXiv.org Artificial Intelligence

This paper presents a review of the field of reinforcement learning (RL), with a focus on providing a comprehensive overview of the key concepts, techniques, and algorithms for beginners. RL has a unique setting, jargon, and mathematics that can be intimidating for those new to the field or artificial intelligence more broadly. While many papers review RL in the context of specific applications, such as games, healthcare, finance, or robotics, these papers can be difficult for beginners to follow due to the inclusion of non-RL-related work and the use of algorithms customized to those specific applications. To address these challenges, this paper provides a clear and concise overview of the fundamental principles of RL and covers the different types of RL algorithms. For each algorithm/method, we outline the main motivation behind its development, its inner workings, and its limitations. The presentation of the paper is aligned with the historical progress of the field, from the early 1980s Q-learning algorithm to the current state-of-the-art algorithms such as TD3, PPO, and offline RL. Overall, this paper aims to serve as a valuable resource for beginners looking to construct a solid understanding of the fundamentals of RL and be aware of the historical progress of the field. It is intended to be a go-to reference for those interested in learning about RL without being distracted by the details of specific applications.


Playing a 2D Game Indefinitely using NEAT and Reinforcement Learning

arXiv.org Artificial Intelligence

For over a decade now, robotics and the use of artificial agents have become a common thing.Testing the performance of new path finding or search space optimization algorithms has also become a challenge as they require simulation or an environment to test them.The creation of artificial environments with artificial agents is one of the methods employed to test such algorithms.Games have also become an environment to test them.The performance of the algorithms can be compared by using artificial agents that will behave according to the algorithm in the environment they are put in.The performance parameters can be, how quickly the agent is able to differentiate between rewarding actions and hostile actions.This can be tested by placing the agent in an environment with different types of hurdles and the goal of the agent is to reach the farthest by taking decisions on actions that will lead to avoiding all the obstacles.The environment chosen is a game called "Flappy Bird".The goal of the game is to make the bird fly through a set of pipes of random heights.The bird must go in between these pipes and must not hit the top, the bottom, or the pipes themselves.The actions that the bird can take are either to flap its wings or drop down with gravity.The algorithms that are enforced on the artificial agents are NeuroEvolution of Augmenting Topologies (NEAT) and Reinforcement Learning.The NEAT algorithm takes an "N" initial population of artificial agents.They follow genetic algorithms by considering an objective function, crossover, mutation, and augmenting topologies.Reinforcement learning, on the other hand, remembers the state, the action taken at that state, and the reward received for the action taken using a single agent and a Deep Q-learning Network.The performance of the NEAT algorithm improves as the initial population of the artificial agents is increased.



QuadSim: A Quadcopter Rotational Dynamics Simulation Framework For Reinforcement Learning Algorithms

arXiv.org Artificial Intelligence

This study focuses on designing and developing a mathematically based quadcopter rotational dynamics simulation framework for testing reinforcement learning (RL) algorithms in many flexible configurations. The design of the simulation framework aims to simulate both linear and nonlinear representations of a quadcopter by solving initial value problems for ordinary differential equation (ODE) systems. In addition, the simulation environment is capable of making the simulation deterministic/stochastic by adding random Gaussian noise in the forms of process and measurement noises. In order to ensure that the scope of this simulation environment is not limited only with our own RL algorithms, the simulation environment has been expanded to be compatible with the OpenAI Gym toolkit. The framework also supports multiprocessing capabilities to run simulation environments simultaneously in parallel. To test these capabilities, many state-of-the-art deep RL algorithms were trained in this simulation framework and the results were compared in detail.