Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning

Lee, Kyungjae, Choi, Sungjoon, Oh, Songhwai

arXiv.org Machine Learning 

Markov decision processes (MDPs) are widely used as a mathematical framework for stochastic sequential decision problems, such as autonomous driving [1], path planning [2], and quadrotor control [3]. In general, the goal of an MDP is to find the optimal policy function, which maximizes the expected return. The expected return is a performance measure of a policy function and is often defined as the expected sum of discounted rewards. An MDP is often used to formulate reinforcement learning (RL) [4], which aims to find the optimal policy without an explicit specification of the stochasticity of the environment, and inverse reinforcement learning (IRL) [5], whose goal is to find a reward function that explains the behavior of an expert who follows the underlying optimal policy. Because the optimal solution of an MDP is a deterministic policy, it is not desirable to apply an MDP to problems with multiple optimal actions. From the perspective of RL, knowledge of multiple optimal actions makes it possible to cope with unexpected situations. For example, suppose that an autonomous vehicle has multiple optimal routes to reach a given goal. If a traffic accident occurs on the currently selected optimal route, the vehicle can avoid the accident by switching to another safe optimal route without computing a new optimal route.
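As a rough illustration of the expected return described above, the following sketch estimates it by averaging the discounted reward sums of sampled trajectories. The function names, the discount factor value, and the example reward sequences are illustrative assumptions and do not come from the paper.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over one trajectory (gamma is an assumed value)."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def expected_return(trajectories, gamma=0.9):
    """Monte Carlo estimate: mean discounted return over sampled trajectories."""
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)


# Hypothetical reward sequences from two rollouts of the same policy.
sampled = [[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
estimate = expected_return(sampled, gamma=0.5)
```

A policy with a higher estimate is preferred under this measure. The estimate is only as good as the number of sampled trajectories.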
