Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes

Metelli, Alberto Maria

arXiv.org Artificial Intelligence 

The framework of Configurable Markov Decision Processes (Conf-MDPs, Metelli et al., 2018, 2019, 2022) has been introduced in recent years to model a wide range of real-world scenarios in which an agent can alter some environmental parameters to improve its learning experience. Conf-MDPs can be thought of as an extension of traditional Markov Decision Processes (MDPs, Puterman, 1994) that accounts for a situation arising frequently in Reinforcement Learning (RL, Sutton and Barto, 2018) problems: the environment is rarely an immutable entity and can, in fact, be partially controlled. In the Conf-MDP framework, the activity of altering the environmental parameters is called environment configuration, and it serves different purposes. In the simplest scenario, the configuration is carried out by the agent itself, which acts as the configurator. At first sight, this might suggest that environment configuration can simply be modeled as part of the agent's actuation.
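To make the idea of environment configuration concrete, the following is a minimal sketch, not the author's formulation. It assumes a tiny tabular environment whose transition model P_omega depends on a single configuration parameter omega in [0, 1], chosen here (hypothetically) as a convex mixture of two fixed kernels. For a fixed policy, it computes the expected discounted return J(pi, omega) by exact policy evaluation and lets a configurator pick the best omega from a grid. All names (P_BASE, P_ALT, configured_kernel, performance, the grid OMEGAS) and all numbers are illustrative assumptions.

```python
import numpy as np

GAMMA = 0.9  # discount factor (illustrative assumption)

# Hypothetical two-state, two-action environment.
# P[s, a, t] is the probability of moving to state t after taking action a in state s.
# Base dynamics: action 0 keeps the current state, action 1 switches it.
P_BASE = np.zeros((2, 2, 2))
for s in range(2):
    P_BASE[s, 0, s] = 1.0        # "stay"
    P_BASE[s, 1, 1 - s] = 1.0    # "switch"

# Alternative dynamics reachable by configuring the environment:
# every transition leads to state 1 (hypothetical choice).
P_ALT = np.zeros((2, 2, 2))
P_ALT[:, :, 1] = 1.0

R = np.array([[0.0, 0.0], [1.0, 1.0]])  # R[s, a]: reward 1 only in state 1
MU = np.array([1.0, 0.0])               # initial-state distribution
PI = np.full((2, 2), 0.5)               # fixed uniform policy pi(a | s)


def configured_kernel(omega: float) -> np.ndarray:
    """Transition model P_omega, assumed here to be a convex mixture of two kernels."""
    return (1.0 - omega) * P_BASE + omega * P_ALT


def performance(pi: np.ndarray, omega: float) -> float:
    """Exact expected discounted return J(pi, omega) via policy evaluation."""
    p = configured_kernel(omega)
    p_pi = np.einsum("sa,sat->st", pi, p)  # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)          # expected one-step reward under pi
    v = np.linalg.solve(np.eye(2) - GAMMA * p_pi, r_pi)  # V = (I - gamma P_pi)^-1 r_pi
    return float(MU @ v)


# The configurator searches a grid of parameters for the best environment.
OMEGAS = [0.0, 0.25, 0.5, 0.75, 1.0]
scores = {omega: performance(PI, omega) for omega in OMEGAS}
best_omega = max(scores, key=scores.get)

if __name__ == "__main__":
    for omega, j in scores.items():
        print(f"omega={omega:.2f}  J={j:.3f}")
    print("best configuration:", best_omega)
```

In this toy setting the policy is held fixed and only omega varies, which isolates the effect of configuration on performance. A configurator that also learns the policy would alternate or jointly optimize over both, which this sketch deliberately leaves out.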
