Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes

Feb-21-2024–arXiv.org Artificial Intelligence

The framework of Configurable Markov Decision Processes (Conf-MDPs, Metelli et al., 2018, 2019, 2022) has been introduced in recent years to model a wide range of real-world scenarios in which an agent has the opportunity to alter some environmental parameters in order to improve its learning experience. Conf-MDPs can be thought of as an extension of traditional Markov Decision Processes (MDPs, Puterman, 1994) that accounts for scenarios emerging quite often in Reinforcement Learning (RL, Sutton and Barto, 2018) problems, in which the environment rarely represents an immutable entity and can, indeed, be subject to partial control. In the Conf-MDP framework, the activity of altering the environmental parameters is named environment configuration and serves different purposes. In the simplest scenario, the configuration is carried out by the agent itself, which acts as a configurator. This might suggest, at first sight, that environment configuration can be modeled within the agent actuation.



