Improved Algorithms for Misspecified Linear Markov Decision Processes
Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant
Due to the large (possibly infinite) state spaces of modern reinforcement learning applications, practical algorithms must generalize across states. To understand generalization on a theoretical level, recent work has studied linear Markov decision processes (LMDPs), among other models (see Section 1.2 for related work). The LMDP model assumes the next-state distribution and reward are linear in known d-dimensional features, which enables tractable generalization when d is small. Of course, this linear assumption most likely fails in practice, which motivates the misspecified LMDP (MLMDP) model.
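For concreteness, the linearity assumption described above is commonly formalized as follows; the notation here (features φ, measures μ, parameter vector θ, misspecification level ε) is the standard convention in the linear MDP literature and is not drawn from this abstract itself:

```latex
% Linear MDP: for known features $\phi(s,a) \in \mathbb{R}^d$, there exist
% unknown signed measures $\mu = (\mu_1, \dots, \mu_d)$ and $\theta \in \mathbb{R}^d$ with
P(s' \mid s, a) = \langle \phi(s,a), \mu(s') \rangle,
\qquad
r(s, a) = \langle \phi(s,a), \theta \rangle.
% Misspecified LMDP: the identities hold only up to error $\varepsilon$, e.g.
\bigl\| P(\cdot \mid s,a) - \langle \phi(s,a), \mu(\cdot) \rangle \bigr\|_{\mathrm{TV}} \le \varepsilon,
\qquad
\bigl| r(s,a) - \langle \phi(s,a), \theta \rangle \bigr| \le \varepsilon.
```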
Sep-12-2021