Improved Algorithms for Misspecified Linear Markov Decision Processes
Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant
Due to the large (possibly infinite) state spaces of modern reinforcement learning applications, practical algorithms must generalize across states. To understand generalization on a theoretical level, recent work has studied linear Markov decision processes (LMDPs), among other models (see Section 1.2 for related work). The LMDP model assumes the next-state distribution and reward are linear in known d-dimensional features, which enables tractable generalization when d is small. Of course, this linear assumption most likely fails in practice, which motivates the misspecified LMDP (MLMDP) model.
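For concreteness, the linearity assumption described above is commonly formalized as follows; the notation here (features φ, measures μ, parameter vector θ, misspecification level ε) is the standard convention in the linear MDP literature and is not drawn from this abstract itself:

```latex
% Linear MDP: for known features $\phi(s,a) \in \mathbb{R}^d$, there exist
% unknown signed measures $\mu = (\mu_1, \dots, \mu_d)$ and $\theta \in \mathbb{R}^d$ with
P(s' \mid s, a) = \langle \phi(s,a), \mu(s') \rangle,
\qquad
r(s, a) = \langle \phi(s,a), \theta \rangle.
% Misspecified LMDP: the identities hold only up to error $\varepsilon$, e.g.
\bigl\| P(\cdot \mid s,a) - \langle \phi(s,a), \mu(\cdot) \rangle \bigr\|_{\mathrm{TV}} \le \varepsilon,
\qquad
\bigl| r(s,a) - \langle \phi(s,a), \theta \rangle \bigr| \le \varepsilon.
```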
Sep-12-2021