Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

Neural Information Processing Systems 

In this work, we study low-rank MDPs with adversarially changing losses in the full-information feedback setting. In particular, the unknown transition probability kernel admits a low-rank matrix decomposition [Uehara et al., 2022], and the loss functions may change adversarially from episode to episode but are revealed to the learner at the end of each episode.
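
For reference, the two ingredients named above (a low-rank transition kernel and adversarial losses with full-information feedback) are commonly formalized as below. This is a sketch in the standard notation of the low-rank MDP literature, not a statement of the paper's exact definitions: the symbols $\phi_h$, $\mu_h$, $d$, $\ell_k$, $\pi_k$, $K$ and $V^{\pi}(\ell)$ are assumptions that do not appear in this abstract, and the paper's own notation may differ.

```latex
% Assumed notation: rank d, step index h, feature maps \phi_h and \mu_h both unknown to the learner.
P_h(s' \mid s, a) = \langle \phi_h(s, a), \mu_h(s') \rangle,
\qquad \phi_h(s, a),\ \mu_h(s') \in \mathbb{R}^d .

% Assumed regret notion: K episodes, loss \ell_k revealed in full after episode k,
% V^{\pi}(\ell) is the expected cumulative loss of policy \pi under loss \ell.
\mathrm{Reg}(K) = \sum_{k=1}^{K} V^{\pi_k}(\ell_k) - \min_{\pi} \sum_{k=1}^{K} V^{\pi}(\ell_k) .
```

Under this reading, "full-information" means the whole function $\ell_k$ is observed after episode $k$, rather than only the losses incurred along the visited trajectory.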
