Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Neural Information Processing Systems
In this work, we study low-rank MDPs with adversarially changing losses in the full-information feedback setting. In particular, the unknown transition probability kernel admits a low-rank matrix decomposition [Uehara et al., 2022], and the loss functions may change adversarially from episode to episode but are revealed to the learner at the end of each episode.
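For readers unfamiliar with the setting, the low-rank assumption is usually written as follows. This is a sketch of the standard formulation in the style of Uehara et al. [2022], not an excerpt from the paper: the symbols $\phi^\star$, $\mu^\star$, the rank $d$, and the state and action spaces $\mathcal{S}$, $\mathcal{A}$ are notation assumed here for illustration.

```latex
% Low-rank transition kernel (assumed notation):
% both feature maps are unknown to the learner.
\[
  P^\star(s' \mid s, a)
  \;=\; \big\langle \phi^\star(s, a),\, \mu^\star(s') \big\rangle
  \;=\; \sum_{i=1}^{d} \phi^\star_i(s, a)\, \mu^\star_i(s'),
  \qquad
  \phi^\star : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d,
  \quad
  \mu^\star : \mathcal{S} \to \mathbb{R}^d .
\]
% Viewed as a matrix with rows indexed by (s, a) and columns by s',
% the kernel therefore has rank at most d.
```

Unlike in a linear MDP, where the feature map $\phi^\star$ is given, here neither factor is known in advance, so the learner has to estimate the representation as well as the transition.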
May-25-2025, 09:58:00 GMT