Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation
