Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
