Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs
