Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation

Open in new window