Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
Long-Fei Li
