Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
