Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation

Long-Fei Li