Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
Neural Information Processing Systems
We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both *statistical* and *computational* efficiency. The best-known result, due to Hwang and Oh [2023], achieves a regret upper bound of $\widetilde{\mathcal{O}}(\kappa^{-1}dH^2\sqrt{K})$, where $\kappa$ is a problem-dependent quantity, $d$ is the feature dimension, $H$ is the episode length, and $K$ is the number of episodes. However, we observe that $\kappa^{-1}$ exhibits polynomial dependence on the number of reachable states, which can be as large as the state space size in the worst case and thus undermines the motivation for function approximation. Additionally, their method requires storing all historical data, and its time complexity scales linearly with the episode count, which is computationally expensive. In this work, we propose a statistically efficient algorithm that achieves a regret of $\widetilde{\mathcal{O}}(dH^2\sqrt{K} + \kappa^{-1}d^2H^2)$, eliminating the dependence on $\kappa^{-1}$ in the dominant term for the first time.
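To make the modeling assumption concrete, the MNL transition model parameterizes each next-state probability as a softmax over feature-parameter inner products, which guarantees a valid distribution over reachable states. The sketch below is illustrative only; the feature map `phi` and parameter `theta` are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def mnl_transition_probs(phi, theta):
    """Illustrative MNL model: p(s' | s, a) is proportional to
    exp(phi(s, a, s')^T theta), normalized over candidate next states.

    phi   : (n_states, d) array of features, one row per reachable next state
    theta : (d,) parameter vector
    """
    logits = phi @ theta
    logits -= logits.max()          # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()  # softmax: nonnegative, sums to 1
```

Because of the softmax normalization, the outputs are always a valid probability distribution regardless of `theta`, which is the property the abstract highlights over unconstrained linear function approximation.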