Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Oct-11-2024, 02:58:52 GMT–Neural Information Processing Systems

We study reinforcement learning (RL) with linear function approximation under the adaptivity constraint. We consider two popular limited adaptivity models: the batch learning model and the rare policy switch model, and propose two efficient online RL algorithms for episodic linear Markov decision processes, where the transition probability and the reward function can be represented as linear functions of some known feature mapping. Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches. In particular, once $B \geq \sqrt{T/(dH)}$, the second term is at most the first, so the regret reduces to $\tilde O(\sqrt{d^3H^3T})$. Our algorithms achieve the same regret as the LSVI-UCB algorithm (Jin et al., 2020), yet with a substantially smaller amount of adaptivity. We also establish a lower bound for the batch learning model, which suggests that the dependency on $B$ in our regret bound is tight.
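As a rough illustration of the batch-model bound, the sketch below evaluates its two terms with all constants and logarithmic factors dropped. This is an assumption: the $\tilde O$ hides them, so the numbers indicate scaling only and are not actual regret values. The helper names `batch_regret_terms` and `batches_needed` are hypothetical and do not come from the paper.

```python
import math


def batch_regret_terms(d: int, H: int, T: int, B: int) -> tuple[float, float]:
    """Return the two terms of sqrt(d^3 H^3 T) + dHT/B.

    Assumption: constants and log factors hidden by tilde-O are ignored,
    so these values show how the bound scales, not the true regret.
    """
    stat_term = math.sqrt(d**3 * H**3 * T)  # the LSVI-UCB-style rate
    batch_term = d * H * T / B              # extra cost of using only B batches
    return stat_term, batch_term


def batches_needed(d: int, H: int, T: int) -> int:
    """Smallest integer B with dHT/B <= sqrt(d^3 H^3 T), i.e. B^2 * d * H >= T.

    Uses integer arithmetic to avoid floating-point rounding: with
    q = ceil(T / (dH)), the smallest such B is ceil(sqrt(q)). Requires T >= 1.
    """
    q = -(-T // (d * H))          # ceil(T / (dH)) without floats
    return math.isqrt(q - 1) + 1  # ceil(sqrt(q)) for q >= 1


if __name__ == "__main__":
    d, H, T = 10, 10, 10**6
    B = batches_needed(d, H, T)
    print(B, batch_regret_terms(d, H, T, B))
```

For example, with $d = 10$, $H = 10$ and $T = 10^6$, `batches_needed` returns 100, at which point both terms equal $10^6$; halving $B$ to 50 doubles the adaptivity term to $2 \times 10^6$ while the first term is unchanged.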

adaptivity constraint, linear function approximation, provably efficient reinforcement learning
