Provably Efficient Offline Reinforcement Learning in Regular Decision Processes

Jan-19-2025, 09:59:28 GMT – Neural Information Processing Systems

Regular Decision Processes (RDPs) are the subclass of non-Markov decision processes in which the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton underlying the RDP is unknown, and a learner aims to learn a near-optimal policy from pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that combines automata learning techniques with state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design that allows any off-the-shelf offline RL algorithm for MDPs to be used. We report a non-asymptotic, high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which brings out a notion of concentrability relevant for RDPs.
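The modular idea in the abstract (recover automaton states, then hand Markov data to an MDP learner) can be illustrated with a minimal Python sketch. This is not RegORL itself: the automaton here is a hypothetical, hand-written parity tracker that is assumed to be known, whereas the paper's setting requires learning it from data. The function names, the episode format, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy automaton (assumed known here; RegORL would have to learn it):
# the state is the parity of the observations seen so far.
def step(state, action, obs):
    return state ^ obs

def to_markov_transitions(episodes, initial_state=0):
    """Relabel each non-Markov episode [(action, obs, reward), ...] with
    automaton states, yielding (state, action, reward, next_state) tuples."""
    transitions = []
    for episode in episodes:
        state = initial_state
        for action, obs, reward in episode:
            next_state = step(state, action, obs)
            transitions.append((state, action, reward, next_state))
            state = next_state
    return transitions

def empirical_model(transitions):
    """Estimate transition probabilities over automaton states by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, _, s2 in transitions:
        counts[(s, a)][s2] += 1
    return {
        sa: {s2: n / sum(nxt.values()) for s2, n in nxt.items()}
        for sa, nxt in counts.items()
    }

# Toy pre-collected dataset: two short episodes of (action, observation, reward).
episodes = [
    [("a", 1, 0.0), ("b", 1, 1.0)],
    [("a", 0, 0.0), ("b", 1, 0.0)],
]
transitions = to_markov_transitions(episodes)
model = empirical_model(transitions)
```

Once the data is expressed over automaton states, the resulting tuples have the shape that a standard offline RL algorithm for MDPs expects, which is the sense in which the second stage can be swapped out.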

algorithm, provably efficient offline reinforcement learning, regular decision process

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)