Demystifying Linear MDPs and Novel Dynamics Aggregation Framework

Oct-31-2024–arXiv.org Machine Learning

In this work, we prove that, in linear MDPs, the feature dimension $d$ is lower bounded by $S/U$ in order to aptly represent transition probabilities, where $S$ is the size of the state space and $U$ is the maximum size of the set of directly reachable states. Hence, $d$ can still scale with $S$, depending on the direct reachability of the environment. To address this limitation of linear MDPs, we propose a novel structural aggregation framework based on dynamics, which we call "dynamics aggregation". For this framework, we design a provably efficient hierarchical reinforcement learning (HRL) algorithm with linear function approximation that leverages aggregated sub-structures. The proposed algorithm is statistically efficient, achieving a regret of $\tilde{O}(d_{\psi}^{3/2} H^{3/2} \sqrt{NT})$, where $d_{\psi}$ is the feature dimension of the aggregated subMDPs and $N$ is the number of aggregated subMDPs. We establish that the condition $d_{\psi}^3 N \ll d^{3}$ is readily met in most real-world environments with hierarchical structures, enabling a substantial improvement in the regret bound over LSVI-UCB, which has a regret of $\tilde{O}(d^{3/2} H^{3/2} \sqrt{T})$. To the best of our knowledge, this work presents the first HRL algorithm with linear function approximation that offers provable guarantees.
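To make the two quantitative claims above concrete, the following minimal sketch computes the smallest integer $d$ compatible with the bound $d \ge S/U$, and the ratio of the leading terms of the two regret bounds, which simplifies to $\sqrt{d_{\psi}^3 N / d^3}$ because the $H^{3/2}\sqrt{T}$ factor cancels. The function names and all numeric values are hypothetical, chosen only for illustration; the $\tilde{O}$ notation hides constants and logarithmic factors, so the ratio indicates scaling only, not actual regret.

```python
import math


def min_feature_dim(num_states: int, max_reachable: int) -> int:
    """Smallest integer d satisfying d >= S / U (hypothetical helper)."""
    if num_states <= 0 or max_reachable <= 0:
        raise ValueError("S and U must be positive integers")
    # Integer ceiling division avoids floating-point rounding for large S.
    return -(-num_states // max_reachable)


def regret_scaling_ratio(d_psi: int, num_sub_mdps: int, d: int) -> float:
    """Ratio of the leading terms of the two bounds quoted in the abstract.

    (d_psi^{3/2} H^{3/2} sqrt(N T)) / (d^{3/2} H^{3/2} sqrt(T))
        = sqrt(d_psi^3 * N / d^3)

    Constants and log factors hidden by O-tilde are ignored, so a value
    below 1 only suggests that the aggregated bound scales better.
    """
    if d_psi <= 0 or num_sub_mdps <= 0 or d <= 0:
        raise ValueError("all dimensions and counts must be positive")
    return math.sqrt(d_psi**3 * num_sub_mdps / d**3)


if __name__ == "__main__":
    # Illustrative, made-up environment: 10,000 states, at most 4 directly
    # reachable states, 100 aggregated subMDPs with feature dimension 10.
    S, U = 10_000, 4
    d = min_feature_dim(S, U)
    ratio = regret_scaling_ratio(d_psi=10, num_sub_mdps=100, d=d)
    print(f"d >= {d}; leading-term regret ratio ~ {ratio:.4f}")
```

In this made-up setting $d_{\psi}^3 N = 10^5$ while $d^3$ exceeds $10^{10}$, so the condition $d_{\psi}^3 N \ll d^3$ holds and the ratio is far below 1.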
