Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q-pi Realizability and Concentrability

Oct-10-2025, 10:37:16 GMT–Neural Information Processing Systems

The hope in this setting is that learning a good policy will be possible without requiring a sample size that scales with the number of states in the MDP. Foster et al. [2021] have shown this to be impossible even under concentrability, a data coverage assumption where a coefficient C

Country:
- North America > Canada
  - Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- Europe > United Kingdom
  - England
    - Cambridgeshire > Cambridge (0.04)
    - Greater London > London (0.04)

Genre:
- Research Report > Experimental Study (0.92)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (0.45)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.34)
