Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q-pi Realizability and Concentrability

Neural Information Processing Systems 

The hope in this setting is that learning a good policy will be possible without requiring a sample size that scales with the number of states in the MDP . Foster et al. [ 2021 ] have shown this to be impossible even under concentrability, a data coverage assumption where a coefficient C

Similar Docs  Excel Report  more

TitleSimilaritySource
None found