Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q-pi Realizability and Concentrability
–Neural Information Processing Systems
The hope in this setting is that learning a good policy will be possible without requiring a sample size that scales with the number of states in the MDP. Foster et al. [2021] have shown this to be impossible even under concentrability, a data coverage assumption where a coefficient C
Oct-10-2025, 10:37:16 GMT
- Country:
- Europe > United Kingdom
- England
- Cambridgeshire > Cambridge (0.04)
- Greater London > London (0.04)
- North America > Canada
- Genre:
- Research Report > Experimental Study (0.92)
- Technology: