Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q-pi Realizability and Concentrability
