When is Agnostic Reinforcement Learning Statistically Tractable?