Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations