Learning in POMDPs is Sample-Efficient with Hindsight Observability