Provably Efficient $Q$-learning with Function Approximation via Distribution Shift Error Checking Oracle
