Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle
