Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle
Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang
Neural Information Processing Systems
The24], which Q-learning exploration Q-function Q-function asymptotically 39] derived drawbackof example, Zou39] require lowerbounded properties.
Feb-11-2026, 23:36:43 GMT
- Country:
- Europe > United Kingdom
- England
- Cambridgeshire > Cambridge (0.14)
- Greater London > London (0.04)
- North America
- Technology: