Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle

Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Neural Information Processing Systems 

The24], which Q-learning exploration Q-function Q-function asymptotically 39] derived drawback of example, Zou39] require lower-bounded properties.
