Review for NeurIPS paper: Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
Neural Information Processing Systems
The paper presents a model-free algorithm with an improved regret bound for finite-state, finite-horizon MDPs. The new bound closes the gap with the best model-based result. This is a nice theoretical contribution.
artificial intelligence, machine learning, model-free reinforcement learning via reference-advantage decomposition
Jan-27-2025, 14:16:47 GMT