Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Zihan Zhang, Xiangyang Ji

Neural Information Processing Systems 

Therefore, there is a trade-off between exploration and exploitation, i.e., taking actions we have not learned accurately enough and taking actions which
