Neural Information Processing Systems 

For example, when solving RL problems such as Atari games, we may test different representation methods. For the average reward setting, it is still an open question whether S-bounds are achievable. Our approach can be adapted to the episodic case when the regret bounds would benefit from the improved bounds available in this setting. The A-dependence is optimal as for UCRL2, while the optimal dependence on S is still an open question (also for the MDP case). The optimal dependence on |Φ| in our setting is also open.
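
As background for the remark on the A- and S-dependence, the known bounds for UCRL2 in the standard MDP case (Jaksch et al., 2010) can be sketched as follows. They are stated up to constants, and are not the bounds of our own setting. The symbols D (diameter), T (number of steps), and δ (confidence parameter) are notation assumed here for illustration; S and A denote the numbers of states and actions as above.

```latex
% Upper bound on the regret of UCRL2 after T steps (with probability at least 1 - \delta):
\[
  \mathrm{Regret}_T(\text{UCRL2}) = O\!\left( D\, S \sqrt{A\, T \log(T/\delta)} \right)
\]
% Lower bound that holds for any learning algorithm:
\[
  \mathrm{Regret}_T = \Omega\!\left( \sqrt{D\, S\, A\, T} \right)
\]
```

Both bounds scale as the square root of A, which is the sense in which the A-dependence is optimal. The upper bound is linear in S while the lower bound grows only as the square root of S, and this gap is the open question about the S-dependence.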