Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

Neural Information Processing Systems 

These algorithms combine the ideas of finite-horizon approximation [Chen et al., 2022a], special Bernstein-style bonuses of the MVP algorithm [Zhang et al., 2020], adaptive confidence widening [Wei and Luo, 2021], as
