Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
Neural Information Processing Systems
These algorithms combine the ideas of finite-horizon approximation [Chen et al., 2022a], special Bernstein-style bonuses of the MVP algorithm [Zhang et al., 2020], adaptive confidence widening [Wei and Luo, 2021], as
Aug-19-2025, 09:59:09 GMT
- Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California (0.14)
- Technology: