The Value of Reward Lookahead in Reinforcement Learning
Neural Information Processing Systems
In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards. Usually, rewards are observed only after acting, and so the goal is to maximize the expected cumulative reward. Yet, in many practical settings, reward information is observed in advance: prices are observed before performing transactions, nearby traffic information is partially known, and goals are oftentimes given to agents prior to the interaction. In this work, we aim to quantitatively analyze the value of such future reward information through the lens of competitive analysis.
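The sketch below is a hypothetical toy illustration of the competitive-analysis viewpoint, not the authors' model or analysis. It assumes a single decision with independent Bernoulli rewards, one per action. An agent without lookahead must act before rewards are revealed and earns max_a E[R_a]. An agent with full lookahead sees the realized rewards first and earns E[max_a R_a]. Their ratio, which is always at most 1, measures how much value is lost by not seeing rewards in advance. For two fair coins it is 0.5 / 0.75 = 2/3. All function names are illustrative assumptions.

```python
from itertools import product


def value_without_lookahead(probs: list[float]) -> float:
    # Hypothetical toy: act before rewards are revealed,
    # so pick the action with the highest mean reward.
    return max(probs)


def value_with_lookahead(probs: list[float]) -> float:
    # Hypothetical toy: act after rewards are revealed,
    # so collect the best realized reward, E[max_a R_a].
    # Enumerates every 0/1 outcome, assuming independent Bernoulli
    # rewards (cost is exponential in the number of actions).
    total = 0.0
    for outcome in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for r, p in zip(outcome, probs):
            weight *= p if r == 1 else 1.0 - p
        total += weight * max(outcome)
    return total


def competitive_ratio(probs: list[float]) -> float:
    # Ratio of no-lookahead value to full-lookahead value (at most 1).
    # Assumes at least one action has a positive success probability.
    return value_without_lookahead(probs) / value_with_lookahead(probs)


if __name__ == "__main__":
    for probs in ([0.5, 0.5], [0.1] * 10):
        print(probs, round(competitive_ratio(probs), 4))
```

With many unlikely rewards, as in the second example, the lookahead agent gains far more than with a few likely ones. This mirrors the intuition that advance reward information matters most when good outcomes are rare and hard to predict.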
Mar-21-2026, 17:36:22 GMT