Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach
–Neural Information Processing Systems
Traditional reinforcement learning (RL) assumes that agents make decisions based on Markov decision processes (MDPs) with one-step transition models. In many real-world applications, such as energy management and stock investment, agents can access multi-step predictions of future states, which provide additional advantages for decision making. However, multi-step predictions are inherently high-dimensional: naively embedding these predictions into an MDP leads to an exponential blow-up in state space and the curse of dimensionality. Moreover, existing RL theory provides few tools to analyze prediction-augmented MDPs, as it typically works on one-step transition kernels and cannot accommodate multi-step predictions with errors or partial action coverage. We address these challenges with three key innovations. First, we propose the Bayesian value function to characterize the optimal prediction-aware policy tractably. Second, we develop a novel Bellman-Jensen Gap analysis on the Bayesian value function, which enables characterizing the value of imperfect predictions. Third, we introduce BOLA (Bayesian Offline Learning with Online Adaptation), a two-stage model-based RL algorithm that separates offline Bayesian value learning from lightweight online adaptation to real-time predictions. We prove that BOLA remains sample-efficient even under imperfect predictions.
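The exponential blow-up mentioned above can be made concrete with a small count. The following is a minimal sketch under assumptions that are not stated in the abstract: the state space is finite, and a prediction is a point forecast of each of the next K states, so a naively augmented state is a tuple of the current state plus K predicted states. The function name `augmented_state_count` is hypothetical and is not part of the paper.

```python
def augmented_state_count(num_states: int, horizon: int) -> int:
    """Count naively augmented states.

    Assumption (illustrative only): an augmented state is the tuple
    (current state, predicted state at step 1, ..., predicted state at step K),
    where every entry ranges over the same finite set of `num_states` states.
    """
    if num_states < 1 or horizon < 0:
        raise ValueError("num_states must be >= 1 and horizon must be >= 0")
    # One factor for the current state, one factor per predicted step.
    return num_states ** (horizon + 1)


if __name__ == "__main__":
    # Growth of the augmented state space for a 10-state MDP.
    for k in (0, 1, 2, 5, 10):
        print(f"horizon K={k}: {augmented_state_count(10, k)} augmented states")
```

Under these assumptions the count grows as a power of the prediction horizon, which is why a tabular treatment of the augmented MDP quickly becomes impractical even for modest horizons.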
Jun-19-2026, 21:53:05 GMT
- Country:
  - North America > United States (0.45)
  - Asia > China (0.28)
  - Europe > United Kingdom (0.27)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.67)
- Industry:
  - Banking & Finance > Trading (1.00)
  - Energy
    - Power Industry (1.00)
    - Renewable > Wind (0.93)