Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach
Lu, Chenbei, Chen, Zaiwei, Li, Tongxin, Wu, Chenye, Wierman, Adam
–arXiv.org Artificial Intelligence
Traditional reinforcement learning (RL) assumes the agents make decisions based on Markov decision processes (MDPs) with one-step transition models. In many real-world applications, such as energy management and stock investment, agents can access multi-step predictions of future states, which provide additional advantages for decision making. However, multi-step predictions are inherently high-dimensional: naively embedding these predictions into an MDP leads to an exponential blow-up in state space and the curse of dimensionality. Moreover, existing RL theory provides few tools to analyze prediction-augmented MDPs, as it typically works on one-step transition kernels and cannot accommodate multi-step predictions with errors or partial action-coverage. We address these challenges with three key innovations: First, we propose the \emph{Bayesian value function} to characterize the optimal prediction-aware policy tractably. Second, we develop a novel \emph{Bellman-Jensen Gap} analysis on the Bayesian value function, which enables characterizing the value of imperfect predictions. Third, we introduce BOLA (Bayesian Offline Learning with Online Adaptation), a two-stage model-based RL algorithm that separates offline Bayesian value learning from lightweight online adaptation to real-time predictions. We prove that BOLA remains sample-efficient even under imperfect predictions. We validate our theory and algorithm on synthetic MDPs and a real-world wind energy storage control problem.
arXiv.org Artificial Intelligence
Oct-22-2025
- Country:
- Asia
- China
- Guangdong Province > Shenzhen (0.04)
- Hong Kong (0.04)
- Middle East > Jordan (0.04)
- China
- Europe > United Kingdom
- England
- Cambridgeshire > Cambridge (0.04)
- Greater London > London (0.04)
- England
- North America > United States
- California (0.04)
- Massachusetts > Hampshire County
- Amherst (0.04)
- Asia
- Genre:
- Research Report (1.00)
- Industry:
- Banking & Finance > Trading (0.93)
- Energy
- Power Industry (1.00)
- Renewable > Wind (0.88)