Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach

Lu, Chenbei, Chen, Zaiwei, Li, Tongxin, Wu, Chenye, Wierman, Adam

Oct-22-2025–arXiv.org Artificial Intelligence

Traditional reinforcement learning (RL) assumes that agents make decisions in Markov decision processes (MDPs) with one-step transition models. In many real-world applications, such as energy management and stock investment, agents can access multi-step predictions of future states, which provide additional information for decision making. However, multi-step predictions are inherently high-dimensional: naively embedding them into an MDP state leads to an exponential blow-up of the state space and the curse of dimensionality. Moreover, existing RL theory provides few tools for analyzing prediction-augmented MDPs, as it typically works with one-step transition kernels and cannot accommodate multi-step predictions with errors or partial action coverage. We address these challenges with three key innovations. First, we propose the Bayesian value function to characterize the optimal prediction-aware policy tractably. Second, we develop a novel Bellman-Jensen Gap analysis of the Bayesian value function, which enables us to characterize the value of imperfect predictions. Third, we introduce BOLA (Bayesian Offline Learning with Online Adaptation), a two-stage model-based RL algorithm that separates offline Bayesian value learning from lightweight online adaptation to real-time predictions. We prove that BOLA remains sample-efficient even under imperfect predictions. We validate our theory and algorithm on synthetic MDPs and a real-world wind energy storage control problem.
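
The abstract does not define the Bellman-Jensen Gap, so the following is only a minimal sketch of the generic inequality the name suggests, not the paper's construction: by Jensen's inequality for the max operator, choosing an action after seeing an outcome is worth at least as much as committing to one action beforehand. The function name `jensen_gap_for_max`, the payoff table `q_table`, and the toy numbers are all hypothetical and introduced purely for illustration.

```python
def jensen_gap_for_max(q_table, probs):
    """Compare acting with and without a perfect one-step prediction.

    q_table[w][a] is a hypothetical payoff of action a under outcome w,
    and probs[w] is the probability of outcome w (assumed to sum to 1).
    """
    n_actions = len(q_table[0])

    # Without a prediction: commit to a single action, then average
    # over outcomes. This is max_a E[Q(w, a)].
    value_without = max(
        sum(p * row[a] for p, row in zip(probs, q_table))
        for a in range(n_actions)
    )

    # With a perfect prediction: pick the best action for each outcome,
    # then average. This is E[max_a Q(w, a)].
    value_with = sum(p * max(row) for p, row in zip(probs, q_table))

    # Jensen's inequality for max guarantees this difference is >= 0.
    return value_without, value_with, value_with - value_without


# Toy example: two equally likely outcomes, each favoring a different action.
q = [[1.0, 0.0],
     [0.0, 1.0]]
probs = [0.5, 0.5]
without, with_, gap = jensen_gap_for_max(q, probs)
print(without, with_, gap)
```

In this toy case the gap is as large as it can be, because no single action is good under both outcomes. An imperfect prediction would sit between the two values, which is the kind of quantity the abstract says the analysis characterizes.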

artificial intelligence, machine learning, reinforcement learning

Country:
- Asia
  - China
    - Guangdong Province > Shenzhen (0.04)
    - Hong Kong (0.04)
  - Middle East > Jordan (0.04)
- Europe > United Kingdom
  - England
    - Cambridgeshire > Cambridge (0.04)
    - Greater London > London (0.04)
- North America > United States
  - California (0.04)
  - Massachusetts > Hampshire County
    - Amherst (0.04)

Genre:
- Research Report (1.00)

Industry:
- Banking & Finance > Trading (0.93)
- Energy
  - Power Industry (1.00)
  - Renewable > Wind (0.88)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.66)
  - Reinforcement Learning (1.00)
