Improving Model-Based Reinforcement Learning by Converging to Flatter Minima

Jun-13-2026, 17:13:29 GMT–Neural Information Processing Systems

Model-based reinforcement learning (MBRL) hinges on a learned dynamics model whose errors can compound along imagined rollouts. We study how encouraging flatness in the model's training loss affects downstream control, and show that steering optimization toward flatter minima yields better-performing policies. Concretely, we integrate Sharpness-Aware Minimization (SAM) into world-model training as a drop-in objective, leaving the planner and policy components unchanged. On the theory side, we derive PAC-Bayesian bounds that link first-order sharpness to the value-estimation gap and to the performance gap between model-optimal and true-optimal policies, implying that flatter minima tighten both. Empirically, SAM reduces measured sharpness and value-prediction error and improves returns across HumanoidBench, Atari-100k, and high-DoF DeepMind Control tasks. Augmenting existing MBRL algorithms with SAM increases mean return, with especially large gains in settings with high-dimensional state-action spaces. We further observe positive transfer across algorithms and input modalities, including a transformer-based world model.
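As a rough illustration of what a "drop-in" SAM objective for model training could look like, the sketch below applies the standard two-step SAM update (perturb the weights along the normalized gradient, then descend using the gradient taken at the perturbed point) to a toy linear dynamics model. The model, the synthetic transitions, and the names `mse_grad`, `sam_step`, `rho`, and `lr` are assumptions made for illustration and are not taken from the paper's implementation.

```python
import math

def mse(params, batch):
    """Mean squared one-step prediction error of a toy model s_next ~ a*s + b*u."""
    a, b = params
    return sum((a * s + b * u - s_next) ** 2 for s, u, s_next in batch) / len(batch)

def mse_grad(params, batch):
    """Analytic gradient of `mse` with respect to (a, b)."""
    a, b = params
    ga = gb = 0.0
    for s, u, s_next in batch:
        err = a * s + b * u - s_next
        ga += 2.0 * err * s
        gb += 2.0 * err * u
    n = len(batch)
    return [ga / n, gb / n]

def sam_step(params, batch, lr=0.05, rho=0.05):
    """One SAM update: ascend to w + rho * g / ||g||, then descend from w
    using the gradient evaluated at that perturbed point."""
    g = mse_grad(params, batch)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(params)  # already at a stationary point; nothing to do
    perturbed = [p + rho * x / norm for p, x in zip(params, g)]
    g_sam = mse_grad(perturbed, batch)
    return [p - lr * x for p, x in zip(params, g_sam)]

# Synthetic transitions from hypothetical true dynamics s_next = 0.9*s + 0.5*u.
batch = [(s, u, 0.9 * s + 0.5 * u)
         for s in (-1.0, -0.5, 0.5, 1.0) for u in (-1.0, 1.0)]

params = [0.0, 0.0]
for _ in range(500):
    params = sam_step(params, batch)
```

Only the model's parameter update changes in this sketch; anything that consumes the fitted model (a planner or policy learner) would be untouched, which is the sense of "drop-in" assumed here. With a fixed `rho`, the iterates settle in a small neighborhood of the minimizer rather than exactly on it.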

large language model, machine learning, natural language

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.60)
  - Machine Learning > Neural Networks
    - Deep Learning (0.60)