Temporal Difference Flows

Farebrother, Jesse; Pirotta, Matteo; Tirinzoni, Andrea; Munos, Rémi; Lazaric, Alessandro; Touati, Ahmed

Mar-12-2025–arXiv.org Machine Learning

Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over pre-trained policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.
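To make the object being learned concrete, here is a minimal tabular sketch of a geometric horizon model. It is not TD-Flow itself: it assumes a small finite Markov chain under a fixed policy, where the GHM is the discounted occupancy M = (1 - gamma) * sum_k gamma^k P^(k+1), and it checks that repeatedly applying the TD-style Bellman backup M <- (1 - gamma) P + gamma P M reaches the same fixed point. The transition matrix `P`, the `reward` vector, and the discount `gamma` are hypothetical values chosen for illustration.

```python
import numpy as np


def geometric_horizon_model(P, gamma):
    """Closed form of M = (1 - gamma) * sum_{k>=0} gamma^k P^(k+1)."""
    n = P.shape[0]
    return (1.0 - gamma) * P @ np.linalg.inv(np.eye(n) - gamma * P)


def td_fixed_point(P, gamma, iters=2000):
    """Iterate the Bellman backup M <- (1 - gamma) P + gamma P M.

    The first term is the one-step prediction; the second bootstraps
    from the current estimate at the next state.
    """
    M = np.full_like(P, 1.0 / P.shape[0])  # arbitrary row-stochastic start
    for _ in range(iters):
        M = (1.0 - gamma) * P + gamma * P @ M
    return M


# Hypothetical 3-state chain induced by some fixed policy.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.3, 0.0, 0.7],
])
gamma = 0.95

M_exact = geometric_horizon_model(P, gamma)
M_td = td_fixed_point(P, gamma)

# Policy evaluation from the GHM: the discounted return of rewards on
# future states is the GHM expectation of the reward, scaled by 1 / (1 - gamma).
reward = np.array([0.0, 0.0, 1.0])
value_from_ghm = M_exact @ reward / (1.0 - gamma)

# Reference value computed without the GHM: V = (I - gamma P)^{-1} P r.
value_direct = np.linalg.solve(np.eye(3) - gamma * P, P @ reward)
```

Each row of `M_exact` is a probability distribution over future states, so a single sample from it plays the role of a direct long-horizon prediction, with no step-by-step unrolling. In continuous state spaces this matrix is unavailable, which is where a generative model of the distribution comes in.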

artificial intelligence, machine learning, reinforcement learning

Country:
- North America
  - Canada > Quebec (0.14)
  - United States (0.14)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Energy > Oil & Gas > Upstream (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning (1.00)