Learning Continuous Control Policies by Stochastic Value Gradients
Nicolas Heess, Gregory Wayne, David Silver, Timothy Lillicrap, Tom Erez, Yuval Tassa
Neural Information Processing Systems
We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
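As a rough illustration of the reparameterization the abstract describes (the symbols \pi, f, \eta, \xi below are assumed for this sketch, not taken from the listing): if the action is written as a deterministic function of policy noise and the next state as a deterministic function of transition noise, the stochastic Bellman equation becomes an expression that can be differentiated end-to-end:

a = \pi(s, \eta; \theta), \quad s' = f(s, a, \xi), \quad \eta \sim \rho(\eta), \quad \xi \sim \rho(\xi)
V(s) = \mathbb{E}_{\rho(\eta)}\big[\, r(s, \pi(s, \eta; \theta)) + \gamma\, \mathbb{E}_{\rho(\xi)}\big[ V\big(f(s, \pi(s, \eta; \theta), \xi)\big) \big] \,\big]

Because all randomness now enters through fixed exogenous noise distributions, gradients of V with respect to the policy parameters \theta can be obtained by backpropagation through the policy, the model, and the value function, which is what yields the spectrum of algorithms referenced above (e.g., SVG(1)).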