884d247c6f65a96a7da4d1105d584ddd-Paper.pdf
–Neural Information Processing Systems
DDPG [24]extends Q-learning to continuous control based on the Deterministic Policy Gradient [31] algorithm, which learns a deterministic policyπ(s;φ) parameterized byφto maximize the Q-function to approximate themaxoperator.
Neural Information Processing Systems
Feb-9-2026, 06:33:31 GMT