884d247c6f65a96a7da4d1105d584ddd-Paper.pdf

Neural Information Processing Systems 

DDPG [24]extends Q-learning to continuous control based on the Deterministic Policy Gradient [31] algorithm, which learns a deterministic policyπ(s;φ) parameterized byφto maximize the Q-function to approximate themaxoperator.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found