Value Improved Actor Critic Algorithms

Jun-17-2026, 01:37:50 GMT – Neural Information Processing Systems

To learn approximately optimal acting policies for decision problems, modern Actor Critic algorithms rely on deep Neural Networks (DNNs) to parameterize the acting policy and on greedification operators to iteratively improve it. The reliance on DNNs suggests a gradient-based improvement, which is much less greedy per step than the improvement possible with greedier operators such as the greedy update used by Q-learning algorithms. On the other hand, slow changes to the policy can also benefit the stability of the learning process, resulting in a tradeoff between greedification and stability. To better address this tradeoff, we propose to decouple the acting policy from the policy evaluated by the critic. This allows the agent to separately improve the critic's policy (e.g.
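As a rough illustration of the contrast the abstract draws between gradient-based and greedy improvement, the sketch below compares one gradient step on a softmax policy with the fully greedy (argmax) policy for a single state. It is not the paper's algorithm: the Q-values, the step size, and all function names (`softmax`, `gradient_step`, `greedy_policy`, `expected_value`) are assumptions chosen for illustration only.

```python
import math


def softmax(logits):
    """Turn logits into a probability distribution over actions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value(policy, q_values):
    """Expected Q-value of a policy in a single state."""
    return sum(p * q for p, q in zip(policy, q_values))


def gradient_step(logits, q_values, lr):
    """One gradient-ascent step on the expected Q-value w.r.t. the logits.

    For a softmax policy, d/d logit_i of sum_a pi(a) Q(a) = pi_i * (Q_i - V),
    where V is the expected Q-value under the current policy.
    """
    pi = softmax(logits)
    v = expected_value(pi, q_values)
    return [z + lr * p * (q - v) for z, p, q in zip(logits, pi, q_values)]


def greedy_policy(q_values):
    """Fully greedy operator: all probability on the argmax action."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


# Hypothetical single-state example with three actions.
q_values = [1.0, 2.0, 0.5]
logits = [0.0, 0.0, 0.0]  # start from a uniform policy

pi_old = softmax(logits)
pi_grad = softmax(gradient_step(logits, q_values, lr=0.1))
pi_greedy = greedy_policy(q_values)

v_old = expected_value(pi_old, q_values)
v_grad = expected_value(pi_grad, q_values)
v_greedy = expected_value(pi_greedy, q_values)

print(f"uniform: {v_old:.4f}  gradient step: {v_grad:.4f}  greedy: {v_greedy:.4f}")
```

In this toy setting the gradient step moves the policy only slightly toward the best action, while the greedy operator jumps to it at once, which is the per-step gap in greediness the abstract refers to.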


Country:
- Europe > Netherlands (0.14)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.92)

Industry:
- Education (0.45)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (0.48)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.68)
