Policy Gradient With Serial Markov Chain Reasoning

Dec-24-2025, 01:26:11 GMT–Neural Information Processing Systems

We introduce a new framework that performs decision-making in reinforcement learning (RL) as an iterative reasoning process.

name change, policy gradient, serial markov chain reasoning

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.84)
  - Representation & Reasoning (0.66)
  - Cognitive Science > Problem Solving (0.45)