AITopics | actor-critic

Collaborating Authors

actor-critic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning

Neural Information Processing SystemsMar-20-2026, 20:45:50 GMT

We propose WSAC (Weighted Safe Actor-Critic), a novel algorithm for Safe Offline Reinforcement Learning (RL) under functional approximation, which can robustly optimize policies to improve upon an arbitrary reference policy with limited data coverage. WSAC is designed as a two-player Stackelberg game to optimize a refined objective function. The actor optimizes the policy against two adversarially trained value critics with small importance-weighted Bellman errors, which focus on scenarios where the actor's performance is inferior to the reference policy. In theory, we demonstrate that when the actor employs a no-regret optimization oracle, WSAC achieves a number of guarantees: $(i)$ For the first time in the safe offline RL setting, we establish that WSAC can produce a policy that outperforms {\bf any} reference policy while maintaining the same level of safety, which is critical to designing a safe algorithm for offline RL. $(ii)$ WSAC achieves the optimal statistical convergence rate of $1/\sqrt{N}$ to the reference policy, where $N$ is the size of the offline dataset.

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)

Add feedback

Cognitive Model Discovery via Disentangled RNNs

Neural Information Processing SystemsFeb-16-2026, 22:27:42 GMT

Computational cognitive models are a fundamental tool in behavioral neuroscience.

artificial intelligence, machine learning, simulation of human behavior, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.67)
Workflow (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.73)

Add feedback

Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Neural Information Processing SystemsDec-25-2025, 18:26:19 GMT

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.

actor-critic, linear quadratic regulator, provably global convergence, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.79)

Add feedback

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic

Neural Information Processing SystemsDec-24-2025, 09:58:10 GMT

Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years. However, most of the existing theoretical support for AC algorithms focuses on the case of linear function approximations, or linearized neural networks, where the feature representation is fixed throughout training. Such a limitation fails to capture the key aspect of representation learning in neural AC, which is pivotal in practical problems. In this work, we take a mean-field perspective on the evolution and convergence of feature-based neural AC. Specifically, we consider a version of AC where the actor and critic are represented by overparameterized two-layer neural networks and are updated with two-timescale learning rates. The critic is updated by temporal-difference (TD) learning with a larger stepsize while the actor is updated via proximal policy optimization (PPO) with a smaller stepsize. In the continuous-time and infinite-width limiting regime, when the timescales are properly separated, we prove that neural AC finds the globally optimal policy at a sublinear rate. Additionally, we prove that the feature representation induced by the critic network is allowed to evolve within a neighborhood of the initial one.

mean-field analysis, representation learning, wasserstein flow meet replicator dynamic, (9 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.07)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Add feedback

c194ced51c857ec2c1928b02250e0ac8-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 06:31:04 GMT

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.67)
Workflow (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Comparative Analysis of Parameterized Action Actor-Critic Reinforcement Learning Algorithms for Web Search Match Plan Generation

Bapoo, Ubayd, Nyirenda, Clement N

arXiv.org Artificial IntelligenceOct-6-2025

This study evaluates the performance of Soft Actor Critic (SAC), Greedy Actor Critic (GAC), and Truncated Quantile Critics (TQC) in high-dimensional decision-making tasks using fully observable environments. The focus is on parametrized action (PA) spaces, eliminating the need for recurrent networks, with benchmarks Platform-v0 and Goal-v0 testing discrete actions linked to continuous action-parameter spaces. Hyperparameter optimization was performed with Microsoft NNI, ensuring reproducibility by modifying the codebase for GAC and TQC. Results show that Parameterized Action Greedy Actor-Critic (PAGAC) outperformed other algorithms, achieving the fastest training times and highest returns across benchmarks, completing 5,000 episodes in 41:24 for the Platform game and 24:04 for the Robot Soccer Goal game. Its speed and stability provide clear advantages in complex action spaces. Compared to PASAC and PATQC, PAGAC demonstrated superior efficiency and reliability, making it ideal for tasks requiring rapid convergence and robust performance. Future work could explore hybrid strategies combining entropy-regularization with truncation-based methods to enhance stability and expand investigations into generalizability.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2510.03064

Country:

Europe (0.46)
Africa > South Africa (0.29)
North America > United States (0.28)

Genre: Research Report > New Finding (0.49)

Industry: Leisure & Entertainment > Sports > Soccer (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Real-Time Reinforcement Learning

Simon Ramstedt, Chris Pal

Neural Information Processing SystemsOct-2-2025, 18:16:25 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)

Add feedback

Projected Natural Actor-Critic

Neural Information Processing SystemsSep-30-2025, 12:58:17 GMT

Natural actor-critics are a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes. In this paper we address a drawback of natural actor-critics that limits their real-world applicability - their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural Actor-Critics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent.

actor-critic, algorithm, natural actor-critic, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Add feedback

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic

Neural Information Processing SystemsMay-26-2025, 23:12:54 GMT

artificial intelligence, machine learning, wasserstein flow meet replicator dynamic, (8 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.09)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.78)

Add feedback

Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns

Tian, Dong, Li, Ge, Zhou, Hongyi, Celik, Onur, Neumann, Gerhard

arXiv.org Artificial IntelligenceMar-6-2025

Unlike traditional methods that focus on evaluating single state-action pairs or apply action chunking in the actor network, this approach feeds chunked actions directly into the critic. Leveraging the Transformer's strength in processing sequential data, the proposed architecture achieves more robust value estimation. Empirical evaluations demonstrate that this method leads to efficient and stable training, particularly excelling in environments with sparse rewards or Multi-Phase tasks.Contribution(s) 1. We present a novel critic architecture for SAC that leverages Transformers to process sequential information, resulting in more accurate value estimations. Context: Transformer-Based Critic Network 2. We introduce a method for incorporating N-Step returns into the critic network in a stable and efficient manner, effectively mitigating the common challenges of variance and importance sampling associated with N-returns. Context: Stable Integration of N-Returns 3. We shift action chunking from the actor to the critic, demonstrating that enhanced temporal reasoning at the critic level--beyond traditional actor-side exploration--drives performance improvements in sparse and multi-phase tasks. Context: Unlike previous approaches that focus on actor-side chunking for exploration, our Transformer-based critic network produces a smooth value surface that is highly responsive to dataset variations, eliminating the need for additional exploration enhancements.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2503.0366

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback