AITopics | reinforcement

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Neural Information Processing SystemsMar-17-2026, 01:38:59 GMT

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Constrained Reinforcement Learning Has Zero Duality Gap

Santiago Paternain, Luiz Chamon, Miguel Calvo-Fullana, Alejandro Ribeiro

Neural Information Processing SystemsMar-13-2026, 08:55:51 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, dual problem, parametrization, (14 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Regularizing Trajectory Optimization with Denoising Autoencoders

Rinu Boney, Norman Di Palo, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, Harri Valpola

Neural Information Processing SystemsFeb-19-2026, 10:53:55 GMT

Neural Information Processing Systems http://nips.cc/

international conference, optimization, trajectory optimization, (15 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

af1c25e88a9e818f809f6b5d18ca02e2-Paper.pdf

Neural Information Processing SystemsFeb-19-2026, 07:25:46 GMT

cognition, progression, trajectory, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
Europe > France (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Add feedback

d3b1fb02964aa64e257f9f26a31f72cf-Paper.pdf

Neural Information Processing SystemsFeb-19-2026, 07:24:22 GMT

algorithm, fmdp, mdp, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

32285dd184dbfc33cb2d1f0db53c23c5-Paper-Conference.pdf

Neural Information Processing SystemsFeb-19-2026, 06:21:01 GMT

dpmorl, return distribution, utility function, (14 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Provably Mitigating Overoptimization in RLHF: Y our SFT Loss is Implicitly an Adversarial Regularizer

Neural Information Processing SystemsFeb-18-2026, 18:53:13 GMT

Then it fine-tunes the LLM to maximize the learned reward using RL techniques.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

f18222190deda602622b3ed90d28bd98-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 16:21:25 GMT

algorithm, lemma 3, occupancy measure, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)

Add feedback

Many-shot Jailbreaking

Neural Information Processing SystemsFeb-18-2026, 13:56:58 GMT

Longer contexts present a new attack surface for adversarial attacks. In search of a "fruit-fly" of long-context vulnerabilities, we study Many-shot Jailbreaking (MSJ; Figure 1), a simple yet effective and scalable jailbreak.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: