Policy Optimization for Markov Games: Unified Framework and Faster Convergence

Neural Information Processing Systems

We begin by proposing an algorithmic framework for two-player zero-sum Markov games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix game algorithm, and a value update step with a certain learning rate.
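
As a concrete illustration of that iteration structure, here is a minimal tabular sketch that instantiates the matrix game subroutine with multiplicative weights (Hedge) on a randomly generated zero-sum Markov game. This is one choice among many the framework admits (the paper's faster convergence results come from specific instantiations), and all parameters below are illustrative assumptions, not the paper's settings.

```python
# Sketch of the framework: per-state policy update via a matrix game
# algorithm (here: Hedge), followed by a value update with learning rate alpha.
import numpy as np

rng = np.random.default_rng(0)
S, A, B, gamma = 5, 3, 3, 0.9             # states, per-player actions, discount
R = rng.uniform(-1, 1, (S, A, B))          # payoff to the max-player (zero-sum)
P = rng.dirichlet(np.ones(S), (S, A, B))   # transition kernel P[s, a, b] -> dist over s'

x = np.full((S, A), 1.0 / A)               # max-player policy at each state
y = np.full((S, B), 1.0 / B)               # min-player policy at each state
V = np.zeros(S)                            # value estimates
eta, alpha = 0.1, 0.1                      # Hedge step size, value learning rate

for t in range(2000):
    Q = R + gamma * (P @ V)                # per-state payoff matrices Q[s], shape (A, B)
    for s in range(S):
        # Policy update step: one iteration of the matrix game algorithm (Hedge).
        gx = Q[s] @ y[s]                   # max-player's payoff vector
        gy = x[s] @ Q[s]                   # min-player's loss vector
        x[s] *= np.exp(eta * gx);  x[s] /= x[s].sum()
        y[s] *= np.exp(-eta * gy); y[s] /= y[s].sum()
        # Value update step with learning rate alpha.
        V[s] = (1 - alpha) * V[s] + alpha * (x[s] @ Q[s] @ y[s])
```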


Empirical Likelihood for Contextual Bandits

Neural Information Processing Systems

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end, we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that searches for policies with a large reward lower bound. We empirically find that both our estimator and confidence interval improve over previous proposals in finite-sample regimes. Finally, the policy optimization algorithm we propose outperforms a strong baseline system for learning from off-policy data.
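
To illustrate the "confidence interval as a simple convex optimization problem" idea, the sketch below computes a generic empirical-likelihood (EL) lower confidence bound on the mean of importance-weighted rewards from synthetic logged data. This is not the paper's estimator, which is more refined; the helper names (`el_log_ratio`, `el_lower_bound`) and all data are assumptions for illustration only.

```python
# Generic profile empirical likelihood: the lower confidence bound reduces to
# one-dimensional root-finding problems (the EL dual and a bisection over mu).
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(z, mu):
    """-2 log empirical-likelihood ratio for the hypothesis mean(z) = mu."""
    d = z - mu
    # Solve the 1-D dual: sum(d_i / (1 + lam * d_i)) = 0, keeping 1 + lam*d_i > 0.
    lo = (-1 + 1e-9) / d.max()
    hi = (-1 + 1e-9) / d.min()
    f = lambda lam: np.sum(d / (1 + lam * d))
    lam = brentq(f, lo, hi)
    return 2 * np.sum(np.log1p(lam * d))

def el_lower_bound(z, level=0.95):
    """Smallest mu in the EL confidence set: a lower bound on the policy value."""
    crit = chi2.ppf(level, df=1)
    g = lambda mu: el_log_ratio(z, mu) - crit
    return brentq(g, z.min() + 1e-6, z.mean())  # lower endpoint lies below the mean

# Synthetic logged bandit data: target/logging propensity ratios times rewards.
rng = np.random.default_rng(1)
pi_over_mu = rng.uniform(0.2, 3.0, 500)
rewards = rng.binomial(1, 0.4, 500)
z = pi_over_mu * rewards
print(z.mean(), el_lower_bound(z))  # IPS point estimate vs. EL lower bound
```

An off-policy optimizer in the spirit of the abstract would then search over candidate policies for one maximizing `el_lower_bound(z)` rather than the point estimate.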


Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

Neural Information Processing Systems

While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing analyses are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary. This paper proposes a simple and efficient policy optimization framework, Optimistic NPG, for online RL. Optimistic NPG can be viewed as simply combining the classic natural policy gradient (NPG) algorithm [Kakade, 2001] with optimistic policy evaluation subroutines to encourage exploration. For $d$-dimensional linear MDPs, Optimistic NPG is computationally efficient and learns an $\epsilon$-optimal policy within $\tilde{\mathcal{O}}(d^2/\epsilon^3)$ samples, making it the first computationally efficient algorithm whose sample complexity has the optimal dimension dependence $\tilde{\Theta}(d^2)$. It also improves over state-of-the-art results for policy optimization algorithms [Zanette et al., 2021] by a factor of $d$. For general function approximation, which subsumes linear MDPs, Optimistic NPG is, to the best of our knowledge, the first policy optimization algorithm that achieves polynomial sample complexity for learning near-optimal policies.
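
As a rough illustration of the recipe, the sketch below is a tabular caricature of this combination: an NPG (multiplicative-weights) policy update applied to an optimistically evaluated $Q$. The paper targets linear MDPs with optimistic least-squares policy evaluation; here a count-based bonus and exact backward induction stand in for that subroutine, and the synthetic MDP and constants (`eta`, `beta`) are assumptions for illustration.

```python
# Tabular caricature of Optimistic NPG: NPG step against an optimistic Q estimate.
import numpy as np

rng = np.random.default_rng(2)
S, A, H = 6, 4, 10                        # states, actions, horizon
P = rng.dirichlet(np.ones(S), (S, A))     # transitions P[s, a] -> dist over s'
R = rng.uniform(0, 1, (S, A))             # rewards

pi = np.full((H, S, A), 1.0 / A)          # one policy per step of the horizon
counts = np.ones((S, A))                  # visit counts feeding the bonus
eta, beta = 0.5, 0.1                      # NPG step size, bonus coefficient

for t in range(200):
    # Optimistic policy evaluation: backward induction with bonus beta / sqrt(n),
    # standing in for the paper's optimistic least-squares subroutine.
    Q = np.zeros((H, S, A))
    V = np.zeros(S)
    for h in reversed(range(H)):
        Q[h] = np.minimum(R + beta / np.sqrt(counts) + P @ V, H)  # clipped optimism
        V = np.einsum('sa,sa->s', pi[h], Q[h])
    # NPG update: pi_{t+1}(a|s) proportional to pi_t(a|s) * exp(eta * Q(s, a)).
    pi *= np.exp(eta * Q)
    pi /= pi.sum(axis=-1, keepdims=True)
    # Roll out one episode under pi to update the visit counts.
    s = 0
    for h in range(H):
        a = rng.choice(A, p=pi[h, s])
        counts[s, a] += 1
        s = rng.choice(S, p=P[s, a])
```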