AITopics | state-action space

On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning

Neural Information Processing SystemsJun-12-2026, 18:47:41 GMT

Reinforcement learning with general utilities (RLGU) offers a unifying framework to capture several problems beyond standard expected returns, including imitation learning, pure exploration, and safe RL. Despite recent fundamental advances in the theoretical analysis of policy gradient (PG) methods for standard RL and recent efforts in RLGU, the understanding of these PG algorithms and their scope of application in RLGU still remain limited. In this work, we establish global optimality guarantees of PG methods for RLGU in which the objective is a general concave utility function of the state-action occupancy measure. In the tabular setting, we provide global optimality results using a new proof technique building on recent theoretical developments on the convergence of PG methods for standard RL using gradient domination. Our proof technique opens avenues for analyzing policy parameterizations beyond the direct policy parameterization for RLGU. In addition, we provide global optimality results for large state-action space settings beyond prior work which has mostly focused on the tabular setting. In this large scale setting, we adapt PG methods by approximating occupancy measures within a function approximation class using maximum likelihood estimation. Our sample complexity only scales with the dimension induced by our approximation class instead of the size of the state-action space.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.59)

Add feedback

Improved Model-based Reinforcement Learning with Smooth Kernels

Long, Kun, Li, Yuqiang, Wu, Xianyi

arXiv.org Machine LearningMay-11-2026

For continuous state-action space scenarios, classical reinforcement learning (RL) theory predominantly focuses on low-rank Markov decision processes (MDPs), which provide sample-efficient guarantees at the expense of restrictive structural assumptions. Kernel smoothing model-based approaches offer a promising alternative paradigm that instead leverages the smoothness of the MDP and employs non-parametric kernel smoothing estimates of transition dynamics. This paper proposes a new kernel-smoothing model-based approach for online reinforcement learning in finite-horizon settings under Lipschitz continuity assumptions on the MDP. By incorporating a Bernstein-style exploration bonus into the kernel smoothing framework, our method achieves a regret bound which improves upon the state-of-the-art regret bound in its dependence on the horizon. The theoretical advancement relies on a delicate analysis of the synergy between Bernstein-style bonuses and kernel smoothing, where a new tight Bernstein-type concentration inequality for martingales may be of independent interest.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

2605.07218

Country: Asia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

437d46a857214c997956eaf0e3b21a55-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 15:32:06 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Robots (0.93)

Add feedback

Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs

Neural Information Processing SystemsFeb-16-2026, 11:46:14 GMT

Existing solutions either work under very specific assumptions or achieve bounds that are vacuous in some regimes.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Lombardy > Milan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both

Neural Information Processing SystemsFeb-16-2026, 06:18:22 GMT

We argue that these properties are satisfied in many continuous state-action Markov decision processes.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.67)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

437d46a857214c997956eaf0e3b21a55-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 09:47:15 GMT

algorithm, mdp, proximity, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)

Add feedback

Appendix: OntheExpressivityofMarkovReward

Neural Information Processing SystemsFeb-8-2026, 08:57:26 GMT

Instead, wesuggest that foragivenCMP,it is natural to be interested in Markov rewards, but acknowledge the importance of going beyond such functions.

artificial intelligence, constraint, reward function, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.57)

Add feedback

POLY-HOOT: Monte-CarloPlanninginContinuous SpaceMDPswithNon-AsymptoticAnalysis

Neural Information Processing SystemsFeb-8-2026, 00:03:03 GMT

Inthis paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, amuchlessunderstood problem withimportant applications in control and robotics.

algorithm, artificial intelligence, planning & scheduling, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.89)

Add feedback

We thank the reviewers for the comments and constructive feedback and we are delighted that they appreciated the

Neural Information Processing SystemsFeb-7-2026, 19:34:23 GMT

RL-as-inference (see discussion in Section 4.3), they differ crucially in how the objective is interpreted.

artificial intelligence, comment and constructive feedback, practical algorithm, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.73)

Add feedback

Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Neural Information Processing SystemsDec-23-2025, 21:33:04 GMT

Modern reinforcement learning (RL) has shown empirical success in various real world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration, when the action-value function is represented by an RKHS. We prove the first order-optimal regret guarantees under a general setting. Our results show a significant polynomial in the number of episodes improvement over the state of the art. In particular, with highly non-smooth kernels (such as Neural Tangent kernel or some Matérn kernels) the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order optimal in the cases where a lower bound on regret is known (which includes the kernels mentioned above).

kernelized reinforcement learning, name change, order optimal regret bound, (4 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Add feedback

Filters

Collaborating Authors

state-action space

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning

Improved Model-based Reinforcement Learning with Smooth Kernels

437d46a857214c997956eaf0e3b21a55-Supplemental.pdf

Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs

We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both

437d46a857214c997956eaf0e3b21a55-Supplemental.pdf

Appendix: OntheExpressivityofMarkovReward

POLY-HOOT: Monte-CarloPlanninginContinuous SpaceMDPswithNon-AsymptoticAnalysis

We thank the reviewers for the comments and constructive feedback and we are delighted that they appreciated the

Kernelized Reinforcement Learning with Order Optimal Regret Bounds