AITopics | optimal value function


optimal value function

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Generative Modeling by Value-Driven Transport

Moreno-Muñoz, Pablo, Müller, Adrian, Neu, Gergely

arXiv.org Machine Learning | May-22-2026

We propose a new framework for generative modeling based on a discrete-time stochastic control formulation of measure transport. Adapting classic results from control theory, we formulate our problem as a linear program whose dual variables correspond to the optimal value function of the control problem, which directly encodes the optimal control policy. Exploiting this LP formulation, we develop an efficient simulation-free primal-dual algorithm for computing approximately optimal value functions and the associated value-driven transport (VDT) policies which approximate the true optimal policy. We show that well-trained VDT policies enjoy numerous favorable properties in comparison with other state-of-the-art methods based on flows, diffusions, or Schrödinger bridges: they lead to straight transport paths which can be simulated quickly and robustly, and can be enhanced in all the same ways as diffusion and flow-based models (e.g., conditional generation, classifier-free guidance, unpaired data-to-data translation are all easy to incorporate). We evaluate our methodology in a range of experiments, with results that indicate strong performance and good potential for scalability.

artificial intelligence, machine learning, src, (18 more...)

arXiv.org Machine Learning

2605.22507

Country:

Europe (0.92)
North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)


rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions

Neural Information Processing Systems | Mar-13-2026, 18:43:59 GMT

Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem (a belief MDP) and exploiting the piece-wise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex). This approach has been extended to solving ρ-POMDPs (i.e., for information-oriented criteria) when the reward ρ is convex in the belief. General ρ-POMDPs can also be turned into "fully observable" problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with a λ_ρ-Lipschitz reward function, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI, which are empirically evaluated on various benchmark problems.

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Online POMDP Planning with Anytime Deterministic Guarantees

Neural Information Processing Systems | Feb-18-2026, 03:01:41 GMT

Autonomous agents operating in real-world scenarios frequently encounter uncertainty and make decisions based on incomplete information.

artificial intelligence, machine learning, planning & scheduling, (20 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.67)


Efficient Potential-based Exploration in Reinforcement Learning using Inverse Dynamic Bisimulation Metric

Neural Information Processing Systems | Feb-15-2026, 05:07:01 GMT

A number of RL methods have been proposed to boost exploration by designing an intrinsic reward signal as an exploration bonus.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > Macao (0.14)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Hong Kong (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)


rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions

Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye

Neural Information Processing Systems | Feb-14-2026, 22:17:39 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, lipschitz constant, pomdp, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


0fe6a18be9491139fb759e2f645374b1-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 00:55:24 GMT

complexity, mdp, optimal policy, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.92)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(2 more...)


Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions

Neural Information Processing Systems | Feb-7-2026, 22:43:32 GMT

However, Asynchronous VI still requires a maximization over the entire action space, making it impractical for domains with large action spaces.
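To make the cost mentioned in this snippet concrete, here is a minimal sketch of standard asynchronous value iteration. It is not the paper's DAVI algorithm. The function name, the array layout of `P` and `R`, and the two-state example MDP are all assumptions chosen for illustration. Each update touches a single state, yet it still evaluates every action to take the maximum.

```python
import numpy as np

def async_value_iteration(P, R, gamma=0.9, sweeps=500, seed=0):
    """Asynchronous VI sketch.

    P: transition probabilities, shape (A, S, S) (assumed layout).
    R: expected rewards, shape (S, A) (assumed layout).
    """
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(sweeps * n_states):
        s = rng.integers(n_states)          # update one state at a time
        # One-step lookahead for every action: this is the full
        # maximization over the action space that the snippet refers to.
        q = R[s] + gamma * (P[:, s, :] @ V)  # shape (A,)
        V[s] = q.max()
    return V

# Hypothetical 2-state MDP: action 0 stays, action 1 switches state.
# Only staying in state 1 pays a reward of 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
V = async_value_iteration(P, R)
```

In this toy problem the optimal values can be checked by hand: staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and switching from state 0 is worth 0.9 * 10 = 9.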

artificial intelligence, davi, probability, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Technology: Information Technology > Artificial Intelligence (0.95)


A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning

Neural Information Processing Systems | Dec-24-2025, 00:11:25 GMT

Temporal-difference learning is a popular algorithm for policy evaluation. In this paper, we study the convergence of the regularized non-parametric TD(0) algorithm, in both the independent and Markovian observation settings. In particular, when TD is performed in a universal reproducing kernel Hilbert space (RKHS), we prove convergence of the averaged iterates to the optimal value function, even when it does not belong to the RKHS. We provide explicit convergence rates that depend on a source condition relating the regularity of the optimal value function to the RKHS. We illustrate this convergence numerically on a simple continuous-state Markov reward process.
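For readers unfamiliar with TD(0), the following is a minimal tabular sketch of the basic update on a small Markov reward process. It is a simplification and does not implement the regularized, kernel-based (RKHS) variant or the iterate averaging studied in the paper. The function name, the constant step size, and the two-state chain are assumptions chosen for illustration.

```python
import numpy as np

def td0(P, r, gamma=0.5, alpha=0.5, steps=4000, seed=0):
    """Tabular TD(0) on a Markov reward process.

    P: transition matrix, shape (S, S); r: reward received in each state.
    """
    rng = np.random.default_rng(seed)
    n = len(r)
    V = np.zeros(n)
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])                   # sample next state
        td_error = r[s] + gamma * V[s_next] - V[s]       # bootstrapped target minus estimate
        V[s] += alpha * td_error                         # move estimate toward the target
        s = s_next
    return V

# Hypothetical deterministic 2-state chain 0 -> 1 -> 0, reward 1 in state 0.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
r = np.array([1.0, 0.0])
V = td0(P, r)
```

For this chain the value function solves V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), giving V(0) = 4/3 and V(1) = 2/3. Because the chain is deterministic, a constant step size converges here; with random transitions a decaying step size or averaging would normally be needed.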

name change, non-asymptotic analysis, non-parametric temporal-difference learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)


Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis

Davidovich, Orit, Shtern, Shimrit, Wasserkrug, Segev, Megiddo, Nimrod

arXiv.org Machine Learning | Dec-10-2025

Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts to design heuristics per problem type. Due to their structure, many hard CO problems are amenable to treatment through reinforcement learning (RL). Indeed, we find a wealth of literature training NNs using value-based, policy gradient, or actor-critic approaches, with promising results, both in terms of empirical optimality gaps and inference runtimes. Nevertheless, there has been a paucity of theoretical work undergirding the use of RL for CO problems. To this end, we introduce a unified framework to model CO problems through Markov decision processes (MDPs) and solve them using RL techniques. We provide easy-to-test assumptions under which CO problems can be formulated as equivalent undiscounted MDPs that provide optimal solutions to the original CO problems. Moreover, we establish conditions under which value-based RL techniques converge to approximate solutions of the CO problem with a guarantee on the associated optimality gap. Our convergence analysis provides: (1) a sufficient rate of increase in batch size and projected gradient descent steps at each RL iteration; (2) the resulting optimality gap in terms of problem parameters and targeted RL accuracy; and (3) the importance of a choice of state-space embedding. Together, our analysis illuminates the success (and limitations) of the celebrated deep Q-learning algorithm in this problem context.

co problem, theorem 3, value function, (16 more...)

arXiv.org Machine Learning

2512.08601

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > France (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

