
Bellman equation





Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

Neural Information Processing Systems

This paper is about the problem of learning a stochastic policy for generating an object (like a molecular graph) from a sequence of actions, such that the probability of generating an object is proportional to a given positive reward for that object.
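To make the stated objective concrete: on a tree-structured action space, where each object is reached by a unique action sequence, flows can be computed exactly by summing rewards bottom-up, and choosing children in proportion to their flows samples each object with probability proportional to its reward. The sketch below is a toy construction under that tree assumption, not the paper's learned model; the `children` and `reward` tables are invented for the example.

```python
import random

# Toy tree-structured generation space: internal nodes are partial objects,
# leaves are complete objects carrying a positive reward.
children = {
    "root": ["A", "B"],
    "A": ["AA", "AB"],
    "B": ["BA", "BB"],
}
reward = {"AA": 1.0, "AB": 2.0, "BA": 3.0, "BB": 4.0}

def flow(node):
    """Flow through a node: its reward at a leaf, the sum of child flows otherwise."""
    if node in reward:
        return reward[node]
    return sum(flow(c) for c in children[node])

def sample():
    """Generate an object by picking each action in proportion to child flow."""
    node = "root"
    while node not in reward:
        kids = children[node]
        node = random.choices(kids, weights=[flow(k) for k in kids])[0]
    return node

# The child-flow ratios telescope, so P(x) = reward(x) / Z with Z = 10 here.
counts = {}
for _ in range(100_000):
    x = sample()
    counts[x] = counts.get(x, 0) + 1
print({x: round(c / 100_000, 3) for x, c in sorted(counts.items())})
```

On general DAGs, where several action sequences can produce the same object, exact bottom-up summation is no longer available, which is why the paper instead learns approximate flows by enforcing a flow-matching condition at every state.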




Appendix: Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms

Neural Information Processing Systems

Thus the optimal average reward of the original MDP and the modified MDP differ by O(ϵ). To ensure Assumption 3.1(b) is satisfied, an aperiodicity transformation can be applied. The proof of this theorem can be found in [Sch71]. From Lemma 2.2, we thus have the corresponding bound on J. In order to iterate Equation (8), we need to ensure the terms are non-negative. Theorem 3.3 presents an upper bound on the error in terms of the average reward.
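As a minimal illustration of the aperiodicity transformation referenced above, assuming the standard construction P_τ = (1 − τ)I + τP with rewards left unchanged (the two-state chain and the value of τ are invented for the example), the snippet below checks that the transformation preserves the stationary distribution, and hence a fixed policy's average reward, while making a periodic chain aperiodic.

```python
import numpy as np

# A period-2 chain under some fixed policy, with per-state rewards.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
r = np.array([1.0, 3.0])

def stationary(P):
    """Stationary distribution: the eigenvector of P^T for eigenvalue 1, normalized."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

tau = 0.5  # aperiodicity parameter in (0, 1)
P_tau = (1 - tau) * np.eye(2) + tau * P  # self-loops break periodicity

for name, M in [("original", P), ("transformed", P_tau)]:
    pi = stationary(M)
    print(name, "stationary:", pi, "average reward:", pi @ r)
```

Since π P_τ = (1 − τ)π + τπP = π whenever πP = π, both chains report the same stationary distribution and the same average reward (2.0 here).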




Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

Neural Information Processing Systems

We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap. To remedy these deficiencies, we investigate a simple transformation of the risk-sensitive Bellman equations, which we call the exponential Bellman equation.
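The identity behind this transformation can be checked numerically. For the entropic risk measure with parameter β > 0, the risk-sensitive backup V_h(s) = max_a { r(s,a) + (1/β) log E[exp(β V_{h+1}(s'))] } is equivalent, after the substitution G = exp(βV), to a multiplicative backup on G. The sketch below is a toy tabular check under invented random dynamics, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H, beta = 3, 2, 4, 0.7  # small tabular MDP; beta > 0 here

P = rng.random((S, A, S)); P /= P.sum(axis=-1, keepdims=True)
r = rng.random((S, A))

# Standard risk-sensitive backup on V.
V = np.zeros(S)
for _ in range(H):
    Q = r + (1.0 / beta) * np.log(np.einsum("sat,t->sa", P, np.exp(beta * V)))
    V = Q.max(axis=1)

# Exponential Bellman backup on G = exp(beta * V):
# G_h(s) = max_a exp(beta * r(s,a)) * E[G_{h+1}(s')].
G = np.ones(S)  # exp(beta * 0)
for _ in range(H):
    G = (np.exp(beta * r) * np.einsum("sat,t->sa", P, G)).max(axis=1)

print(np.allclose(V, np.log(G) / beta))  # True: the two recursions agree
```

For β < 0 (the risk-averse side), exp is decreasing, so the max over actions on V corresponds to a min on G; the identity otherwise carries over.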