
Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems

Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those.
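The posterior-sampling idea behind randomised value functions can be sketched in a few lines: maintain a Bayesian linear-regression posterior over Q-function weights, sample one weight vector per decision, and act greedily under the sample. Everything below (the toy one-hot features, the prior and noise variances, and the helper names) is an illustrative assumption, not the paper's SU algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: 5 states x 2 actions, linear Q with one-hot features.
n_states, n_actions = 5, 2
d = n_states * n_actions

def phi(s, a):
    """One-hot feature vector for the (state, action) pair."""
    x = np.zeros(d)
    x[s * n_actions + a] = 1.0
    return x

# Bayesian linear regression posterior over Q-weights: N(mu, Sigma).
# Prior N(0, I); each observed (feature, target) pair sharpens the posterior.
prior_var, noise_var = 1.0, 0.1
Sigma_inv = np.eye(d) / prior_var
b = np.zeros(d)

def update(s, a, target):
    global Sigma_inv, b
    x = phi(s, a)
    Sigma_inv = Sigma_inv + np.outer(x, x) / noise_var
    b = b + x * target / noise_var

def act(s):
    """RVF-style exploration: sample one weight vector per decision,
    then act greedily with respect to the sampled Q-function."""
    Sigma = np.linalg.inv(Sigma_inv)
    mu = Sigma @ b
    w = rng.multivariate_normal(mu, Sigma)
    return int(np.argmax([w @ phi(s, a) for a in range(n_actions)]))

update(0, 1, 1.0)  # observe a regression target for (s=0, a=1)
print(act(0))
```

States with little data keep wide posteriors, so sampled Q-values there vary across decisions, which is what drives directed exploration in this family of methods.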


Zap Q-Learning

Adithya M Devraj, Sean Meyn

Neural Information Processing Systems

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases.
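A minimal sketch of the matrix-gain idea on a toy, randomly generated MDP with one-hot features; the step-size exponents, the MDP, and all names below are illustrative assumptions rather than the paper's exact construction. The gain matrix is estimated on a slower-decaying (faster) time scale and used to precondition the TD update, mimicking a Newton-Raphson step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-state, 2-action MDP for illustration only.
n_s, n_a, gamma = 2, 2, 0.9
d = n_s * n_a
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] -> next-state dist
R = rng.normal(size=(n_s, n_a))                   # mean rewards

def phi(s, a):
    x = np.zeros(d)
    x[s * n_a + a] = 1.0
    return x

theta = np.zeros(d)
A_hat = -np.eye(d)  # matrix-gain estimate, initialised negative definite

s = 0
for n in range(1, 5001):
    a = int(rng.integers(n_a))            # behaviour policy: uniform exploration
    s2 = rng.choice(n_s, p=P[s, a])
    r = R[s, a]
    q = theta.reshape(n_s, n_a)
    a2 = int(np.argmax(q[s2]))            # greedy action at the next state
    td = r + gamma * q[s2, a2] - q[s, a]  # temporal-difference error

    # Two time scales: the gain estimate uses beta_n, the parameters alpha_n,
    # with beta_n decaying more slowly so the gain adapts faster.
    alpha, beta = 1.0 / n, 1.0 / n ** 0.85
    A_n = np.outer(phi(s, a), gamma * phi(s2, a2) - phi(s, a))
    A_hat = A_hat + beta * (A_n - A_hat)

    # Newton-Raphson-like step: precondition the TD update with -A_hat^{-1}.
    theta = theta - alpha * np.linalg.pinv(A_hat) @ (phi(s, a) * td)
    s = s2

print(theta.round(2))
```

The preconditioning by the (pseudo-)inverse of the running gain estimate is what distinguishes this from scalar-step Q-learning and is the mechanism behind the optimal-asymptotic-variance claim in the abstract.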


Reviews: Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems

This paper proposes using Bayesian linear regression to obtain a posterior over successor features as a way of representing uncertainty, from which the authors sample for exploration. I found the characterisation of Randomised Policy Iteration strange, as it only seems to apply to UBE and not to Bootstrapped DQN. With Bootstrapped DQN, each model in the ensemble is a value function pertaining to a different policy, so there is no single reference policy; the ensemble is trying to represent a distribution over optimal value functions, rather than value functions for a single reference policy. Proposition 1: with neural networks, and function approximation in general, it is very unlikely that we will get a factored distribution, so this claim does not seem applicable in general. In fact, there should generally be very high correlation between the Q-values of nearby states. Is this claim a direct response to UBE? Also, the analysis fixes the policy when considering the distribution of value functions, but this does not seem to be how posterior sampling is normally framed; rather, it is only how UBE frames it.


Reviews: Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems

From the discussion, the reviewers appreciated the clarifications made in the rebuttal. They have indicated what they would like to see improved in a revised version, in particular a clearer presentation.



Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP

Sangadi, Tejaram, Prashanth, L. A., Jagannathan, Krishna

arXiv.org Artificial Intelligence

In the standard reinforcement learning (RL) setting, the objective is to learn a policy that maximizes the value function, i.e., the expectation of the cumulative reward obtained over a finite or infinite time horizon. However, in several practical scenarios, including finance, automated driving and drug testing, a risk-sensitive learning paradigm becomes important, wherein the value function, which is an expectation, needs to be traded off suitably against an appropriate risk metric associated with the reward distribution. One way to achieve this is to solve a constrained optimization problem with the risk metric as a constraint and the value function as the objective. Variance is a popular risk measure, usually incorporated into a risk-sensitive optimization problem as a constraint, with the usual expected value as the objective. Such a mean-variance formulation was studied in the seminal work of Markowitz [10]. In the context of RL, mean-variance optimization has been considered in several previous works, cf.


An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Ke, Zhifa, Wen, Zaiwen, Zhang, Junyu

arXiv.org Artificial Intelligence

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed and an improved $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity bound is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under Markovian sampling, as opposed to the best known $\tilde{\mathcal{O}}(\epsilon^{-2})$ complexity in the existing literature.
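The object of the analysis, semi-gradient TD with a neural value function, can be sketched with a one-hidden-layer network; the paper treats general $L$-layer networks, and the toy dynamics, reward, and step size below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal semi-gradient TD(0) with a one-hidden-layer ReLU value network.
h = 16
W1 = rng.normal(0, 0.5, size=h)  # input weights (scalar state)
b1 = np.zeros(h)
w2 = rng.normal(0, 0.5, size=h)  # output weights

def value(s):
    z = np.maximum(W1 * s + b1, 0.0)  # ReLU hidden activations
    return w2 @ z, z

gamma, lr = 0.9, 0.05
s = rng.uniform(-1, 1)
for _ in range(2000):
    s2 = np.clip(0.8 * s + 0.1 * rng.normal(), -1, 1)  # toy Markovian dynamics
    r = -s * s                                          # toy reward
    v, z = value(s)
    v2, _ = value(s2)
    td = r + gamma * v2 - v

    # Semi-gradient: differentiate only through value(s), not value(s2).
    mask = (z > 0).astype(float)
    g_w2, g_W1, g_b1 = td * z, td * w2 * mask * s, td * w2 * mask
    w2, W1, b1 = w2 + lr * g_w2, W1 + lr * g_W1, b1 + lr * g_b1
    s = s2

print(round(float(value(0.0)[0]), 3))
```

The nonlinearity the abstract refers to enters through the ReLU mask: the effective features change with the parameters, which is precisely what complicates the finite-time analysis relative to the linear case.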



Analytical Mean Squared Error Curves in Temporal Difference Learning

Neural Information Processing Systems

We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its step-size and eligibility trace parameters.


Finite-Sample Analysis of the Temporal Difference Learning

Samsonov, Sergey, Tiapkin, Daniil, Naumov, Alexey, Moulines, Eric

arXiv.org Machine Learning

In this paper we consider the problem of obtaining sharp bounds for the performance of temporal difference (TD) methods with linear function approximation for policy evaluation in discounted Markov Decision Processes. We show that a simple algorithm with a universal and instance-independent step size together with Polyak-Ruppert tail averaging is sufficient to obtain near-optimal variance and bias terms. We also provide the respective sample complexity bounds. Our proof technique is based on refined error bounds for linear stochastic approximation together with a novel stability result for the product of random matrices that arise from the TD-type recurrence.
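The algorithm described, TD(0) with a constant, instance-independent step size followed by tail averaging of the iterates, can be sketched on a small toy chain; the chain, step size, and burn-in fraction below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(3)

# TD(0) policy evaluation with linear features and Polyak-Ruppert tail averaging.
n_s, gamma = 4, 0.9
P = rng.dirichlet(np.ones(n_s), size=n_s)  # fixed-policy transition matrix
r = rng.normal(size=n_s)                   # state rewards
Phi = np.eye(n_s)                          # tabular features for simplicity

alpha = 0.05                               # universal constant step size
T = 20000
theta = np.zeros(n_s)
iterates = []

s = 0
for t in range(T):
    s2 = rng.choice(n_s, p=P[s])           # Markovian (non-i.i.d.) sampling
    td = r[s] + gamma * Phi[s2] @ theta - Phi[s] @ theta
    theta = theta + alpha * td * Phi[s]
    iterates.append(theta.copy())
    s = s2

# Tail averaging: discard a burn-in prefix, average the remaining iterates.
theta_pr = np.mean(iterates[T // 2 :], axis=0)

v_true = np.linalg.solve(np.eye(n_s) - gamma * P, r)  # exact value function
print(float(np.round(np.max(np.abs(theta_pr - v_true)), 2)))
```

Averaging only the tail discards the transient phase in which the iterates are still far from their stationary region, which is what lets a constant step size deliver near-optimal bias and variance.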