AITopics | markovian sample

Collaborating Authors

markovian sample

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

a992995ef4f0439b258f2360dbb85511-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 18:05:56 GMT

algorithm, convergence error, vrtdc, (11 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > New York > Erie County > Buffalo (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Neural Information Processing SystemsDec-26-2025, 02:32:05 GMT

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under identical and independently distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d.\ Markovian sample path and linear function approximation. We show that the two time-scale TDC can converge as fast as O(log t/t^(2/3)) under diminishing stepsize, and can converge exponentially fast under constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwisely diminishing stepsize, and show that it asymptotically converges with an arbitrarily small error at a blockwisely linear convergence rate. Our experiments demonstrate that such an algorithm converges as fast as TDC under constant stepsize, and still enjoys comparable accuracy as TDC under diminishing stepsize.

non-asymptotic analysis, stepsize, time-scale off-policy td learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)

Add feedback

Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis

Neural Information Processing SystemsAug-15-2025, 16:32:31 GMT

Recently, several work proposed to apply the variance reduction technique developed in the stochastic optimization literature to reduce the variance of TD learning.

algorithm, convergence error, vrtdc, (11 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > New York > Erie County > Buffalo (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

Reviews: Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Neural Information Processing SystemsFeb-11-2025, 23:43:01 GMT

All the reviewers recommended acceptance and, after consideration by the Senior AC and the Program Chairs, a recommendation for Accept (Poster) was settled on.

markovian sample, non-asymptotic analysis, time-scale off-policy td learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Reviews: Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Neural Information Processing SystemsJan-27-2025, 20:00:53 GMT

The results are new and important to the field, and the analysis in this setting seems nontrivial. In addition, the paper also develops a new variant of TDC under a blockwise diminishing stepsize, and proves it asymptotically convergent with an arbitrarily small training error at linear convergence rate. Extensive experiments demonstrate that the new TDC variant can converge as fast as vanilla TDC with constant stepsize, and at the same time it enjoys comparable accuracy as TDC with diminishing stepsize. Overall, the paper has both analytical as well as practical value. However, the following issues need to be addressed. Markovian sample path has been studied in e.g., [30,34].

non-asymptotic analysis, stepsize, time-scale off-policy td learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)

Add feedback

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Neural Information Processing SystemsOct-11-2024, 04:10:13 GMT

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under identical and independently distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d.\ Markovian sample path and linear function approximation. We show that the two time-scale TDC can converge as fast as O(log t/t (2/3)) under diminishing stepsize, and can converge exponentially fast under constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwisely diminishing stepsize, and show that it asymptotically converges with an arbitrarily small error at a blockwisely linear convergence rate.

non-asymptotic analysis, stepsize, time-scale off-policy td learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Xu, Tengyu, Zou, Shaofeng, Liang, Yingbin

Neural Information Processing SystemsMar-19-2020, 01:01:24 GMT

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under identical and independently distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d.\ Markovian sample path and linear function approximation. We show that the two time-scale TDC can converge as fast as O(log t/t (2/3)) under diminishing stepsize, and can converge exponentially fast under constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwisely diminishing stepsize, and show that it asymptotically converges with an arbitrarily small error at a blockwisely linear convergence rate. Our experiments demonstrate that such an algorithm converges as fast as TDC under constant stepsize, and still enjoys comparable accuracy as TDC under diminishing stepsize.

non-asymptotic analysis, stepsize, time-scale off-policy td learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reanalysis of Variance Reduced Temporal Difference Learning

Xu, Tengyu, Wang, Zhe, Zhou, Yi, Liang, Yingbin

arXiv.org Machine LearningJan-10-2020

Temporal difference (TD) learning is a popular algorithm for policy evaluation in reinforcement learning, but the vanilla TD can substantially suffer from the inherent optimization variance. A variance reduced TD (VRTD) algorithm was proposed by Korda and La (2015), which applies the variance reduction technique directly to the online TD learning with Markovian samples. In this work, we first point out the technical errors in the analysis of VRTD in Korda and La (2015), and then provide a mathematically solid analysis of the non-asymptotic convergence of VRTD and its variance reduction performance. We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate. Furthermore, the variance error (for both i.i.d.\ and Markovian sampling) and the bias error (for Markovian sampling) of VRTD are significantly reduced by the batch size of variance reduction in comparison to those of vanilla TD. As a result, the overall computational complexity of VRTD to attain a given accurate solution outperforms that of TD under Markov sampling and outperforms that of TD under i.i.d.\ sampling for a sufficiently small conditional number.

algorithm, conference paper, vrtd, (15 more...)

arXiv.org Machine Learning

2001.01898

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback