Wu, Weichen
Uncertainty quantification for Markov chains with application to temporal difference learning
Wu, Weichen, Wei, Yuting, Rinaldo, Alessandro
Markov chains are fundamental to statistical machine learning, underpinning key methodologies such as Markov Chain Monte Carlo (MCMC) sampling and temporal difference (TD) learning in reinforcement learning (RL). Given their widespread use, it is crucial to establish rigorous probabilistic guarantees on their convergence, uncertainty, and stability. In this work, we develop novel, high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains, addressing key limitations in existing theoretical tools for handling dependent data. We leverage these results to analyze the TD learning algorithm, a widely used method for policy evaluation in RL. Our analysis yields a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish an $O(T^{-\frac{1}{4}}\log T)$ distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. These findings provide new insights into statistical inference for RL algorithms, bridging the gap between classical stochastic approximation theory and modern reinforcement learning applications.
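For intuition, a Gaussian approximation in convex distance of the kind described above can be rendered schematically as follows. The notation is illustrative rather than the paper's own: $\bar\theta_T$ stands for the (averaged) TD estimator, $\theta^\star$ for its target, and $\Lambda$ for the asymptotic covariance.

```latex
% Schematic Berry-Esseen-type bound in convex distance (illustrative notation):
% \bar\theta_T = averaged TD estimator, \theta^\star = its limit,
% \Lambda = asymptotic covariance of the rescaled estimator.
\sup_{C \subseteq \mathbb{R}^d \ \text{convex}}
  \left|\,
    \mathbb{P}\!\left( \sqrt{T}\,(\bar\theta_T - \theta^\star) \in C \right)
    - \mathbb{P}\!\left( \mathcal{N}(0, \Lambda) \in C \right)
  \right|
  \;\lesssim\; \frac{\log T}{T^{1/4}}.
```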
Statistical Inference for Temporal Difference Learning with Linear Function Approximation
Wu, Weichen, Li, Gen, Wei, Yuting, Rinaldo, Alessandro
Statistical inference tasks, such as constructing confidence regions or simultaneous confidence intervals, are often addressed by deriving distributional theory, such as central limit theorems (CLTs), for the estimator in use. Due to the high dimensionality of modern science and engineering applications, there has been a surge of interest in deriving convergence results that are valid in a finite-sample manner. In Reinforcement Learning (RL), a discipline that underpins many recent machine learning breakthroughs (Agarwal et al. (2019); Sutton and Barto (2018)), one central question is how to evaluate with confidence the quality of a given policy, measured by its value function. As RL is often modeled as decision making in Markov decision processes (MDPs), the task of statistical inference needs to accommodate the online nature of the sampling mechanism. Temporal Difference (TD) learning (Sutton (1988)) is arguably the most widely used algorithm designed for this purpose. TD learning, an instance of the stochastic approximation (SA) method (Robbins and Monro (1951)), approximates the value function of a given policy in an iterative manner. Towards understanding the non-asymptotic convergence rate of TD to the target value function, extensive recent efforts have been made (see, e.g.
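To make the iterative scheme referred to above concrete, here is a minimal sketch of TD(0) with linear function approximation under online (Markovian) sampling. The environment and policy interfaces (`env.reset`, `env.step`, `policy`, the feature map `phi`) follow a Gym-like convention and are stand-ins, and the step-size schedule is just one common choice, not the specification used in the paper.

```python
import numpy as np

def td0_linear(env, policy, phi, dim, gamma=0.99, T=10_000,
               step=lambda t: 1.0 / (t + 1) ** 0.75):
    """Minimal TD(0) sketch with linear value approximation V(s) ~= phi(s) @ theta.

    `env`, `policy`, and `phi` are hypothetical interfaces: env.reset()/env.step(a)
    follow a Gym-like convention, policy(s) returns an action, and phi(s) returns a
    length-`dim` feature vector. Returns both the final iterate and its
    Polyak-Ruppert average, the quantity typically studied in CLT-type analyses.
    """
    theta = np.zeros(dim)
    theta_bar = np.zeros(dim)              # running Polyak-Ruppert average
    s = env.reset()
    for t in range(T):
        a = policy(s)
        s_next, r, done, _ = env.step(a)
        # TD(0) semi-gradient step: move theta along the TD-error direction.
        td_error = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        theta = theta + step(t) * td_error * phi(s)
        theta_bar += (theta - theta_bar) / (t + 1)
        s = env.reset() if done else s_next
    return theta, theta_bar
```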
Sharp high-probability sample complexities for policy evaluation with linear function approximation
Li, Gen, Wu, Weichen, Chi, Yuejie, Ma, Cong, Rinaldo, Alessandro, Wei, Yuting
This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the off-policy setting, where samples are drawn from a behavior policy potentially different from the target policy, we establish the first sample complexity bound with a high-probability convergence guarantee that attains the optimal dependence on the tolerance level. We also exhibit an explicit dependence on problem-related quantities, and show in the on-policy setting that our upper bound matches the minimax lower bound on crucial problem parameters, including the choice of the feature maps and the problem dimension.
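For concreteness, below is a minimal sketch of a single two-timescale TDC step with linear features, written to cover both settings: the importance ratio `rho` weights the updates in the off-policy case, and setting `rho = 1.0` recovers the on-policy update. Variable names and the two step sizes are illustrative choices, not the paper's specification.

```python
import numpy as np

def tdc_update(theta, w, phi_s, phi_next, reward, rho, gamma, alpha, beta):
    """One two-timescale TDC (gradient-corrected TD) step with linear features.

    theta: main parameter (slow timescale, step size alpha)
    w:     auxiliary parameter (fast timescale, step size beta)
    rho:   importance ratio pi(a|s) / b(a|s); use rho = 1.0 in the on-policy case.
    Follows the standard TDC recursion; the weighting and step sizes here are
    illustrative, not the configuration analyzed in the paper.
    """
    td_error = reward + gamma * phi_next @ theta - phi_s @ theta
    # Slow update: TD direction minus the gradient-correction term.
    theta = theta + alpha * rho * (td_error * phi_s - gamma * (phi_s @ w) * phi_next)
    # Fast update: track the projection of the TD error onto the features.
    w = w + beta * rho * (td_error - phi_s @ w) * phi_s
    return theta, w
```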