contractive stochastic approximation
Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes
Stochastic Approximation (SA) is a popular approach for solving fixed-point equations where the information is corrupted by noise. In this paper, we consider an SA algorithm involving a contraction mapping with respect to an arbitrary norm, and establish its finite-sample error bounds under different stepsizes. The idea is to construct a smooth Lyapunov function using the generalized Moreau envelope, and to show that the iterates of SA have negative drift with respect to that Lyapunov function. Our result is applicable to Reinforcement Learning (RL). In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning [18]. Importantly, our construction results in only a logarithmic dependence of the convergence bound on the size of the state-space.
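As a rough illustration of the Lyapunov construction, the toy Python sketch below numerically evaluates a (generalized) Moreau envelope $M(x) = \min_u \{ f(u) + \frac{1}{2\mu}\|x - u\|_2^2 \}$ of a non-smooth function. The choice $f(u) = \tfrac{1}{2}\|u\|_\infty^2$, the $\ell_2$ smoothing term, and $\mu = 0.1$ are illustrative assumptions of ours, not the paper's construction.

```python
# Toy numerical sketch (illustrative choices, not the paper's code): the Moreau
# envelope M(x) = min_u { f(u) + (1/(2*mu)) * ||x - u||_2^2 } of the non-smooth
# function f(u) = 0.5 * ||u||_inf^2, evaluated by an inner minimization.
import numpy as np
from scipy.optimize import minimize

def f(u):
    return 0.5 * np.max(np.abs(u)) ** 2              # non-smooth: squared infinity-norm

def moreau_envelope(x, mu=0.1):
    obj = lambda u: f(u) + (0.5 / mu) * np.sum((x - u) ** 2)
    res = minimize(obj, x0=np.asarray(x, dtype=float), method="Nelder-Mead")
    return res.fun                                    # smooth surrogate value at x

x = np.array([1.0, -2.0, 0.5])
print("f(x)            =", f(x))                      # 2.0
print("Moreau envelope =", moreau_envelope(x))        # <= f(x), and smooth in x
```

The envelope never exceeds the original function (taking $u = x$ is always feasible) and is smooth, which is what makes it usable as a Lyapunov surrogate for a non-smooth norm.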
Review for NeurIPS paper: Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes
Summary and Contributions: This paper considers the fundamental problem of computing a fixed point of a stochastic contractive operator. Formally, the paper supposes that there is an operator $H$ from $\mathbb{R}^d$ to $\mathbb{R}^d$ such that $\|H(x) - x_*\|_c \leq \gamma \|x - x_*\|_c$ for some norm $\|\cdot\|_c$, some $\gamma \in (0, 1)$, and a fixed point $x_*$, and that for any point $x$ one can compute $H(x) + w$, where $w$ is mean-zero noise whose magnitude depends on $x$. The paper then considers the natural stochastic approximation (SA) algorithm $x_{k+1} = x_k + \epsilon_k (H(x_k) - x_k + w_k)$ for some sequence of stepsizes $\epsilon_k$ and mean-zero noise $w_k$ whose norm depends on $x_k$. The paper provides general convergence bounds for this algorithm in terms of the norm in which $H$ contracts and of the bound on the noise vector $w_k$ in terms of $x_k$. Formally, the paper supposes that, conditioned on the history up to iteration $k$, $\mathbb{E}[\|w_k\|_n^2] \leq A (1 + \|x_k\|_n^2)$ for some norm $\|\cdot\|_n$, and provides bounds on the convergence of the method.
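To make the recursion concrete, here is a minimal numerical sketch of the iteration $x_{k+1} = x_k + \epsilon_k (H(x_k) - x_k + w_k)$. The operator $H(x) = r + \gamma P x$ (a $\gamma$-contraction in the infinity norm), the noise model, and the constant stepsize are illustrative choices of ours, not taken from the paper or the review.

```python
# Minimal sketch (not code from the paper) of the SA recursion
# x_{k+1} = x_k + eps * (H(x_k) - x_k + w_k), with an illustrative operator
# H(x) = r + gamma * P @ x, a gamma-contraction with respect to ||.||_inf.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])                      # row-stochastic matrix
r = np.array([1.0, 0.0, 2.0])

H = lambda x: r + gamma * P @ x                      # contraction mapping
x_star = np.linalg.solve(np.eye(3) - gamma * P, r)   # its unique fixed point

x = np.zeros(3)
eps = 0.05                                           # constant stepsize
for k in range(5000):
    # mean-zero noise with E||w_k||^2 <= A * (1 + ||x_k||^2) for some constant A
    w_k = 0.1 * (1.0 + np.linalg.norm(x)) * rng.standard_normal(3)
    x = x + eps * (H(x) - x + w_k)

# with a constant stepsize the iterates settle in a noise ball around x_*
print("||x_K - x_*||_inf =", np.max(np.abs(x - x_star)))
```

Switching `eps` to a diminishing schedule (e.g. $\epsilon_k = 1/(k+1)$) gives the other stepsize regime analyzed in the paper, where the noise ball shrinks over time.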
A Concentration Bound for TD(0) with Function Approximation
Chandak, Siddharth, Borkar, Vivek S.
We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm with both martingale and Markov noise. The Markov noise is handled using the Poisson equation, and the lack of almost sure guarantees on the boundedness of the iterates is handled using the concept of relaxed concentration inequalities.
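Below is a minimal sketch of the setting this abstract describes: online TD(0) with linear function approximation driven by a single sample path of a small Markov reward process. The chain, rewards, features, stepsize, and the closed-form comparison with the TD fixed point are illustrative assumptions of ours, not material from the paper.

```python
# Minimal sketch of online TD(0) with linear function approximation on a single
# trajectory of a small Markov reward process (illustrative values only).
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.9
P = np.array([[0.7, 0.3, 0.0],
              [0.1, 0.6, 0.3],
              [0.3, 0.0, 0.7]])          # irreducible transition matrix
r = np.array([0.0, 1.0, 2.0])            # state rewards
Phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])             # feature matrix, one row per state

# online TD(0) along one sample path (no restarts, no i.i.d. samples)
theta = np.zeros(2)
alpha = 0.01                              # small constant stepsize
s = 0
for n in range(100000):
    s_next = rng.choice(3, p=P[s])
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta = theta + alpha * td_error * Phi[s]
    s = s_next

# closed-form TD fixed point: Phi^T D (I - gamma P) Phi theta_* = Phi^T D r,
# where D is the diagonal matrix of the chain's stationary distribution
evals, evecs = np.linalg.eig(P.T)
d = np.real(evecs[:, np.argmax(np.real(evals))])
d = d / d.sum()
D = np.diag(d)
theta_star = np.linalg.solve(Phi.T @ D @ (np.eye(3) - gamma * P) @ Phi, Phi.T @ D @ r)

print("online TD(0) estimate:", theta)        # should land near theta_*
print("TD fixed point       :", theta_star)
```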