A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
In this work, we study two-player zero-sum stochastic games and develop a variant of the smoothed best-response learning dynamics that combines independent learning dynamics for matrix games with the minimax value iteration for stochastic games. The resulting learning dynamics are payoff-based, convergent, rational, and symmetric between the two players.
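The dynamics themselves are not spelled out in this abstract; as an illustrative sketch of the matrix-game ingredient, a smoothed (softmax) best response against the opponent's current mixed strategy might look as follows. The function name and the temperature parameter `tau` are our assumptions for the demo, not taken from the paper:

```python
import numpy as np

def smoothed_best_response(payoffs, opponent_strategy, tau=0.1):
    """Softmax (smoothed) best response to the opponent's mixed strategy.

    payoffs: matrix A where A[i, j] is player 1's payoff for actions (i, j).
    tau: temperature; as tau -> 0 this approaches the exact best response.
    """
    q = payoffs @ opponent_strategy      # expected payoff of each action
    z = np.exp((q - q.max()) / tau)      # numerically stable softmax
    return z / z.sum()

# Matching pennies: the unique equilibrium strategy is (1/2, 1/2).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
br = smoothed_best_response(A, np.array([0.5, 0.5]))
```

Against the uniform strategy in matching pennies, every action has the same expected payoff, so the smoothed best response is itself uniform.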
Finite-Sample Analysis for SARSA with Linear Function Approximation
SARSA is an on-policy reinforcement-learning algorithm for learning a Markov decision process policy. We investigate the SARSA algorithm with linear function approximation in the non-i.i.d.\ setting, where only a single sample trajectory is available. With a sufficiently smooth, Lipschitz-continuous policy improvement operator, SARSA has been shown to converge asymptotically. However, its non-asymptotic analysis is challenging and remains open due to the non-i.i.d.\ nature of the data.
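As a reminder of the algorithm being analyzed, one SARSA update with linear function approximation takes the form below. This is a generic sketch; the step size and discount values are placeholders, not from the paper:

```python
import numpy as np

def sarsa_linear_step(theta, phi_sa, reward, phi_next_sa, alpha=0.05, gamma=0.99):
    """One SARSA update with linear function approximation.

    Q(s, a) is approximated by theta . phi(s, a). phi_next_sa is the feature
    vector of the *next* state-action pair chosen by the current policy
    (on-policy), which is what distinguishes SARSA from Q-learning.
    """
    td_error = reward + gamma * theta @ phi_next_sa - theta @ phi_sa
    return theta + alpha * td_error * phi_sa
```

Because the next action is drawn from the policy being improved, consecutive samples are correlated; this is exactly the non-i.i.d.\ difficulty the abstract refers to.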
Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
In TD-learning, off-policy sampling is known to be more practical than on-policy sampling, and by decoupling learning from data collection it enables data reuse. It is known that policy evaluation has the interpretation of solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any general off-policy TD-like stochastic approximation algorithm that solves for the fixed point of this generalized Bellman operator. Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted $\ell_p$-norm for every $p$ in $[1,\infty)$, with a common contraction factor. Off-policy TD-learning is, however, known to suffer from high variance due to the product of importance sampling ratios.
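For concreteness, a single-step off-policy TD(0) update with one importance-sampling ratio can be sketched as below. The tabular setting and variable names are illustrative assumptions, not the paper's algorithm; multi-step, TD($\lambda$)-style variants multiply such ratios along the trajectory, which is the source of the variance mentioned above:

```python
import numpy as np

def off_policy_td0_step(V, s, a, r, s_next, pi, mu, alpha=0.1, gamma=0.95):
    """One off-policy TD(0) update for a tabular value function V.

    pi[s, a]: target-policy probability of action a in state s.
    mu[s, a]: behavior-policy probability (the policy generating the data).
    """
    rho = pi[s, a] / mu[s, a]                       # importance-sampling ratio
    V[s] += alpha * rho * (r + gamma * V[s_next] - V[s])
    return V
```

A single ratio is bounded when `mu` has full support, but a product of many such ratios can be exponentially large, hence the variance issue.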
Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes
Stochastic Approximation (SA) is a popular approach for solving fixed-point equations where the information is corrupted by noise. In this paper, we consider an SA algorithm involving a contraction mapping with respect to an arbitrary norm, and establish its finite-sample error bounds under different stepsize choices. The idea is to construct a smooth Lyapunov function using the generalized Moreau envelope, and to show that the iterates of SA have negative drift with respect to this Lyapunov function. Our result is applicable to Reinforcement Learning (RL). In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning [18]. Importantly, our construction results in only a logarithmic dependence of the convergence bound on the size of the state space.
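As a toy illustration of the setting (a simulation of the iteration, not the paper's analysis), the SA recursion for a contraction $F$ with noisy evaluations and diminishing stepsizes can be run as follows; the specific $F$, the $1/(k+1)$ stepsize schedule, and the Gaussian noise model are assumptions for the demo:

```python
import numpy as np

def contractive_sa(F, x0, n_iters=5000, noise_std=0.1, seed=0):
    """Simulate x_{k+1} = x_k + a_k (F(x_k) + w_k - x_k) with a_k = 1/(k+1),
    where F is a contraction and w_k is zero-mean observation noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        a_k = 1.0 / (k + 1)
        w = rng.normal(0.0, noise_std, size=x.shape)
        x = x + a_k * (F(x) + w - x)
    return x

# F is a 1/2-contraction in any norm; its fixed point solves x = x/2 + 1, so x* = 2.
F = lambda x: 0.5 * x + 1.0
x_hat = contractive_sa(F, x0=np.zeros(1))
```

With the diminishing stepsizes the noise is averaged out, and the iterate settles near the fixed point $x^\* = 2$.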
Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators
We provide a finite-sample analysis of a general framework for using $k$-nearest-neighbor statistics to estimate functionals of a nonparametric continuous probability density, including entropies and divergences. Rather than plugging a consistent density estimate (which requires $k \to \infty$ as the sample size $n \to \infty$) into the functional of interest, the estimators we consider fix $k$ and perform a bias correction. This can be more efficient computationally and, as we show, statistically, leading to faster convergence rates. Our framework unifies several previous estimators, and for most of them ours are the first finite-sample guarantees.
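As a concrete instance of the fixed-$k$ approach, the classical Kozachenko-Leonenko estimator of differential entropy keeps $k$ fixed and corrects the bias with digamma terms. A minimal sketch with brute-force neighbor search (the function names are ours, and this is one member of the family rather than the paper's full framework):

```python
import numpy as np
from math import log, pi, lgamma

EULER = 0.5772156649015329

def digamma_int(m):
    """Digamma at a positive integer: psi(m) = -gamma + sum_{i=1}^{m-1} 1/i."""
    return -EULER + sum(1.0 / i for i in range(1, m))

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko differential-entropy estimate with fixed k.

    The additive digamma terms are the bias correction that lets k stay
    fixed as n grows, instead of plugging in a consistent density estimate.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # Distance from each point to its k-th nearest neighbor (excluding itself).
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    eps = np.sort(dists, axis=1)[:, k]           # column 0 is the point itself
    log_c_d = (d / 2) * log(pi) - lgamma(d / 2 + 1)  # log volume of the unit ball
    return digamma_int(n) - digamma_int(k) + log_c_d + d * np.mean(np.log(eps))
```

For the uniform density on $[0,1]$ the true differential entropy is $0$, so the estimate on a moderate uniform sample should land near zero.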
Reviews: Finite-Sample Analysis for SARSA with Linear Function Approximation
This paper deals with an important problem in theoretical reinforcement learning (RL), namely the finite-time analysis of on-policy RL algorithms such as SARSA. If the analysis techniques and proofs are correct and concrete, this work could have a broad impact on the analysis of related stochastic approximation/RL algorithms. Although the problem is important and interesting, the present submission raises several major concerns that limit its contributions and even call into question the practical usefulness of the reported theoretical results. These concerns are listed as follows. To facilitate the analysis, a number of the assumptions adopted in this work are strong and impractical.
Reviews: Finite-Sample Analysis for SARSA with Linear Function Approximation
Because the initial reviews were mixed, I obtained an additional review from an expert in the area of this paper. This fourth review came back clearly positive, but in the meantime one of the positive reviewers changed to negative (and later one of the negative reviewers turned positive). There was then extensive discussion, but the reviewers never agreed on how best to view this paper; in fact, they seemed to talk past each other, and in the end we had two positive and two negative reviews. As the area chair, reading the reviews and listening to the discussion, I found the fourth, very positive review to be the most compelling.