Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way
Jeongyeol Kwon, Luke Dotson, Yudong Chen, Qiaomin Xie
–arXiv.org Artificial Intelligence
Previous studies on two-timescale stochastic approximation (SA) mainly focused on bounding mean-squared errors under diminishing stepsize schemes. In this work, we investigate {\it constant} stepsize schemes through the lens of Markov processes, proving that the iterates of both timescales converge to a unique joint stationary distribution in the Wasserstein metric. We derive explicit geometric and non-asymptotic convergence rates, as well as the variance and bias introduced by constant stepsizes in the presence of Markovian noise. Specifically, with two constant stepsizes $\alpha < \beta$, we show that the biases scale linearly with both stepsizes as $\Theta(\alpha)+\Theta(\beta)$ up to higher-order terms, while the variance of the slower iterate (resp., faster iterate) scales only with its own stepsize as $O(\alpha)$ (resp., $O(\beta)$). Unlike previous work, our results require no additional assumptions such as $\beta^2 \ll \alpha$, nor do they incur extra dependence on the dimension. These fine-grained characterizations allow tail-averaging and extrapolation techniques to reduce variance and bias, improving the mean-squared error bound to $O(\beta^4 + \frac{1}{t})$ for both iterates.
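To make the scheme concrete, below is a minimal Python sketch (not the authors' code) of two-timescale linear SA with constant stepsizes, followed by the tail-averaging and Richardson-Romberg extrapolation steps the abstract describes. The matrices A11..A22, the targets b1, b2, the stepsize values, and the i.i.d. Gaussian noise are illustrative assumptions standing in for the paper's general Markovian-noise setting.

    # Sketch of constant-stepsize two-timescale linear SA with
    # tail-averaging and Richardson-Romberg extrapolation.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 2
    # Illustrative stable (Hurwitz-like) system; assumed, not from the paper.
    A11, A12 = np.eye(d), 0.1 * np.eye(d)
    A21, A22 = 0.1 * np.eye(d), np.eye(d)
    b1, b2 = np.ones(d), np.ones(d)

    def run_two_timescale(alpha, beta, T):
        """Slow iterate x uses stepsize alpha, fast iterate y uses beta (alpha < beta)."""
        x, y = np.zeros(d), np.zeros(d)
        xs = np.empty((T, d))
        for t in range(T):
            noise1 = rng.normal(scale=0.1, size=d)  # i.i.d. stand-in for Markovian noise
            noise2 = rng.normal(scale=0.1, size=d)
            drift_x = A11 @ x + A12 @ y - b1 + noise1
            drift_y = A21 @ x + A22 @ y - b2 + noise2
            x = x - alpha * drift_x
            y = y - beta * drift_y
            xs[t] = x
        return xs

    def tail_average(xs, frac=0.5):
        """Average the last `frac` of the trajectory, shrinking variance to O(1/t)."""
        k = int(len(xs) * (1 - frac))
        return xs[k:].mean(axis=0)

    # Extrapolation: since the stationary bias is Theta(alpha) + Theta(beta) up to
    # higher-order terms, combining runs at (alpha, beta) and (2*alpha, 2*beta)
    # cancels the linear bias term, leaving only higher-order bias.
    alpha, beta, T = 1e-3, 1e-2, 100_000
    xbar_1 = tail_average(run_two_timescale(alpha, beta, T))
    xbar_2 = tail_average(run_two_timescale(2 * alpha, 2 * beta, T))
    x_extrapolated = 2 * xbar_1 - xbar_2
    print(x_extrapolated)

The extrapolated estimate combines two tail-averaged chains so that the $\Theta(\alpha)+\Theta(\beta)$ bias terms cancel, which is the mechanism behind the improved $O(\beta^4 + \frac{1}{t})$ mean-squared error bound stated above.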
Oct-16-2024