Improved Central Limit Theorem and Bootstrap Approximations for Linear Stochastic Approximation
Butyrin, Bogdan, Moulines, Eric, Naumov, Alexey, Samsonov, Sergey, Shao, Qi-Man, Zhang, Zhuo-Song
In this paper, we refine the Berry-Esseen bounds for the multivariate normal approximation of Polyak-Ruppert averaged iterates arising from the linear stochastic approximation (LSA) algorithm with decreasing step size. We consider the normal approximation by the Gaussian distribution with the covariance matrix predicted by the Polyak-Juditsky central limit theorem and establish a rate of order up to $n^{-1/3}$ in convex distance, where $n$ is the number of samples used in the algorithm. We also prove the non-asymptotic validity of the multiplier bootstrap procedure for approximating the distribution of the rescaled error of the averaged LSA estimator. We establish approximation rates of order up to $1/\sqrt{n}$ for the latter distribution, which significantly improves upon the previous results obtained by Samsonov et al. (2024).
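As an illustration of the objects involved, the LSA recursion with Polyak-Ruppert averaging and the multiplier bootstrap can be sketched as follows. This is a minimal i.i.d.-noise toy instance; the step-size schedule, the matrices, and the Gaussian multiplier weights are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def lsa_pr(A_samples, b_samples, theta0, gamma=0.75, c=0.5, weights=None):
    """Run LSA theta_{k+1} = theta_k - alpha_k * w_k * (A_k theta_k - b_k)
    with decreasing step alpha_k = c / (k+1)^gamma and return the
    Polyak-Ruppert average of the iterates. The optional `weights`
    (all ones by default) implement multiplier-bootstrap reweighting."""
    n, d = len(A_samples), len(theta0)
    if weights is None:
        weights = np.ones(n)
    theta = theta0.copy()
    avg = np.zeros(d)
    for k in range(n):
        alpha = c / (k + 1) ** gamma
        theta = theta - alpha * weights[k] * (A_samples[k] @ theta - b_samples[k])
        avg += theta
    return avg / n

rng = np.random.default_rng(0)
d, n = 2, 5000
A_bar = np.array([[2.0, 0.3], [0.3, 1.5]])
b_bar = np.array([1.0, -1.0])
theta_star = np.linalg.solve(A_bar, b_bar)   # target of the recursion
# i.i.d. noisy observations (A_k, b_k) with E[A_k] = A_bar, E[b_k] = b_bar
A_s = A_bar + 0.1 * rng.standard_normal((n, d, d))
b_s = b_bar + 0.1 * rng.standard_normal((n, d))

theta_bar = lsa_pr(A_s, b_s, np.zeros(d))
# Multiplier bootstrap: rerun the same recursion on the same data with
# i.i.d. random weights of mean 1 and variance 1
boot = [lsa_pr(A_s, b_s, np.zeros(d), weights=1.0 + rng.standard_normal(n))
        for _ in range(20)]
```

The spread of the bootstrap replicas around `theta_bar` is then used as a proxy for the sampling distribution of the rescaled error.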
Gaussian Approximation for Two-Timescale Linear Stochastic Approximation
Butyrin, Bogdan, Rubtsov, Artemy, Naumov, Alexey, Ulyanov, Vladimir, Samsonov, Sergey
In this paper, we establish non-asymptotic bounds on the accuracy of normal approximation for linear two-timescale stochastic approximation (TTSA) algorithms driven by martingale-difference or Markov noise. Focusing on both the last-iterate and Polyak-Ruppert averaging regimes, we derive bounds for normal approximation in terms of the convex distance between probability distributions. Our analysis reveals a non-trivial interaction between the fast and slow timescales: the normal approximation rate for the last iterate improves as the timescale separation increases, while it deteriorates in the Polyak-Ruppert averaged setting. We also provide high-order moment bounds for the error of the linear TTSA algorithm, which may be of independent interest.
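A minimal sketch of a linear two-timescale recursion, assuming a toy one-dimensional system and illustrative step-size exponents (the fast iterate uses the larger step $\beta_k \gg \alpha_k$):

```python
import numpy as np

# Toy 1-d linear TTSA system (all constants are illustrative):
# fast iterate w solves 2*w - theta = 1 for the current theta,
# slow iterate theta solves theta - w = 0, so theta* = w* = 1.
rng = np.random.default_rng(1)
n = 20000
noise = 0.1 * rng.standard_normal((n, 2))   # martingale-difference noise
theta, w = 0.0, 0.0
theta_avg = 0.0
for k in range(1, n + 1):
    alpha = 0.5 / k ** 0.9   # slow step size
    beta = 0.5 / k ** 0.6    # fast step size: beta_k >> alpha_k for large k
    # fast recursion tracks the moving target w*(theta) = (theta + 1) / 2
    w = w - beta * (2.0 * w - theta - 1.0 + noise[k - 1, 0])
    # slow recursion moves theta toward the tracked fast iterate
    theta = theta - alpha * (theta - w + noise[k - 1, 1])
    theta_avg += theta
theta_avg /= n   # Polyak-Ruppert average of the slow iterates
```

Both the last iterate `theta` and the average `theta_avg` approach the fixed point $\theta^* = 1$; the paper quantifies how close their distributions are to Gaussian in each regime.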
High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation
Levin, Ilya, Naumov, Alexey, Samsonov, Sergey
In this paper, we study the bias and high-order error bounds of the Linear Stochastic Approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging under Markovian noise. We focus on the version of the algorithm with constant step size $\alpha$ and propose a novel decomposition of the bias via a linearization technique. We analyze the structure of the bias and show that the leading-order term is linear in $\alpha$ and cannot be eliminated by PR averaging. To address this, we apply the Richardson-Romberg (RR) extrapolation procedure, which effectively cancels the leading bias term. We derive high-order moment bounds for the RR iterates and show that the leading error term aligns with the asymptotically optimal covariance matrix of the vanilla averaged LSA iterates.
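The Richardson-Romberg construction described above can be sketched as follows: run constant-step LSA with steps $\alpha$ and $2\alpha$, average each run, and combine them so the bias term linear in the step size cancels. The AR(1) driving sequence and all constants below are illustrative assumptions used to create Markovian noise, not the paper's setting:

```python
import numpy as np

def lsa_avg(alpha, z, A_bar, b_bar, v):
    """Constant-step LSA driven by a Markovian (AR(1)) noise sequence z,
    returning the Polyak-Ruppert average of the iterates."""
    d = len(b_bar)
    theta = np.zeros(d)
    avg = np.zeros(d)
    for zk in z:
        A_k = A_bar + 0.3 * zk * np.eye(d)   # noise enters multiplicatively
        b_k = b_bar + 0.3 * zk * v
        theta = theta - alpha * (A_k @ theta - b_k)
        avg += theta
    return avg / len(z)

rng = np.random.default_rng(2)
n = 50000
A_bar = np.array([[2.0, 0.2], [0.2, 1.0]])
b_bar = np.array([1.0, 1.0])
v = np.array([1.0, 1.0])
theta_star = np.linalg.solve(A_bar, b_bar)

# AR(1) driving chain: temporal correlation makes the averaged iterate
# biased, with a leading bias term linear in the step size.
z = np.empty(n)
z[0] = 0.0
innov = 0.6 * rng.standard_normal(n)
for k in range(1, n):
    z[k] = 0.8 * z[k - 1] + innov[k]

alpha = 0.05
theta_a = lsa_avg(alpha, z, A_bar, b_bar, v)       # bias ~ c * alpha
theta_2a = lsa_avg(2 * alpha, z, A_bar, b_bar, v)  # bias ~ 2 * c * alpha
theta_rr = 2 * theta_a - theta_2a                  # leading bias cancels
```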
A note on concentration inequalities for the overlapped batch mean variance estimators for Markov chains
Moulines, Eric, Naumov, Alexey, Samsonov, Sergey
In this paper, we study the concentration properties of quadratic forms associated with Markov chains using the martingale decomposition method introduced by Atchadé and Cattaneo (2014). In particular, we derive concentration inequalities for the overlapped batch mean (OBM) estimators of the asymptotic variance for uniformly geometrically ergodic Markov chains. Our main result provides an explicit control of the $p$-th moment of the difference between the OBM estimator and the asymptotic variance of the Markov chain, with explicit dependence upon $p$ and the mixing time of the underlying Markov chain.
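A minimal sketch of the overlapped batch mean estimator on an AR(1) chain, for which the asymptotic variance of the sample mean is known in closed form (the batch size and chain parameters are illustrative choices):

```python
import numpy as np

def obm_variance(x, b):
    """Overlapped batch mean estimator of the asymptotic variance
    sigma^2 = lim_n n * Var(mean(x_1..x_n)) of a stationary sequence,
    using all n - b + 1 overlapping batches of length b."""
    n = len(x)
    mean = x.mean()
    # means of all length-b overlapping batches via a cumulative sum
    csum = np.concatenate(([0.0], np.cumsum(x)))
    batch_means = (csum[b:] - csum[:-b]) / b
    return (n * b / ((n - b) * (n - b + 1))) * np.sum((batch_means - mean) ** 2)

# AR(1) chain x_k = rho * x_{k-1} + eps_k with unit-variance innovations;
# the asymptotic variance of its sample mean is 1 / (1 - rho)^2.
rng = np.random.default_rng(3)
rho, n = 0.5, 200000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for k in range(1, n):
    x[k] = rho * x[k - 1] + eps[k]

sigma2_hat = obm_variance(x, b=500)
sigma2_true = 1.0 / (1 - rho) ** 2   # = 4.0
```

The paper's concentration inequalities control how far `sigma2_hat` can deviate from `sigma2_true` as a function of the moment order and the mixing time.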
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation
Sheshukova, Marina, Belomestny, Denis, Durmus, Alain, Moulines, Eric, Naumov, Alexey, Samsonov, Sergey
We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation technique to reduce the asymptotic bias of SGD at the expense of a mild increase of the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. More precisely, we show that the mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$, where the power $3/4$ cannot be improved in general. We also extend this result to the $p$-th moment bound, keeping optimal scaling of the remainders with respect to $n$. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.
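The combination of Polyak-Ruppert averaging with Richardson-Romberg extrapolation can be sketched on a one-dimensional strongly convex toy objective (the objective, noise model, and step size are illustrative assumptions, not from the paper):

```python
import numpy as np

def sgd_avg(alpha, noise, theta0=0.0):
    """Constant-step SGD on f(t) = t^2/2 + exp(-t), which is strongly
    convex, with additive gradient noise; returns the Polyak-Ruppert
    average of the iterates."""
    theta, avg = theta0, 0.0
    for xi in noise:
        grad = theta - np.exp(-theta) + xi   # stochastic gradient
        theta -= alpha * grad
        avg += theta
    return avg / len(noise)

rng = np.random.default_rng(4)
noise = rng.standard_normal(200000)

alpha = 0.05
theta_a = sgd_avg(alpha, noise)        # averaged run with step alpha
theta_2a = sgd_avg(2 * alpha, noise)   # averaged run with step 2*alpha
theta_rr = 2 * theta_a - theta_2a      # Richardson-Romberg extrapolation
theta_star = 0.5671432904097838        # minimizer: root of t = exp(-t)
```

Because the objective is non-quadratic, the constant-step averaged iterate carries a bias linear in the step size; the extrapolated estimate `theta_rr` removes that leading term while keeping the $\mathcal{O}(n^{-1/2})$ fluctuation scale.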
Rosenthal-type inequalities for linear statistics of Markov chains
Durmus, Alain, Moulines, Eric, Naumov, Alexey, Samsonov, Sergey, Sheshukova, Marina
Probability and moment inequalities for sums of random variables are of paramount importance in the complexity analysis of numerous stochastic approximation algorithms and in the finite-time analysis of Monte Carlo estimators; see [20], [10], and references therein. The main focus in this area has been on concentration inequalities for sums of independent random variables or martingale difference sequences; see e.g. [4, 36]. However, the study of concentration inequalities for additive functionals of Markov chains is still relatively underdeveloped. For the technically simple case of uniformly ergodic Markov chains, there is extensive work on Hoeffding- and Bernstein-like inequalities, as found in [23, 34, 20, 38]. Nevertheless, the application of these results may be difficult due to a lack of quantitative data or the substitution of the asymptotic variance of the chain by surrogates; see Section 2.1 for relevant definitions. The present work aims to fill this gap by extending Rosenthal- and Bernstein-type inequalities to Markov chains which converge geometrically fast to a unique invariant distribution, with an explicit emphasis on the mixing time of the underlying Markov chain. An important tool for establishing deviation bounds for sums of random variables is the use of moment inequalities.
Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation
Durmus, Alain, Moulines, Eric, Naumov, Alexey, Samsonov, Sergey
The LSA algorithm is central in statistics, machine learning, and linear system identification; see e.g. Eweda and Macchi (1983); Widrow and Stearns (1985); Benveniste et al. (2012); Kushner and Yin (2003) and references therein. More recently, it has attracted renewed interest in machine learning, especially for high-dimensional least squares and reinforcement learning (RL) problems; see Bertsekas and Tsitsiklis (2003); Bottou et al. (2018); Sutton (1988); Bertsekas (2019); Watkins and Dayan (1992). The LSA and LSA-PR recursions (1) have been the subject of a wealth of work, and it is difficult to adequately acknowledge all contributions. Polyak and Juditsky (1992); Kushner and Yin (2003); Borkar (2008); Benveniste et al. (2012) provided asymptotic convergence guarantees (almost sure convergence, central limit theorems) under both i.i.d. and Markovian noise settings. In particular, it has been established that LSA-PR accelerates LSA and satisfies a central limit theorem with an asymptotically minimax-optimal covariance matrix. Although asymptotic convergence analysis is of theoretical interest, the current trend is to obtain nonasymptotic guarantees that take into account both the limited sample size and the dimension of the parameter space. For these reasons, the non-asymptotic analysis of both i.i.d. and Markovian SA procedures has recently attracted much attention.
On learning parametric distributions from quantized samples
Sarbu, Septimia, Zaidi, Abdellatif
We consider the problem of learning parametric distributions from their quantized samples in a network. Specifically, $n$ agents or sensors observe independent samples of an unknown parametric distribution; and each of them uses $k$ bits to describe its observed sample to a central processor whose goal is to estimate the unknown distribution. First, we establish a generalization of the well-known van Trees inequality to general $L_p$-norms, with $p > 1$, in terms of Generalized Fisher information. Then, we develop minimax lower bounds on the estimation error for two losses: general $L_p$-norms and the related Wasserstein loss from optimal transport.
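The communication setup can be sketched with a naive uniform scalar quantizer: each agent sends the $k$-bit index of the cell containing its sample, and the central processor averages the cell midpoints. The quantizer, clipping range, and mean-estimation target below are illustrative assumptions, not the estimators studied in the paper:

```python
import numpy as np

def quantize(x, k, lo=-4.0, hi=4.0):
    """Uniform k-bit scalar quantizer on [lo, hi]: each agent transmits
    only the index of the cell containing its (clipped) sample."""
    levels = 2 ** k
    idx = np.floor((x - lo) / (hi - lo) * levels)
    return np.clip(idx, 0, levels - 1).astype(int)

def dequantize(idx, k, lo=-4.0, hi=4.0):
    """Central processor maps each received index to its cell midpoint."""
    levels = 2 ** k
    return lo + (idx + 0.5) * (hi - lo) / levels

rng = np.random.default_rng(5)
n, k, mu = 100000, 3, 0.7
samples = rng.standard_normal(n) + mu   # one sample per agent
bits = quantize(samples, k)             # k bits transmitted per agent
mu_hat = dequantize(bits, k).mean()     # central estimate of the mean
```

The paper's lower bounds quantify how much such $k$-bit descriptions must inflate the estimation error relative to observing the raw samples, for general $L_p$ and Wasserstein losses.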
8 Machine Learning Algorithms in Python - You Must Learn - DataFlair
Previously, we discussed the techniques of machine learning with Python. Going deeper, today we will learn and implement 8 top machine learning algorithms in Python. Let's begin the journey of machine learning algorithms in Python. Linear regression is one of the supervised machine learning algorithms in Python: it observes continuous features and predicts an outcome. Depending on whether it runs on a single variable or on many features, we call it simple linear regression or multiple linear regression.
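A minimal sketch of both variants with NumPy (synthetic data; no claim about the article's own code):

```python
import numpy as np

# Simple linear regression: one feature, fit by least squares.
rng = np.random.default_rng(6)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + 0.5 * rng.standard_normal(50)   # true slope 3, intercept 2
slope, intercept = np.polyfit(x, y, deg=1)

# Multiple linear regression: several features, fit via least squares
# on the design matrix [x, x^2, 1].
X = np.column_stack([x, x ** 2, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # coef ~ [3, 0, 2]
```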
The statistical Minkowski distances: Closed-form formula for Gaussian Mixture Models
The traditional Minkowski distances are induced by the corresponding Minkowski norms in real-valued vector spaces. In this work, we propose novel statistical symmetric distances based on Minkowski's inequality for probability densities belonging to Lebesgue spaces. These statistical Minkowski distances admit closed-form formulas for Gaussian mixture models when parameterized by integer exponents. This result extends to arbitrary mixtures of exponential families whose natural parameter spaces are cones: this includes the binomial, the multinomial, the zero-centered Laplacian, the Gaussian, and the Wishart mixtures, among others. We also derive a Minkowski diversity index of a normalized weighted set of probability distributions from Minkowski's inequality.
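The closed-form claim can be spot-checked in the simplest case: for a single Gaussian density and an integer exponent $p$, the $L_p$ power integral that such formulas build on has an elementary closed form. The check below is a numerical sanity check of that integral, not the paper's distance itself:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Closed form for the power integral of a Gaussian density with integer p:
#   int N(x; mu, sigma)^p dx = (2*pi*sigma^2)^((1-p)/2) / sqrt(p)
mu, sigma, p = 0.3, 1.2, 3
dx = 1e-4
x = np.arange(mu - 12 * sigma, mu + 12 * sigma, dx)
numeric = np.sum(gaussian_pdf(x, mu, sigma) ** p) * dx   # Riemann sum
closed_form = (2 * np.pi * sigma ** 2) ** ((1 - p) / 2) / np.sqrt(p)
```

For mixtures, expanding the integer power of a sum of component densities by the multinomial theorem reduces the computation to integrals of this type, which is what makes the closed-form evaluation possible.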