AITopics

2307.09488

Country: Europe > Italy > Piedmont > Turin Province > Turin (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Godichon-Baggioni, Antoine, Werge, Nicklas, Wintenberger, Olivier

Learning from time-dependent streaming data with online stochastic algorithms

arXiv.org Artificial IntelligenceJul-18-2023

This paper addresses stochastic optimization in a streaming setting with time-dependent and biased gradient estimates. We analyze several first-order methods, including Stochastic Gradient Descent (SGD), mini-batch SGD, and time-varying mini-batch SGD, along with their Polyak-Ruppert averages. Our non-asymptotic analysis establishes novel heuristics that link dependence, biases, and convexity levels, enabling accelerated convergence. Specifically, our findings demonstrate that (i) time-varying mini-batch SGD methods have the capability to break long- and short-range dependence structures, (ii) biased SGD methods can achieve comparable performance to their unbiased counterparts, and (iii) incorporating Polyak-Ruppert averaging can accelerate the convergence of the stochastic optimization algorithms. To validate our theoretical findings, we conduct a series of experiments using both simulated and real-life time-dependent data.

artificial intelligence, convergence, machine learning, (18 more...)

2205.12549

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
Europe > Switzerland > Neuchâtel > Neuchâtel (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

arXiv.org Artificial IntelligenceJul-18-2023

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Wei, Ziyang, Zhu, Wanrong, Wu, Wei Biao

Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).

artificial intelligence, averaging scheme, machine learning, (15 more...)

2307.06915

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Collins, Liam, Hassani, Hamed, Soltanolkotabi, Mahdi, Mokhtari, Aryan, Shakkottai, Sanjay

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

Feature learning, i.e. extracting meaningful representations of data, is quintessential to the practical success of neural networks trained with gradient descent, yet it is notoriously difficult to explain how and why it occurs. Recent theoretical studies have shown that shallow neural networks optimized on a single task with gradient-based methods can learn meaningful features, extending our understanding beyond the neural tangent kernel or random feature regime in which negligible feature learning occurs. But in practice, neural networks are increasingly often trained on {\em many} tasks simultaneously with differing loss functions, and these prior analyses do not generalize to such settings. In the multi-task learning setting, a variety of studies have shown effective feature learning by simple linear models. However, multi-task learning via {\em nonlinear} models, arguably the most common learning paradigm in practice, remains largely mysterious. In this work, we present the first results proving feature learning occurs in a multi-task setting with a nonlinear model. We show that when the tasks are binary classification problems with labels depending on only $r$ directions within the ambient $d\gg r$-dimensional input space, executing a simple gradient-based multitask learning algorithm on a two-layer ReLU neural network learns the ground-truth $r$ directions. In particular, any downstream task on the $r$ ground-truth coordinates can be solved by learning a linear classifier with sample and neuron complexity independent of the ambient dimension $d$, while a random feature model requires exponential complexity in $d$ for such a guarantee.

artificial intelligence, machine learning, probability, (16 more...)

2307.06887

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.27)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

Seurin, Paul, Shirvan, Koroush

Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant Fuel Optimization

The nuclear fuel loading pattern optimization problem belongs to the class of large-scale combinatorial optimization. It is also characterized by multiple objectives and constraints, which makes it impossible to solve explicitly. Stochastic optimization methodologies including Genetic Algorithms and Simulated Annealing are used by different nuclear utilities and vendors, but hand-designed solutions continue to be the prevalent method in the industry. To improve the state-of-the-art, Deep Reinforcement Learning (RL), in particular, Proximal Policy Optimization is leveraged. This work presents a first-of-a-kind approach to utilize deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization. This paper is also to our knowledge the first to propose a study of the behavior of several hyper-parameters that influence the RL algorithm. The algorithm is highly dependent on multiple factors such as the shape of the objective function derived for the core design that behaves as a fudge factor that affects the stability of the learning. But also, an exploration/exploitation trade-off that manifests through different parameters such as the number of loading patterns seen by the agents per episode, the number of samples collected before a policy update nsteps, and an entropy factor ent_coef that increases the randomness of the policy during training. We found that RL must be applied similarly to a Gaussian Process in which the acquisition function is replaced by a parametrized policy. Then, once an initial set of hyper-parameters is found, reducing nsteps and ent_coef until no more learning is observed will result in the highest sample efficiency robustly and stably. This resulted in an economic benefit of 535,000- 642,000 $/year/plant.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2305.05812

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Power Industry > Utilities > Nuclear (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ioannou, Eirini, Pydi, Muni Sreenivas, Loh, Po-Ling

Robust empirical risk minimization via Newton's method

A new variant of Newton's method for empirical risk minimization is studied, where at each iteration of the optimization algorithm, the gradient and Hessian of the objective function are replaced by robust estimators taken from existing literature on robust mean estimation for multivariate data. After proving a general theorem about the convergence of successive iterates to a small ball around the population-level minimizer, consequences of the theory in generalized linear models are studied when data are generated from Huber's epsilon-contamination model and/or heavytailed distributions. An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed, which may be more appropriate for high-dimensional settings, and conjectures about the convergence of the resulting algorithm are offered. Compared to robust gradient descent, the proposed algorithm enjoys the faster rates of convergence for successive iterates often achieved by second-order algorithms for convex problems, i.e., quadratic convergence in a neighborhood of the optimum, with a stepsize that may be chosen adaptively via backtracking linesearch.

artificial intelligence, inequality, machine learning, (17 more...)

2301.13192

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

Arenz, Oleg, Dahlinger, Philipp, Ye, Zihan, Volpp, Michael, Neumann, Gerhard

Variational inference with Gaussian mixture models (GMMs) enables learning of highly tractable yet multi-modal approximations of intractable target distributions with up to a few hundred dimensions. The two currently most effective methods for GMM-based variational inference, VIPS and iBayes-GMM, both employ independent natural gradient updates for the individual components and their weights. We show for the first time, that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish both approaches, namely with respect to sample selection, natural gradient estimation, stepsize adaptation, and whether trust regions are enforced or the number of components adapted. We argue that for both approaches, the quality of the learned approximations can heavily suffer from the respective design choices: By updating the individual components using samples from the mixture model, iBayes-GMM often fails to produce meaningful updates to low-weight components, and by using a zero-order method for estimating the natural gradient, VIPS scales badly to higher-dimensional problems. Furthermore, we show that information-geometric trust-regions (used by VIPS) are effective even when using first-order natural gradient estimates, and often outperform the improved Bayesian learning rule (iBLR) update used by iBayes-GMM. We systematically evaluate the effects of design choices and show that a hybrid approach significantly outperforms both prior works. Along with this work, we publish our highly modular and efficient implementation for natural gradient variational inference with Gaussian mixture models, which supports 432 different combinations of design choices, facilitates the reproduction of all our experiments, and may prove valuable for the practitioner.

artificial intelligence, experiment, machine learning, (19 more...)

2209.11533

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems

Lei, Yunwen

Stochastic optimization has found wide applications in minimizing objective functions in machine learning, which motivates a lot of theoretical studies to understand its practical success. Most of existing studies focus on the convergence of optimization errors, while the generalization analysis of stochastic optimization is much lagging behind. This is especially the case for nonconvex and nonsmooth problems often encountered in practice. In this paper, we initialize a systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems. We introduce novel algorithmic stability measures and establish their quantitative connection on the gap between population gradients and empirical gradients, which is then further extended to study the gap between the Moreau envelope of the empirical risk and that of the population risk. To our knowledge, these quantitative connection between stability and generalization in terms of either gradients or Moreau envelopes have not been studied in the literature. We introduce a class of sampling-determined algorithms, for which we develop bounds for three stability measures. Finally, we apply these discussions to derive error bounds for stochastic gradient descent and its adaptive variant, where we show how to achieve an implicit regularization by tuning the step sizes and the number of iterations.

algorithm, artificial intelligence, machine learning, (14 more...)

2206.07082

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Dieuleveut, Aymeric, Fort, Gersende, Moulines, Eric, Wai, Hoi-To

Stochastic Approximation Beyond Gradient for Signal Processing and Machine Learning

arXiv.org Machine LearningJul-16-2023

Stochastic Approximation (SA) is a classical algorithm that has had since the early days a huge impact on signal processing, and nowadays on machine learning, due to the necessity to deal with a large amount of data observed with uncertainties. An exemplar special case of SA pertains to the popular stochastic (sub)gradient algorithm which is the working horse behind many important applications. A lesser-known fact is that the SA scheme also extends to non-stochastic-gradient algorithms such as compressed stochastic gradient, stochastic expectation-maximization, and a number of reinforcement learning algorithms. The aim of this article is to overview and introduce the non-stochastic-gradient perspectives of SA to the signal processing and machine learning audiences through presenting a design guideline of SA algorithms backed by theories. Our central theme is to propose a general framework that unifies existing theories of SA, including its non-asymptotic and asymptotic convergence results, and demonstrate their applications on popular non-stochastic-gradient algorithms. We build our analysis framework based on classes of Lyapunov functions that satisfy a variety of mild conditions. We draw connections between non-stochastic-gradient algorithms and scenarios when the Lyapunov function is smooth, convex, or strongly convex. Using the said framework, we illustrate the convergence properties of the non-stochastic-gradient algorithms using concrete examples. Extensions to the emerging variance reduction techniques for improved sample complexity will also be discussed.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2302.11147

Country:

North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

van Dijk, Marten, Nguyen, Phuong Ha

Considerations on the Theory of Training Models with Differential Privacy

arXiv.org Artificial IntelligenceJul-16-2023

Privacy leakage is a big problem in the big-data era. Solving a learning task based on big data intrinsically means that only through a collaborative effort sufficient data is available for training a global model with sufficient clean accuracy (utility). Federated learning is a framework where a learning task is solved by a loose federation of participating devices/clients which are coordinated by a central server [42, 8, 3, 33, 40, 9, 57, 29, 59, 10, 34, 36, 37, 39, 12, 30]. Clients, who use own local data to participate in a learning task by training a global model, want to have privacy guarantees for their local proprietary data. For this reason DP-SGD [1] was introduced as it adapts distributed Stochastic Gradient Descent (SGD)[55] with Differential Privacy (DP)[19, 15, 21, 18].

artificial intelligence, machine learning, privacy, (18 more...)

2303.04676

Country:

North America > United States > Virginia (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)