AITopics

2210.06723

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Europe > Germany > Berlin (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.77)

Woodworth, Blake, Mishchenko, Konstantin, Bach, Francis

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

arXiv.org Artificial IntelligenceJun-7-2023

We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $\delta$-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.

algorithm 1, artificial intelligence, machine learning, (13 more...)

2302.03542

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Wang, Qiuhao, Ho, Chin Pang, Petrik, Marek

Policy Gradient in Robust MDPs with Global Convergence Guarantee

arXiv.org Artificial IntelligenceJun-7-2023

Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but adapting these methods to RMDPs has been challenging. As a result, the applicability of RMDPs to large, practical domains remains limited. This paper proposes a new Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs. In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs. We introduce a novel parametric transition kernel and solve the inner loop robust policy via a gradient-based method. Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2212.10439

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Hampshire (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Andriushchenko, Maksym, Varre, Aditya, Pillaud-Vivien, Loucas, Flammarion, Nicolas

SGD with Large Step Sizes Learns Sparse Features

arXiv.org Artificial IntelligenceJun-7-2023

We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) lead the iterates to jump from one side of a valley to the other causing loss stabilization, and (ii) this stabilization induces a hidden stochastic dynamics orthogonal to the bouncing directions that biases it implicitly toward sparse predictors. Furthermore, we show empirically that the longer large step sizes keep SGD high in the loss landscape valleys, the better the implicit regularization can operate and find sparse representations. Notably, no explicit regularization is used so that the regularization effect comes solely from the SGD training dynamics influenced by the step size schedule. Therefore, these observations unveil how, through the step size schedules, both gradient and noise drive together the SGD dynamics through the loss landscape of neural networks. We justify these findings theoretically through the study of simple neural network models as well as qualitative arguments inspired from stochastic processes. Finally, this analysis allows us to shed a new light on some common practice and observed phenomena when training neural networks. The code of our experiments is available at https://github.com/tml-epfl/sgd-sparse-features.

artificial intelligence, machine learning, step size, (16 more...)

2210.05337

Country: North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Pashakhanloo, Farhad, Koulakov, Alexei

Stochastic Gradient Descent-Induced Drift of Representation in a Two-Layer Neural Network

Representational drift refers to over-time changes in neural activation accompanied by a stable task performance. Despite being observed in the brain and in artificial networks, the mechanisms of drift and its implications are not fully understood. Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network. Specifically, in a continual online learning scenario, we study the drift induced by the noise inherent in the Stochastic Gradient Descent (SGD). By decomposing the learning dynamics into the normal and tangent spaces of the minimum-loss manifold, we show the former corresponds to a finite variance fluctuation, while the latter could be considered as an effective diffusion process on the manifold. We analytically compute the fluctuation and the diffusion coefficients for the stimuli representations in the hidden layer as functions of network parameters and input distribution. Further, consistent with experiments, we show that the drift rate is slower for a more frequently presented stimulus. Overall, our analysis yields a theoretical framework for better understanding of the drift phenomenon in biological and artificial neural networks.

artificial intelligence, machine learning, manifold, (17 more...)

2302.02563

Country: North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Understanding Generalization of Federated Learning via Stability: Heterogeneity Matters

Sun, Zhenyu, Niu, Xiaochun, Wei, Ermin

Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications. Good generalization indicates the model can predict unseen data correctly when trained under a limited number of data. Federated learning (FL), which has emerged as a popular distributed learning framework, allows multiple devices or clients to train a shared model without violating privacy requirements. While the existing literature has studied extensively the generalization performances of centralized machine learning algorithms, similar analysis in the federated settings is either absent or with very restrictive assumptions on the loss functions. In this paper, we aim to analyze the generalization performances of federated learning by means of algorithmic stability, which measures the change of the output model of an algorithm when perturbing one data point. Three widely-used algorithms are studied, including FedAvg, SCAFFOLD, and FedProx, under convex and non-convex loss functions. Our analysis shows that the generalization performances of models trained by these three algorithms are closely related to the heterogeneity of clients' datasets as well as the convergence behaviors of the algorithms. Particularly, in the i.i.d. setting, our results recover the classical results of stochastic gradient descent (SGD).

algorithm, artificial intelligence, machine learning, (16 more...)

2306.03824

Country: North America > United States > Virginia (0.04)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Adhikari, Shishir, Kabakçıoğlu, Alkan, Strang, Alexander, Yuret, Deniz, Hinczewski, Michael

Machine learning in and out of equilibrium

The algorithms used to train neural networks, like stochastic gradient descent (SGD), have close parallels to natural processes that navigate a high-dimensional parameter space -- for example protein folding or evolution. Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels in a single, unified framework. We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium, exhibiting persistent currents in the space of network parameters. As in its physical analogues, the current is associated with an entropy production rate for any given training trajectory. The stationary distribution of these rates obeys the integral and detailed fluctuation theorems -- nonequilibrium generalizations of the second law of thermodynamics. We validate these relations in two numerical examples, a nonlinear regression network and MNIST digit classification. While the fluctuation theorems are universal, there are other aspects of the stationary state that are highly sensitive to the training details. Surprisingly, the effective loss landscape and diffusion matrix that determine the shape of the stationary distribution vary depending on the simple choice of minibatching done with or without replacement. We can take advantage of this nonequilibrium sensitivity to engineer an equilibrium stationary state for a particular application: sampling from a posterior distribution of network weights in Bayesian machine learning. We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without replacement minibatching. In an example system where the posterior is exactly known, this SGWORLD algorithm outperforms SGLD, converging to the posterior orders of magnitude faster as a function of the learning rate.

artificial intelligence, machine learning, matrix, (19 more...)

2306.03521

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Hurault, Samuel, Kamilov, Ulugbek, Leclaire, Arthur, Papadakis, Nicolas

Convergent Bregman Plug-and-Play Image Restoration for Poisson Inverse Problems

Plug-and-Play (PnP) methods are efficient iterative algorithms for solving ill-posed image inverse problems. PnP methods are obtained by using deep Gaussian denoisers instead of the proximal operator or the gradient-descent step within proximal algorithms. Current PnP schemes rely on data-fidelity terms that have either Lipschitz gradients or closed-form proximal operators, which is not applicable to Poisson inverse problems. Based on the observation that the Gaussian noise is not the adequate noise model in this setting, we propose to generalize PnP using theBregman Proximal Gradient (BPG) method. BPG replaces the Euclidean distance with a Bregman divergence that can better capture the smoothness properties of the problem. We introduce the Bregman Score Denoiser specifically parametrized and trained for the new Bregman geometry and prove that it corresponds to the proximal operator of a nonconvex potential. We propose two PnP algorithms based on the Bregman Score Denoiser for solving Poisson inverse problems. Extending the convergence results of BPG in the nonconvex settings, we show that the proposed methods converge, targeting stationary points of an explicit global functional. Experimental evaluations conducted on various Poisson inverse problems validate the convergence results and showcase effective restoration performance.

algorithm, artificial intelligence, machine learning, (13 more...)

2306.03466

Country: Europe > Italy (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Provable Multi-instance Deep AUC Maximization with Stochastic Pooling

Zhu, Dixian, Wang, Bokun, Chen, Zhi, Wang, Yaxing, Sonka, Milan, Wu, Xiaodong, Yang, Tianbao

artificial intelligence, machine learning, provable multi-instance deep auc maximization, (13 more...)

This paper considers a novel application of deep AUC maximization (DAM) for multi-instance learning (MIL), in which a single class label is assigned to a bag of instances (e.g., multiple 2D slices of a CT scan for a patient). We address a neglected yet non-negligible computational challenge of MIL in the context of DAM, i.e., bag size is too large to be loaded into {GPU} memory for backpropagation, which is required by the standard pooling methods of MIL. To tackle this challenge, we propose variance-reduced stochastic pooling methods in the spirit of stochastic optimization by formulating the loss function over the pooled prediction as a multi-level compositional function. By synthesizing techniques from stochastic compositional optimization and non-convex min-max optimization, we propose a unified and provable muli-instance DAM (MIDAM) algorithm with stochastic smoothed-max pooling or stochastic attention-based pooling, which only samples a few instances for each bag to compute a stochastic gradient estimator and to update the model parameter. We establish a similar convergence rate of the proposed MIDAM algorithm as the state-of-the-art DAM algorithms. Our extensive experiments on conventional MIL datasets and medical datasets demonstrate the superiority of our MIDAM algorithm.

2305.0804

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Texas > Brazos County > College Station (0.14)
Asia > China > Beijing > Beijing (0.04)
(3 more...)

Genre:

Research Report (0.50)
Overview (0.48)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

Sun, Zhenyu, Wei, Ermin

In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We first analyze an important special case, empirical minimax problem, where the overall objective approximates a true population minimax risk by statistical samples. We provide generalization bounds for learning with this objective through Rademacher complexity analysis. Then, we focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication per iteration or lack performance guarantees with the exception of Local Stochastic Gradient Descent Ascent (SGDA), a multiple-local-update descent ascent algorithm which guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that generally it cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT). When local objectives are Lipschitz smooth and strongly-convex-strongly-concave, we prove that FedGDA-GT converges linearly with a constant stepsize to global $\epsilon$-approximation solution with $\mathcal{O}(\log (1/\epsilon))$ rounds of communication, which matches the time complexity of centralized GDA method. Finally, we numerically show that FedGDA-GT outperforms Local SGDA.

artificial intelligence, local sgda, machine learning, (13 more...)

2206.01132

Country:

North America > United States > Illinois > Cook County > Evanston (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)