AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Kinetic Interacting Particle Langevin Monte Carlo

Oliva, Paul Felix Valsecchi, Akyildiz, O. Deniz

arXiv.org Machine LearningJul-8-2024

This paper introduces and analyses interacting underdamped Langevin algorithms, termed Kinetic Interacting Particle Langevin Monte Carlo (KIPLMC) methods, for statistical inference in latent variable models. We propose a diffusion process that evolves jointly in the space of parameters and latent variables and exploit the fact that the stationary distribution of this diffusion concentrates around the maximum marginal likelihood estimate of the parameters. We then provide two explicit discretisations of this diffusion as practical algorithms to estimate parameters of statistical models. For each algorithm, we obtain nonasymptotic rates of convergence for the case where the joint log-likelihood is strongly concave with respect to latent variables and parameters. In particular, we provide convergence analysis for the diffusion together with the discretisation error, providing convergence rate estimates for the algorithms in Wasserstein-2 distance. To demonstrate the utility of the introduced methodology, we provide numerical experiments that demonstrate the effectiveness of the proposed diffusion for statistical inference and the stability of the numerical integrators utilised for discretisation. Our setting covers a broad number of applications, including unsupervised learning, statistical inference, and inverse problems.

algorithm, diffusion, stationary measure, (17 more...)

arXiv.org Machine Learning

2407.0579

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Sequential Gaussian Variational Inference for Nonlinear State Estimation applied to Robotic Applications

Seo, Min-Won, Kia, Solmaz S.

arXiv.org Artificial IntelligenceJul-7-2024

Probabilistic state estimation is essential for robots navigating uncertain environments. Accurately and efficiently managing uncertainty in estimated states is key to robust robotic operation. However, nonlinearities in robotic platforms pose significant challenges that require advanced estimation techniques. Gaussian variational inference (GVI) offers an optimization perspective on the estimation problem, providing analytically tractable solutions and efficiencies derived from the geometry of Gaussian space. We propose a Sequential Gaussian Variational Inference (S-GVI) method to address nonlinearity and provide efficient sequential inference processes. Our approach integrates sequential Bayesian principles into the GVI framework, which are addressed using statistical approximations and gradient updates on the information geometry. Validations through simulations and real-world experiments demonstrate significant improvements in state estimation over the Maximum A Posteriori (MAP) estimation method.

estimation, s-gvi, state estimation, (12 more...)

arXiv.org Artificial Intelligence

2407.05478

Country:

North America > United States > California > Orange County > Irvine (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Add feedback

Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

Pan, Xiaokang, Li, Xingyu, Liu, Jin, Sun, Tao, Sun, Kai, Chen, Lixing, Qu, Zhe

arXiv.org Artificial IntelligenceJul-7-2024

STOchastic Recursive Momentum (STORM)-based algorithms have been widely developed to solve one to $K$-level ($K \geq 3$) stochastic optimization problems. Specifically, they use estimators to mitigate the biased gradient issue and achieve near-optimal convergence results. However, there is relatively little work on understanding their generalization performance, particularly evident during the transition from one to $K$-level optimization contexts. This paper provides a comprehensive generalization analysis of three representative STORM-based algorithms: STORM, COVER, and SVMR, for one, two, and $K$-level stochastic optimizations under both convex and strongly convex settings based on algorithmic stability. Firstly, we define stability for $K$-level optimizations and link it to generalization. Then, we detail the stability results for three prominent STORM-based algorithms. Finally, we derive their excess risk bounds by balancing stability results with optimization errors. Our theoretical results provide strong evidence to complete STORM-based algorithms: (1) Each estimator may decrease their stability due to variance with its estimation target. (2) Every additional level might escalate the generalization error, influenced by the stability and the variance between its cumulative stochastic gradient and the true gradient. (3) Increasing the batch size for the initial computation of estimators presents a favorable trade-off, enhancing the generalization performance.

generalization, optimization, stability and generalization, (11 more...)

arXiv.org Artificial Intelligence

2407.05286

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Semi-adaptive Synergetic Two-way Pseudoinverse Learning System

Liu, Binghong, Zhao, Ziqi, Li, Shupan, Wang, Ke

arXiv.org Artificial IntelligenceJul-6-2024

Deep learning has become a crucial technology for making breakthroughs in many fields. Nevertheless, it still faces two important challenges in theoretical and applied aspects. The first lies in the shortcomings of gradient descent based learning schemes which are time-consuming and difficult to determine the learning control hyperparameters. Next, the architectural design of the model is usually tricky. In this paper, we propose a semi-adaptive synergetic two-way pseudoinverse learning system, wherein each subsystem encompasses forward learning, backward learning, and feature concatenation modules. The whole system is trained using a non-gradient descent learning algorithm. It simplifies the hyperparameter tuning while improving the training efficiency. The architecture of the subsystems is designed using a data-driven approach that enables automated determination of the depth of the subsystems. We compare our method with the baselines of mainstream non-gradient descent based methods and the results demonstrate the effectiveness of our proposed method. The source code for this paper is available at http://github.com/B-berrypie/Semi-adaptive-Synergetic-Two-way-Pseudoinverse-Learning-System}{http://github.com/B-berrypie/Semi-adaptive-Synergetic-Two-way-Pseudoinverse-Learning-System.

algorithm, neural network, subsystem, (14 more...)

arXiv.org Artificial Intelligence

2406.18931

Country:

Asia > China > Henan Province > Zhengzhou (0.05)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

Kong, Weiwei, Ribero, Mónica

arXiv.org Machine LearningJul-6-2024

Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume the loss function is Lipschitz continuous and convex, sample the batches randomly with replacement, or omit the gradient clipping step. In this work, we analyze the most commonly used variant of DP-SGD, in which we sample batches cyclically with replacement, perform gradient clipping, and only release the last DP-SGD iterate. More specifically - without assuming convexity, smoothness, or Lipschitz continuity of the loss function - we establish new R\'enyi differential privacy (RDP) bounds for the last DP-SGD iterate under the mild assumption that (i) the DP-SGD stepsize is small relative to the topological constants in the loss function, and (ii) the loss function is weakly-convex. Moreover, we show that our bounds converge to previously established convex bounds when the weak-convexity parameter of the objective function approaches zero. In the case of non-Lipschitz smooth loss functions, we provide a weaker bound that scales well in terms of the number of DP-SGD iterations.

assumption, nyi divergence, proposition 2, (16 more...)

arXiv.org Machine Learning

2407.05237

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Add feedback

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes

Orvieto, Antonio, Xiao, Lin

arXiv.org Artificial IntelligenceJul-5-2024

We consider the problem of minimizing the average of a large number of smooth but possibly non-convex functions. In the context of most machine learning applications, each loss function is non-negative and thus can be expressed as the composition of a square and its real-valued square root. This reformulation allows us to apply the Gauss-Newton method, or the Levenberg-Marquardt method when adding a quadratic regularization. The resulting algorithm, while being computationally as efficient as the vanilla stochastic gradient method, is highly adaptive and can automatically warmup and decay the effective stepsize while tracking the non-negative loss landscape. We provide a tight convergence analysis, leveraging new techniques, in the stochastic convex and non-convex settings. In particular, in the convex case, the method does not require access to the gradient Lipshitz constant for convergence, and is guaranteed to never diverge. The convergence rates and empirical evaluations compare favorably to the classical (stochastic) gradient method as well as to several other adaptive methods.

hyperparameter, ngn, stepsize, (17 more...)

arXiv.org Artificial Intelligence

2407.04358

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials

Chen, August Y., Sekhari, Ayush, Sridharan, Karthik

arXiv.org Artificial IntelligenceJul-5-2024

We study the problem of non-convex optimization using Stochastic Gradient Langevin Dynamics (SGLD). SGLD is a natural and popular variation of stochastic gradient descent where at each step, appropriately scaled Gaussian noise is added. To our knowledge, the only strategy for showing global convergence of SGLD on the loss function is to show that SGLD can sample from a stationary distribution which assigns larger mass when the function is small (the Gibbs measure), and then to convert these guarantees to optimization results. We employ a new strategy to analyze the convergence of SGLD to global minima, based on Lyapunov potentials and optimization. We convert the same mild conditions from previous works on SGLD into geometric properties based on Lyapunov potentials. This adapts well to the case with a stochastic gradient oracle, which is natural for machine learning applications where one wants to minimize population loss but only has access to stochastic gradients via minibatch training samples. Here we provide 1) improved rates in the setting of previous works studying SGLD for optimization, 2) the first finite gradient complexity guarantee for SGLD where the function is Lipschitz and the Gibbs measure defined by the function satisfies a Poincar\'e Inequality, and 3) prove if continuous-time Langevin Dynamics succeeds for optimization, then discrete-time SGLD succeeds under mild regularity assumptions.

assumption 2, lemma 6, probability, (14 more...)

arXiv.org Artificial Intelligence

2407.04264

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
(3 more...)

Genre:

Research Report (0.50)
Workflow (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient Descent

Kumar, Mohit, Valentinitsch, Alexander, Fuchs, Magdalena, Brucker, Mathias, Bowles, Juliana, Husakovic, Adnan, Abbas, Ali, Moser, Bernhard A.

arXiv.org Artificial IntelligenceJul-5-2024

This paper develops a novel mathematical framework for collaborative learning by means of geometrically inspired kernel machines which includes statements on the bounds of generalisation and approximation errors, and sample complexity. For classification problems, this approach allows us to learn bounded geometric structures around given data points and hence solve the global model learning problem in an efficient way by exploiting convexity properties of the related optimisation problem in a Reproducing Kernel Hilbert Space (RKHS). In this way, we can reduce classification problems to determining the closest bounded geometric structure from a given data point. Further advantages that come with our solution is that our approach does not require clients to perform multiple epochs of local optimisation using stochastic gradient descent, nor require rounds of communication between client/server for optimising the global model. We highlight that numerous experiments have shown that the proposed method is a competitive alternative to the state-of-the-art.

experiment, kahm global classifier, learning, (14 more...)

arXiv.org Artificial Intelligence

2407.04335

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > Austria > Upper Austria > Linz (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Add feedback

Differentially Private Convex Approximation of Two-Layer ReLU Networks

Koskela, Antti

arXiv.org Artificial IntelligenceJul-5-2024

We show that it is possible to privately train convex problems that give models with similar privacy-utility trade-off as one hidden-layer ReLU networks trained with differentially private stochastic gradient descent (DP-SGD). As we show, this is possible via a certain dual formulation of the ReLU minimization problem. We derive a stochastic approximation of the dual problem that leads to a strongly convex problem which allows applying, for example, the privacy amplification by iteration type of analysis for gradient-based private optimizers, and in particular allows giving accurate privacy bounds for the noisy cyclic mini-batch gradient descent with fixed disjoint mini-batches. We obtain on the MNIST and FashionMNIST problems for the noisy cyclic mini-batch gradient descent first empirical results that show similar privacy-utility-trade-offs as DP-SGD applied to a ReLU network. We outline theoretical utility bounds that illustrate the speed-ups of the private convex approximation of ReLU networks.

dp-sgd, relu minimization problem, relu network, (13 more...)

arXiv.org Artificial Intelligence

2407.04884

Country: North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Add feedback

Function Smoothing Regularization for Precision Factorization Machine Annealing in Continuous Variable Optimization Problems

Endo, Katsuhiro, Takahashi, Kazuaki Z.

arXiv.org Artificial IntelligenceJul-5-2024

Solving continuous variable optimization problems by factorization machine quantum annealing (FMQA) demonstrates the potential of Ising machines to be extended as a solver for integer and real optimization problems. However, the details of the Hamiltonian function surface obtained by factorization machine (FM) have been overlooked. This study shows that in the widely common case where real numbers are represented by a combination of binary variables, the function surface of the Hamiltonian obtained by FM can be very noisy. This noise interferes with the inherent capabilities of quantum annealing and is likely to be a substantial cause of problems previously considered unsolvable due to the limitations of FMQA performance. The origin of the noise is identified and a simple, general method is proposed to prevent its occurrence. The generalization performance of the proposed method and its ability to solve practical problems is demonstrated.

evaluation point, function surface, hamiltonian, (17 more...)

arXiv.org Artificial Intelligence

2407.04393

Country: Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback