AITopics | Gradient Descent

Gradient Descent

Mixed Optimization for Smooth Functions

Neural Information Processing Systems | Mar-14-2024, 00:10:36 GMT

It is well known that the optimal convergence rate for stochastic optimization of smooth functions is O(1/√T), which is the same as the rate for stochastic optimization of Lipschitz continuous convex functions.

convergence rate, optimization, stochastic optimization, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.32)

Dissipative Gradient Descent Ascent Method: A Control Theory Inspired Algorithm for Min-max Optimization

Zheng, Tianqi, Loizou, Nicolas, You, Pengcheng, Mallada, Enrique

arXiv.org Artificial Intelligence | Mar-14-2024

Gradient Descent Ascent (GDA) methods for min-max optimization problems typically produce oscillatory behavior that can lead to instability, e.g., in bilinear settings. To address this problem, we introduce a dissipation term into the GDA updates to dampen these oscillations. The proposed Dissipative GDA (DGDA) method can be seen as performing standard GDA on a state-augmented and regularized saddle function that does not strictly introduce additional convexity/concavity. We theoretically show the linear convergence of DGDA in the bilinear and strongly convex-strongly concave settings and assess its performance by comparing DGDA with other methods such as GDA, Extra-Gradient (EG), and Optimistic GDA. Our findings demonstrate that DGDA surpasses these methods, achieving superior convergence rates. We support our claims with two numerical examples that showcase DGDA's effectiveness in solving saddle point problems.
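The oscillation problem mentioned in this abstract can be reproduced on the simplest bilinear saddle function. The sketch below is an illustration only: it assumes the toy objective f(x, y) = x * y, a step size of 0.1, and a starting point of (1, 1), none of which come from the paper. It compares plain GDA with Extra-Gradient (EG), one of the baselines the abstract names. It does not implement DGDA, whose update rule is not given in this listing.

```python
import math

def gda_step(x, y, eta):
    # For f(x, y) = x * y: df/dx = y, df/dy = x.
    # Descend in x and ascend in y, both from the current point.
    return x - eta * y, y + eta * x

def eg_step(x, y, eta):
    # Extra-Gradient: take a trial step, then update the original
    # point using the gradient evaluated at the trial point.
    x_half, y_half = x - eta * y, y + eta * x
    return x - eta * y_half, y + eta * x_half

def distance_after(step, eta=0.1, iters=200):
    # Distance from the saddle point (0, 0) after `iters` updates.
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, eta)
    return math.hypot(x, y)

start_dist = math.hypot(1.0, 1.0)
gda_dist = distance_after(gda_step)
eg_dist = distance_after(eg_step)
print(f"start={start_dist:.3f}  GDA={gda_dist:.3f}  EG={eg_dist:.3f}")
```

On this toy problem each GDA step multiplies the distance to the saddle point by sqrt(1 + eta^2), so the iterates spiral outward, while each EG step multiplies it by sqrt(1 - eta^2 + eta^4), which is below 1 for 0 < eta < 1.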

algorithm, convergence, dgda, (12 more...)

arXiv.org Artificial Intelligence

2403.0909

Country:

North America > United States > Maryland > Baltimore (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

ADEdgeDrop: Adversarial Edge Dropping for Robust Graph Neural Networks

Chen, Zhaoliang, Wu, Zhihao, Sadikaj, Ylli, Plant, Claudia, Dai, Hong-Ning, Wang, Shiping, Guo, Wenzhong

arXiv.org Artificial Intelligence | Mar-14-2024

Although Graph Neural Networks (GNNs) have exhibited the powerful ability to gather graph-structured information from neighborhood nodes via various message-passing mechanisms, the performance of GNNs is limited by poor generalization and fragile robustness caused by noisy and redundant graph data. As a prominent solution, Graph Augmentation Learning (GAL) has recently received increasing attention. Among prior GAL approaches, edge-dropping methods that randomly remove edges from a graph during training are effective techniques to improve the robustness of GNNs. However, randomly dropping edges often results in bypassing critical edges, consequently weakening the effectiveness of message passing. In this paper, we propose a novel adversarial edge-dropping method (ADEdgeDrop) that leverages an adversarial edge predictor guiding the removal of edges, which can be flexibly incorporated into diverse GNN backbones. Employing an adversarial training framework, the edge predictor utilizes the line graph transformed from the original graph to estimate the edges to be dropped, which improves the interpretability of the edge-dropping method. The proposed ADEdgeDrop is optimized alternately by stochastic gradient descent and projected gradient descent. Comprehensive experiments on six graph benchmark datasets demonstrate that the proposed ADEdgeDrop outperforms state-of-the-art baselines across various GNN backbones, demonstrating improved generalization and robustness.
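For context, the random edge-dropping baseline that this abstract contrasts with can be sketched in a few lines. This is not ADEdgeDrop: there is no adversarial edge predictor and no line graph here. The function name `drop_edges`, the edge-list representation, and the drop probability are assumptions made for illustration.

```python
import random

def drop_edges(edges, drop_prob, rng):
    """Keep each edge independently with probability 1 - drop_prob."""
    return [e for e in edges if rng.random() >= drop_prob]

rng = random.Random(0)                      # fixed seed for repeatability
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
kept = drop_edges(edges, drop_prob=0.4, rng=rng)
print(f"kept {len(kept)} of {len(edges)} edges: {kept}")
```

Because every edge is treated identically, an edge that is critical for message passing is as likely to be removed as a redundant one, which is the weakness the abstract points out.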

adedgedrop, edge predictor, graph, (15 more...)

arXiv.org Artificial Intelligence

2403.09171

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Fujian Province > Fuzhou (0.05)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models

Yi, Kai, Meinhardt, Georg, Condat, Laurent, Richtárik, Peter

arXiv.org Artificial Intelligence | Mar-14-2024

Federated Learning (FL) has garnered increasing attention due to its unique characteristic of allowing heterogeneous clients to process their private data locally and interact with a central server, while being respectful of privacy. A critical bottleneck in FL is the communication cost. A pivotal strategy to mitigate this burden is Local Training, which involves running multiple local stochastic gradient descent iterations between communication phases. Our work is inspired by the innovative Scaffnew algorithm, which has considerably advanced the reduction of communication complexity in FL. We introduce FedComLoc (Federated Compressed and Local Training), integrating practical and effective compression into Scaffnew to further enhance communication efficiency. Extensive experiments, using the popular TopK compressor and quantization, demonstrate its prowess in substantially reducing communication overheads in heterogeneous settings.
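The TopK compressor mentioned in this abstract keeps only the k entries of largest magnitude and zeroes the rest, so that a client needs to transmit only k values and their indices. A minimal sketch follows; the function name `top_k` and the example vector are illustrative assumptions, and how FedComLoc combines the compressor with Scaffnew is not shown here.

```python
def top_k(vec, k):
    """Zero all but the k largest-magnitude entries of vec."""
    if k <= 0:
        return [0.0] * len(vec)
    # Indices sorted by absolute value, largest first.
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

update = [0.1, -2.0, 0.05, 1.5, -0.3]       # hypothetical model update
compressed = top_k(update, 2)
print(compressed)                            # [0.0, -2.0, 0.0, 1.5, 0.0]
```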

communication-efficient, compression, fedcomloc, (14 more...)

arXiv.org Artificial Intelligence

2403.09904

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Saudi Arabia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Projected Natural Actor-Critic

Neural Information Processing Systems | Mar-13-2024, 21:24:22 GMT

In this paper we address a drawback of natural actor-critics that limits their real-world applicability--their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural Actor-Critics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent.
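The basic idea of keeping iterates inside a known safe region can be illustrated with ordinary projected gradient descent. The sketch below is deliberately simpler than the paper's setting: it uses the plain gradient and a Euclidean projection onto a box rather than a natural gradient, and the objective, box bounds, and step size are assumptions chosen for illustration. It is not the PNAC algorithm.

```python
def project_box(theta, lo, hi):
    # Euclidean projection onto the box [lo, hi]^d: clip each coordinate.
    return [min(max(t, lo), hi) for t in theta]

def grad(theta):
    # Gradient of f(theta) = (theta0 - 2)^2 + (theta1 + 1)^2,
    # whose unconstrained minimizer (2, -1) lies outside the box.
    return [2.0 * (theta[0] - 2.0), 2.0 * (theta[1] + 1.0)]

theta = [0.5, 0.5]
for _ in range(100):
    g = grad(theta)
    # Gradient step followed by projection back into the safe region.
    theta = project_box([t - 0.1 * gi for t, gi in zip(theta, g)], 0.0, 1.0)

print(theta)   # the constrained minimizer on [0, 1]^2 is (1, 0)
```

Every iterate stays inside the box, so a constraint that encodes safety is never violated during optimization, only approached from within.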

algorithm, descent, projection, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

Neural Information Processing Systems | Mar-13-2024, 21:24:04 GMT

We present and study a distributed optimization algorithm that employs a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often perform better than stochastic gradient descent methods on regularized loss minimization problems. However, little effort has been made to study them in a distributed framework. We make progress along this line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis with experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performance.
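To make the underlying method concrete, here is a single-machine sketch of stochastic dual coordinate ascent. It is not the distributed star-network algorithm of the paper. It assumes a squared loss (ridge regression) instead of the SVM objective used in the experiments, because the coordinate update then has a closed form, and the synthetic data, regularization value, and function name `sdca_ridge` are illustrative.

```python
import math
import random

def sdca_ridge(X, y, lam, epochs, rng):
    """SDCA for min_w (1/n) sum_i 0.5*(w.x_i - y_i)^2 + (lam/2)*|w|^2."""
    n, d = len(X), len(X[0])
    alpha = [0.0] * n            # one dual variable per example
    w = [0.0] * d                # invariant: w = sum_i alpha_i x_i / (lam*n)
    for _ in range(epochs):
        for i in rng.sample(range(n), n):    # random order each epoch
            xi = X[i]
            pred = sum(wj * xj for wj, xj in zip(w, xi))
            sq_norm = sum(xj * xj for xj in xi)
            # Exact maximizer of the dual objective in coordinate i.
            delta = (y[i] - pred - alpha[i]) / (1.0 + sq_norm / (lam * n))
            alpha[i] += delta
            w = [wj + delta * xj / (lam * n) for wj, xj in zip(w, xi)]
    return w

def primal_grad_norm(X, y, lam, w):
    n, d = len(X), len(X[0])
    g = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        r = sum(wj * xj for wj, xj in zip(w, xi)) - yi
        g = [gj + r * xj / n for gj, xj in zip(g, xi)]
    return math.sqrt(sum(gj * gj for gj in g))

rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [2.0 * x[0] - x[1] + 0.1 * rng.gauss(0, 1) for x in X]
w = sdca_ridge(X, y, lam=0.1, epochs=200, rng=rng)
grad_norm = primal_grad_norm(X, y, 0.1, w)
print(w, grad_norm)
```

Each step touches a single example and needs no step size, which is part of what makes dual coordinate methods attractive; the gradient norm of the primal objective at the returned w serves as a convergence check.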

algorithm, practical variant, variant, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Cupertino (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)

d93ed5b6db83be78efb0d05ae420158e-Reviews.html

Neural Information Processing Systems | Mar-13-2024, 20:33:20 GMT

This paper proposes determinantal point processes as a method to model inhibitory interactions in spike train data. The authors present a maximum likelihood approach based on stochastic gradient descent. This is an interesting idea that could potentially be a powerful non-factorial spike train model. However, the presentation here seems a bit unfocused, and it's not obvious to what extent DPPs will actually improve model accuracy. Since gain and periodic terms can easily be built into GLMs, it wasn't obvious to me that framing the problem as a DPP was worth the trouble.

coupling, dpp approach, interaction, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

c913303f392ffc643f7240b180602652-Reviews.html

Neural Information Processing Systems | Mar-13-2024, 20:17:03 GMT

Convergence properties and the relation to stochastic gradient descent have been discussed. Overall, this paper is not well written and should not be accepted in its current form. All theoretical analyses are based on the potential functions defined in lines 109-111. There is no citation for this potential function, and it is not explained why it is defined in this form. Using the norm of the difference between v_n and the optimal first principal component is a more standard way to analyse the convergence rate.

analyse, principal component

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

Neural Information Processing Systems | Mar-13-2024, 19:36:31 GMT

Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast convergence rate as that of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG). However, our analysis is significantly simpler and more intuitive. Moreover, unlike SDCA or SAG, our method does not require the storage of gradients, and thus is more easily applicable to complex problems such as some structured prediction problems and neural network learning.
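The SVRG update can be sketched on a small least-squares problem. Each outer epoch stores a snapshot of the weights and its full gradient; each inner step then corrects a single-example gradient with the snapshot's gradient for the same example. The synthetic data, step size, epoch counts, and function names below are assumptions for illustration, and the snapshot is set to the last inner iterate, which is one of several possible choices.

```python
import math
import random

def svrg_least_squares(X, y, eta, epochs, inner_steps, rng):
    """SVRG for min_w (1/n) sum_i 0.5*(w.x_i - y_i)^2."""
    n, d = len(X), len(X[0])

    def grad_i(w, i):
        r = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
        return [r * xj for xj in X[i]]

    def full_grad(w):
        g = [0.0] * d
        for i in range(n):
            g = [a + b / n for a, b in zip(g, grad_i(w, i))]
        return g

    w = [0.0] * d
    for _ in range(epochs):
        w_snap = list(w)              # snapshot of the current weights
        mu = full_grad(w_snap)        # full gradient at the snapshot
        for _ in range(inner_steps):
            i = rng.randrange(n)
            gi, gs = grad_i(w, i), grad_i(w_snap, i)
            # Variance-reduced gradient: unbiased, and its variance
            # shrinks as both w and w_snap approach the optimum.
            w = [wj - eta * (a - b + m)
                 for wj, a, b, m in zip(w, gi, gs, mu)]
    return w, full_grad(w)

rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
y = [2.0 * x[0] - x[1] + 0.1 * rng.gauss(0, 1) for x in X]
L_max = max(sum(v * v for v in x) for x in X)   # largest per-example smoothness
w, g = svrg_least_squares(X, y, eta=0.1 / L_max, epochs=40,
                          inner_steps=250, rng=rng)
final_grad_norm = math.sqrt(sum(v * v for v in g))
print(w, final_grad_norm)
```

Only the snapshot weights and one full gradient are stored, rather than one gradient per example, which matches the storage advantage over SAG that the abstract describes.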

sgd, svrg, variance, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New Jersey (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Variance Reduction for Stochastic Gradient Optimization

Neural Information Processing Systems | Mar-13-2024, 18:51:24 GMT

Stochastic gradient optimization is a class of widely used algorithms for training machine learning models. To optimize an objective, it uses the noisy gradient computed from random data samples instead of the true gradient computed from the entire dataset. However, when the variance of the noisy gradient is large, the algorithm might spend much time bouncing around, leading to slower convergence and worse performance. In this paper, we develop a general approach of using control variates for variance reduction in stochastic gradient optimization. Data statistics such as low-order moments (pre-computed or estimated online) are used to form the control variate. We demonstrate how to construct the control variate for two practical problems using stochastic gradient optimization. One is convex--the MAP estimation for logistic regression, and the other is non-convex--stochastic variational inference for latent Dirichlet allocation. On both problems, our approach shows faster convergence and better performance than the classical approach.
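The control variate idea is easiest to see on a scalar Monte Carlo estimate. The sketch below is a generic illustration, not the paper's construction for logistic regression or latent Dirichlet allocation: it estimates the mean of exp(U) for U uniform on [0, 1], using U itself (known mean 0.5) as the control variate. In the stochastic gradient setting, the noisy gradient plays the role of `g` and a correlated quantity with a known expectation, built from data statistics, plays the role of `h`.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)
u = [rng.random() for _ in range(10000)]
g = [math.exp(v) for v in u]     # noisy quantity; true mean is e - 1
h = u                            # control variate; known mean is 0.5

# Variance-minimizing coefficient a = Cov(g, h) / Var(h), estimated
# from the same samples.
mg, mh = mean(g), mean(h)
cov = sum((a - mg) * (b - mh) for a, b in zip(g, h)) / (len(g) - 1)
a_coef = cov / variance(h)

# Subtracting a * (h - E[h]) leaves the expectation unchanged
# but cancels the part of the noise that g shares with h.
g_cv = [gi - a_coef * (hi - 0.5) for gi, hi in zip(g, h)]

var_plain, var_cv = variance(g), variance(g_cv)
print(f"plain variance={var_plain:.4f}  with control variate={var_cv:.4f}")
```

The more strongly the control variate correlates with the noisy quantity, the larger the reduction, which is why the abstract emphasizes choosing statistics that track the gradient closely.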

algorithm, control variate, gradient, (10 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
