AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Stochastic Gradient Descent with Adaptive Data

arXiv.org Artificial IntelligenceOct-1-2024

Stochastic gradient descent (SGD) is a powerful optimization technique that is particularly useful in online learning scenarios. Its convergence analysis is relatively well understood under the assumption that the data samples are independent and identically distributed (iid). However, applying SGD to policy optimization problems in operations research involves a distinct challenge: the policy changes the environment and thereby affects the data used to update the policy. The adaptively generated data stream involves samples that are non-stationary, no longer independent from each other, and affected by previous decisions. The influence of previous decisions on the data generated introduces bias in the gradient estimate, which presents a potential source of instability for online learning not present in the iid case. In this paper, we introduce simple criteria for the adaptively generated data stream to guarantee the convergence of SGD. We show that the convergence speed of SGD with adaptive data is largely similar to the classical iid setting, as long as the mixing time of the policy-induced dynamics is factored in. Our Lyapunov-function analysis allows one to translate existing stability analysis of stochastic systems studied in operations research into convergence rates for SGD, and we demonstrate this for queueing and inventory management problems. We also showcase how our result can be applied to study the sample complexity of an actor-critic policy gradient algorithm.

algorithm, convergence, markov chain, (13 more...)

arXiv.org Artificial Intelligence

2410.01195

Country:

North America > United States (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Quantized and Asynchronous Federated Learning

Ortega, Tomas, Jafarkhani, Hamid

arXiv.org Artificial IntelligenceSep-30-2024

Recent advances in federated learning have shown that asynchronous variants can be faster and more scalable than their synchronous counterparts. However, their design does not include quantization, which is necessary in practice to deal with the communication bottleneck. To bridge this gap, we develop a novel algorithm, Quantized Asynchronous Federated Learning (QAFeL), which introduces a hidden-state quantization scheme to avoid the error propagation caused by direct quantization. QAFeL also includes a buffer to aggregate client updates, ensuring scalability and compatibility with techniques such as secure aggregation. Furthermore, we prove that QAFeL achieves an $\mathcal{O}(1/\sqrt{T})$ ergodic convergence rate for stochastic gradient descent on non-convex objectives, which is the optimal order of complexity, without requiring bounded gradients or uniform client arrivals. We also prove that the cross-term error between staleness and quantization only affects the higher-order error terms. We validate our theoretical findings on standard benchmarks.

artificial intelligence, machine learning, quantization, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TCOMM.2024.3471996

2410.00242

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Maryland (0.04)
(3 more...)

Genre: Research Report (0.83)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Preconditioning for Accelerated Gradient Descent Optimization and Regularization

Ye, Qiang

arXiv.org Machine LearningSep-30-2024

Accelerated training algorithms, such as adaptive learning rates and various normalization methods, are widely used but not fully understood. When regularization is introduced, standard optimizers like adaptive learning rates may not perform effectively. This raises the need for alternative regularization approaches and the question of how to properly combine regularization with preconditioning. In this paper, we address these challenges using the theory of preconditioning as follows: (1) We explain how preconditioning with AdaGrad, RMSProp, and Adam accelerates training; (2) We explore the interaction between regularization and preconditioning, outlining different options for selecting the variables for regularization, and in particular we discuss how to implement that for the gradient regularization; and (3) We demonstrate how normalization methods accelerate training by improving Hessian conditioning, and discuss how this perspective can lead to new preconditioning training algorithms. Our findings offer a unified mathematical framework for understanding various acceleration techniques and deriving appropriate regularization schemes.

conditioning, normalization, regularization, (15 more...)

arXiv.org Machine Learning

2410.00232

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Kentucky > Fayette County > Lexington (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)

Add feedback

Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

Davis, Damek, Drusvyatskiy, Dmitriy, Jiang, Liwei

arXiv.org Artificial IntelligenceSep-29-2024

A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution -- which we call the ravine -- so that the function grows at least quadratically away from the ravine and has constant order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory and algorithm on the problems of matrix sensing and factorization and learning a single neuron in the overparameterized regime.

dist, gradient descent, inequality follow, (13 more...)

arXiv.org Artificial Intelligence

2409.19791

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Massachusetts (0.04)
(6 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Gradient-free Decoder Inversion in Latent Diffusion Models

Hong, Seongmin, Jeon, Suh Yoon, Lee, Kyeonghyun, Ryu, Ernest K., Chun, Se Young

arXiv.org Artificial IntelligenceSep-27-2024

In latent diffusion models (LDMs), denoising diffusion process efficiently takes place on latent space whose dimension is lower than that of pixel space. Decoder is typically used to transform the representation in latent space to that in pixel space. While a decoder is assumed to have an encoder as an accurate inverse, exact encoder-decoder pair rarely exists in practice even though applications often require precise inversion of decoder. Prior works for decoder inversion in LDMs employed gradient descent inspired by inversions of generative adversarial networks. However, gradient-based methods require larger GPU memory and longer computation time for larger latent space. For example, recent video LDMs can generate more than 16 frames, but GPUs with 24 GB memory can only perform gradient-based decoder inversion for 4 frames. Here, we propose an efficient gradient-free decoder inversion for LDMs, which can be applied to diverse latent models. Theoretical convergence property of our proposed inversion has been investigated not only for the forward step method, but also for the inertial Krasnoselskii-Mann (KM) iterations under mild assumption on cocoercivity that is satisfied by recent LDMs. Our proposed gradient-free method with Adam optimizer and learning rate scheduling significantly reduced computation time and memory usage over prior gradient-based methods and enabled efficient computation in applications such as noise-space watermarking while achieving comparable error levels.

decoder inversion, inversion, iteration, (15 more...)

arXiv.org Artificial Intelligence

2409.18442

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Add feedback

Accelerated gradient descent for high frequency Model Predictive Control

Zhang, Jianghan, Jordana, Armand, Righetti, Ludovic

arXiv.org Artificial IntelligenceSep-26-2024

The recent promises of Model Predictive Control in robotics have motivated the development of tailored second-order methods to solve optimal control problems efficiently. While those methods benefit from strong convergence properties, tailored efficient implementations are challenging to derive. In this work, we study the potential effectiveness of first-order methods and show on a torque controlled manipulator that they can equal the performances of second-order methods.

artificial intelligence, machine learning, second-order method, (14 more...)

arXiv.org Artificial Intelligence

2409.18327

Country: North America > United States (0.30)

Genre: Research Report (0.65)

Industry: Energy > Oil & Gas > Upstream (0.65)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.42)

Add feedback

Stable Object Placement Under Geometric Uncertainty via Differentiable Contact Dynamics

Li, Linfeng, Yang, Gang, Shao, Lin, Hsu, David

arXiv.org Artificial IntelligenceSep-26-2024

From serving a cup of coffee to carefully rearranging delicate items, stable object placement is a crucial skill for future robots. This skill is challenging due to the required accuracy, which is difficult to achieve under geometric uncertainty. We leverage differentiable contact dynamics to develop a principled method for stable object placement under geometric uncertainty. We estimate the geometric uncertainty by minimizing the discrepancy between the force-torque sensor readings and the model predictions through gradient descent. We further keep track of a belief over multiple possible geometric parameters to mitigate the gradient-based method's sensitivity to the initialization. We verify our approach in the real world on various geometric uncertainties, including the in-hand pose uncertainty of the grasped object, the object's shape uncertainty, and the environment's shape uncertainty.

geometric parameter, geometric uncertainty, robotic & automation, (14 more...)

arXiv.org Artificial Intelligence

2409.17725

Country: Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks

Nguyen, Duong Thuy Anh, Wang, Su, Nguyen, Duong Tung, Nedich, Angelia, Poor, H. Vincent

arXiv.org Artificial IntelligenceSep-25-2024

We investigate the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs, and, in doing so, propose a consensus-based algorithm called DSGTm-TV. The proposed algorithm incorporates gradient tracking and heavy-ball momentum to distributively optimize a global objective function, while preserving local data privacy. Under DSGTm-TV, agents will update local model parameters and gradient estimates using information exchange with neighboring agents enabled through row- and column-stochastic mixing matrices, which we show guarantee both consensus and optimality. Our analysis establishes that DSGTm-TV exhibits linear convergence to the exact global optimum when exact gradient information is available, and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients. Moreover, in contrast to existing methods, DSGTm-TV preserves convergence for networks with uncoordinated stepsizes and momentum parameters, for which we provide explicit bounds. These results enable agents to operate in a fully decentralized manner, independently optimizing their local hyper-parameters. We demonstrate the efficacy of our approach via comparisons with state-of-the-art baselines on real-world image classification and natural language processing tasks.

algorithm, gradient, optimization, (16 more...)

arXiv.org Artificial Intelligence

2409.17189

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Education (0.67)
Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Differential Privacy Regularization: Protecting Training Data Through Loss Function Regularization

Aguilera-Martínez, Francisco, Berzal, Fernando

arXiv.org Artificial IntelligenceSep-25-2024

Training machine learning models based on neural networks requires large datasets, which may contain sensitive information. The models, however, should not expose private information from these datasets. Differentially private SGD [DP-SGD] requires the modification of the standard stochastic gradient descent [SGD] algorithm for training new models. In this short paper, a novel regularization strategy is proposed to achieve the same goal in a more efficient manner.

gradient, noise, privacy, (13 more...)

arXiv.org Artificial Intelligence

2409.17144

Country:

Europe > Switzerland (0.04)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training

Chaudhary, Jatin, Nidhi, Dipak, Heikkonen, Jukka, Merisaari, Haari, Kanth, Rajiv

arXiv.org Artificial IntelligenceSep-25-2024

The objective of this paper is to enhance the optimization process for neural networks by developing a dynamic learning rate algorithm that effectively integrates exponential decay and advanced anti-overfitting strategies. Our primary contribution is the establishment of a theoretical framework where we demonstrate that the optimization landscape, under the influence of our algorithm, exhibits unique stability characteristics defined by Lyapunov stability principles. Specifically, we prove that the superlevel sets of the loss function, as influenced by our adaptive learning rate, are always connected, ensuring consistent training dynamics. Furthermore, we establish the "equiconnectedness" property of these superlevel sets, which maintains uniform stability across varying training conditions and epochs. This paper contributes to the theoretical understanding of dynamic learning rate mechanisms in neural networks and also pave the way for the development of more efficient and reliable neural optimization techniques. This study intends to formalize and validate the equiconnectedness of loss function as superlevel sets in the context of neural network training, opening newer avenues for future research in adaptive machine learning algorithms. We leverage previous theoretical discoveries to propose training mechanisms that can effectively handle complex and high-dimensional data landscapes, particularly in applications requiring high precision and reliability.

convergence, loss function, stability, (10 more...)

arXiv.org Artificial Intelligence

2409.16769

Country:

Europe > Finland > Southwest Finland > Turku (0.06)
Asia > Middle East > Jordan (0.05)
Europe > Finland > Northern Savo > Kuopio (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Add feedback