AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

LATTEO: A Framework to Support Learning Asynchronously Tempered with Trusted Execution and Obfuscation

Kumar, Abhinav, Torres, George, Guzinski, Noah, Panwar, Gaurav, Tourani, Reza, Misra, Satyajayant, Spoczynski, Marcin, Vij, Mona, Himayat, Nageen

arXiv.org Artificial IntelligenceFeb-6-2025

The privacy vulnerabilities of the federated learning (FL) paradigm, primarily caused by gradient leakage, have prompted the development of various defensive measures. Nonetheless, these solutions have predominantly been crafted for and assessed in the context of synchronous FL systems, with minimal focus on asynchronous FL. This gap arises in part due to the unique challenges posed by the asynchronous setting, such as the lack of coordinated updates, increased variability in client participation, and the potential for more severe privacy risks. These concerns have stymied the adoption of asynchronous FL. In this work, we first demonstrate the privacy vulnerabilities of asynchronous FL through a novel data reconstruction attack that exploits gradient updates to recover sensitive client data. To address these vulnerabilities, we propose a privacy-preserving framework that combines a gradient obfuscation mechanism with Trusted Execution Environments (TEEs) for secure asynchronous FL aggregation at the network edge. To overcome the limitations of conventional enclave attestation, we introduce a novel data-centric attestation mechanism based on Multi-Authority Attribute-Based Encryption. This mechanism enables clients to implicitly verify TEE-based aggregation services, effectively handle on-demand client participation, and scale seamlessly with an increasing number of asynchronous connections. Our gradient obfuscation mechanism reduces the structural similarity index of data reconstruction by 85% and increases reconstruction error by 400%, while our framework improves attestation efficiency by lowering average latency by up to 1500% compared to RA-TLS, without additional overhead.

artificial intelligence, enclave, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.04601

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Commercial Services & Supplies > Security & Alarm Services (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Reviews: Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models

Neural Information Processing SystemsFeb-5-2025, 22:54:00 GMT

The paper proposes and theoretically analyzes a distributed SGD algorithm where the workers are pulled towards the best performing worker rather than the average worker. All three reviewers consider the theoretical contribution (analysis of convergence and cost of communication) to be interesting and rigorous. At the same time, one reviewer feels the theoretical analysis applies to a simplified case and may not shed light on the experiments that are done in more complex settings. The reviewer's were not satisfied by the rebuttal, but maintained that the paper is publishable. Overall, there is a consensus that is is a fine paper and I recommend acceptance.

deep learning model, leader stochastic gradient descent, training, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

Review for NeurIPS paper: Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Neural Information Processing SystemsFeb-5-2025, 08:23:10 GMT

Weaknesses: While the "biased expectation" appears to be a powerful tool, the overall results are restricted to the gradients of the algorithm at _some_ time t in the last T iterates. While this is a common outcome of the standard analysis of SGD, it would be nice if (with some additional assumptions on f) the results could be transposed to f(x_t) or x_t within some basin of attraction. The special case of s 0 needs much more detailed treatment. While the authors point out in the supplement that \phi is continuous at s 0, much of the document switches between looking at s- 0 or s 0 without explanation. Assumption 1: I see that the authors need to contol X_t 2 in Thm 1. (Eq.

biased expectation, non-convex stochastic gradient descent, robustness analysis, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

Review for NeurIPS paper: Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Neural Information Processing SystemsFeb-5-2025, 08:23:03 GMT

After significant discussions with the reviewers, the reviewers were all unanimously in appreciation of the simplicity and cleanliness of the approach presented by the paper. However the authors are strongly encouraged to improve the presentation of the paper - especially the crucial proof of Lemma 1 - multiple steps have been contracted in the presentation and clarifying them is necessary. Furthermore the case of the diminishing step-size scheme is strongly suggested to be fleshed out in theory rather than being left as straightforward extensions. Lastly, the reviewers suggested to use heavier tailed distribution like the Levy distribution to verify the theory better.

biased expectation, non-convex stochastic gradient descent, robustness analysis, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

A Novel Zero-Touch, Zero-Trust, AI/ML Enablement Framework for IoT Network Security

Shakya, Sushil, Abbas, Robert, Maric, Sasa

arXiv.org Artificial IntelligenceFeb-5-2025

The IoT facilitates a connected, intelligent, and sustainable society; therefore, it is imperative to protect the IoT ecosystem. The IoT-based 5G and 6G will leverage the use of machine learning and artificial intelligence (ML/AI) more to pave the way for autonomous and collaborative secure IoT networks. Zero-touch, zero-trust IoT security with AI and machine learning (ML) enablement frameworks offers a powerful approach to securing the expanding landscape of Internet of Things (IoT) devices. This paper presents a novel framework based on the integration of Zero Trust, Zero Touch, and AI/ML powered for the detection, mitigation, and prevention of DDoS attacks in modern IoT ecosystems. The focus will be on the new integrated framework by establishing zero trust for all IoT traffic, fixed and mobile 5G/6G IoT network traffic, and data security (quarantine-zero touch and dynamic policy enforcement). We perform a comparative analysis of five machine learning models, namely, XGBoost, Random Forest, K-Nearest Neighbors, Stochastic Gradient Descent, and Native Bayes, by comparing these models based on accuracy, precision, recall, F1-score, and ROC-AUC. Results show that the best performance in detecting and mitigating different DDoS vectors comes from the ensemble-based approaches.

artificial intelligence, detection, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.03614

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

First-ish Order Methods: Hessian-aware Scalings of Gradient Descent

Smee, Oscar, Roosta, Fred, Wright, Stephen J.

arXiv.org Artificial IntelligenceFeb-5-2025

Gradient descent is the primary workhorse for optimizing large-scale problems in machine learning. However, its performance is highly sensitive to the choice of the learning rate. A key limitation of gradient descent is its lack of natural scaling, which often necessitates expensive line searches or heuristic tuning to determine an appropriate step size. In this paper, we address this limitation by incorporating Hessian information to scale the gradient direction. By accounting for the curvature of the function along the gradient, our adaptive, Hessian-aware scaling method ensures a local unit step size guarantee, even in nonconvex settings. Near a local minimum that satisfies the second-order sufficient conditions, our approach achieves linear convergence with a unit step size. We show that our method converges globally under a significantly weaker version of the standard Lipschitz gradient smoothness assumption. Even when Hessian information is inexact, the local unit step size guarantee and global convergence properties remain valid under mild conditions. Finally, we validate our theoretical results empirically on a range of convex and nonconvex machine learning tasks, showcasing the effectiveness of the approach.

artificial intelligence, machine learning, step size, (15 more...)

arXiv.org Artificial Intelligence

2502.03701

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > Queensland > Brisbane (0.04)
(3 more...)

Genre: Research Report > New Finding (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)

Add feedback

Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer

Bordelon, Blake, Pehlevan, Cengiz

arXiv.org Machine LearningFeb-5-2025

We theoretically characterize gradient descent dynamics in deep linear networks trained at large width from random initialization and on large quantities of random data. Our theory captures the ``wider is better" effect of mean-field/maximum-update parameterized networks as well as hyperparameter transfer effects, which can be contrasted with the neural-tangent parameterization where optimal learning rates shift with model width. We provide asymptotic descriptions of both non-residual and residual neural networks, the latter of which enables an infinite depth limit when branches are scaled as $1/\sqrt{\text{depth}}$. We also compare training with one-pass stochastic gradient descent to the dynamics when training data are repeated at each iteration. Lastly, we show that this model recovers the accelerated power law training dynamics for power law structured data in the rich regime observed in recent works.

artificial intelligence, deep linear network training dynamic, machine learning, (12 more...)

arXiv.org Machine Learning

2502.02531

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Bayesian Parameter Shift Rule in Variational Quantum Eigensolvers

Pedrielli, Samuele, Anders, Christopher J., Funcke, Lena, Jansen, Karl, Nicoli, Kim A., Nakajima, Shinichi

arXiv.org Artificial IntelligenceFeb-4-2025

Parameter shift rules (PSRs) are key techniques for efficient gradient estimation in variational quantum eigensolvers (VQEs). In this paper, we propose its Bayesian variant, where Gaussian processes with appropriate kernels are used to estimate the gradient of the VQE objective. Our Bayesian PSR offers flexible gradient estimation from observations at arbitrary locations with uncertainty information and reduces to the generalized PSR in special cases. In stochastic gradient descent (SGD), the flexibility of Bayesian PSR allows the reuse of observations in previous steps, which accelerates the optimization process. Furthermore, the accessibility to the posterior uncertainty, along with our proposed notion of gradient confident region (GradCoRe), enables us to minimize the observation costs in each SGD step. Our numerical experiments show that the VQE optimization with Bayesian PSR and GradCoRe significantly accelerates SGD and outperforms the state-of-the-art methods, including sequential minimal optimization.

artificial intelligence, bayesian parameter shift rule, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2502.02625

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Europe > Germany > Berlin (0.04)
Asia > Japan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants

Morwani, Depen, Vyas, Nikhil, Zhang, Hanlin, Kakade, Sham

arXiv.org Artificial IntelligenceFeb-4-2025

Recent advancements in deep learning optimization have introduced new algorithms, such as Schedule-Free optimizers, AdEMAMix, MARS and Lion which modify traditional momentum mechanisms. In a separate line of work, theoretical acceleration of stochastic gradient descent (SGD) in noise-dominated regime has been achieved by decoupling the momentum coefficient from the current gradient's weight. In this paper, we establish explicit connections between these two lines of work. We substantiate our theoretical findings with preliminary experiments on a 150m language modeling task. We find that AdEMAMix, which most closely resembles accelerated versions of stochastic gradient descent, exhibits superior performance. Building on these insights, we introduce a modification to AdEMAMix, termed Simplified-AdEMAMix, which maintains the same performance as AdEMAMix across both large and small batch-size settings while eliminating the need for two different momentum terms. The code for Simplified-AdEMAMix is available on the repository: https://github.com/DepenM/Simplified-AdEMAMix/.

ademamix, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.02431

Country: North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)

Add feedback

Privacy Amplification by Structured Subsampling for Deep Differentially Private Time Series Forecasting

Schuchardt, Jan, Dalirrooyfard, Mina, Guzelkabaagac, Jed, Schneider, Anderson, Nevmyvaka, Yuriy, Günnemann, Stephan

arXiv.org Machine LearningFeb-4-2025

Many forms of sensitive data, such as web traffic, mobility data, or hospital occupancy, are inherently sequential. The standard method for training machine learning models while ensuring privacy for units of sensitive information, such as individual hospital visits, is differentially private stochastic gradient descent (DP-SGD). However, we observe in this work that the formal guarantees of DP-SGD are incompatible with timeseries-specific tasks like forecasting, since they rely on the privacy amplification attained by training on small, unstructured batches sampled from an unstructured dataset. In contrast, batches for forecasting are generated by (1) sampling sequentially structured time series from a dataset, (2) sampling contiguous subsequences from these series, and (3) partitioning them into context and ground-truth forecast windows. We theoretically analyze the privacy amplification attained by this structured subsampling to enable the training of forecasting models with sound and tight event- and user-level privacy guarantees. Towards more private models, we additionally prove how data augmentation amplifies privacy in self-supervised training of sequence models. Our empirical evaluation demonstrates that amplification by structured subsampling enables the training of forecasting models with strong formal privacy guarantees.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2502.0241

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > California (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Energy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback