On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
Rehn, Aki, Zhao, Linzh, Heikkilä, Mikko A., Honkela, Antti
Differentially private (DP) transfer learning, i.e., fine-tuning a pretrained model on private data, is the current state-of-the-art approach for training large models under privacy constraints. We focus on two key hyperparameters in this setting: the clipping bound $C$ and batch size $B$. We show a clear mismatch between the current theoretical understanding of how to choose an optimal $C$ (stronger privacy requires smaller $C$) and empirical outcomes (larger $C$ performs better under strong privacy), caused by changes in the gradient distributions. Assuming a limited compute budget (fixed epochs), we demonstrate that the existing heuristics for tuning $B$ do not work, while cumulative DP noise better explains whether smaller or larger batches perform better. We also highlight how the common practice of using a single $(C,B)$ setting across tasks can lead to suboptimal performance. We find that performance drops especially when moving between loose and tight privacy and between plentiful and limited compute, which we explain by analyzing clipping as a form of gradient re-weighting and examining cumulative DP noise.
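The interplay of the two hyperparameters can be made concrete with a minimal DP-SGD-style step, where the clipping bound $C$ caps each per-example gradient and the batch size $B$ divides the injected Gaussian noise. This is an illustrative sketch of the standard mechanism, not the paper's code; all names are ours.

```python
import numpy as np

def dp_sgd_step(per_example_grads, C, sigma, rng):
    """One DP-SGD update direction: clip each per-example gradient to
    L2 norm at most C, average over the batch, and add Gaussian noise
    with standard deviation sigma * C / B (Gaussian-mechanism calibration)."""
    B = len(per_example_grads)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Re-weighting view of clipping: each example's contribution is
        # scaled by min(1, C / ||g||), so large-gradient examples keep
        # their direction but are down-weighted in magnitude.
        clipped.append(g * min(1.0, C / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, sigma * C / B, size=mean.shape)
    return mean + noise

rng = np.random.default_rng(0)
grads = [rng.normal(size=4) for _ in range(8)]
update = dp_sgd_step(grads, C=1.0, sigma=1.0, rng=rng)
```

The `min(1, C / ||g||)` factor is exactly the gradient re-weighting the abstract refers to, and the `sigma * C / B` noise scale shows why $C$ and $B$ trade off against each other under a fixed privacy budget.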
Gradient Shaping Beyond Clipping: A Functional Perspective on Update Magnitude Control
Gradient clipping is widely used to stabilize deep network training, but its formulation as a hard, fixed threshold limits flexibility and ignores gradient distribution dynamics. We propose SPAMP (Statistical Per-layer Adaptive Modulation and Projection), a unified framework that generalizes clipping into smooth, per-layer gradient shaping. SPAMP tracks local gradient statistics, dynamically estimates thresholds, and applies power-based transformations to modulate update magnitudes in a differentiable manner. This perspective recasts clipping and warmup as dual mechanisms for controlling the effective update scale $\eta_t \|g_t\|$, offering a principled alternative to rigid heuristics. Extensive experiments across image and language tasks demonstrate that SPAMP improves stability, convergence, and robustness over existing methods.
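The idea of replacing a hard threshold with statistics-driven, smooth per-layer shaping can be sketched as follows. This is an illustrative reading of the abstract, not SPAMP itself: the EMA tracking, the `mean + k * std` threshold rule, and the power-law attenuation are all our own assumptions.

```python
import numpy as np

class SoftLayerShaper:
    """Illustrative per-layer gradient shaper in the spirit of the abstract:
    track running norm statistics, derive a threshold from them, and smoothly
    attenuate (rather than hard-clip) oversized updates."""

    def __init__(self, beta=0.9, k=2.0, p=0.5):
        self.beta = beta  # EMA decay for the norm statistics
        self.k = k        # threshold = mean + k * std
        self.p = p        # exponent of the smooth attenuation
        self.mean = 0.0
        self.var = 0.0

    def shape(self, g):
        norm = float(np.linalg.norm(g))
        # Update exponential moving statistics of the gradient norm.
        self.mean = self.beta * self.mean + (1 - self.beta) * norm
        self.var = self.beta * self.var + (1 - self.beta) * (norm - self.mean) ** 2
        tau = self.mean + self.k * self.var ** 0.5
        if norm <= tau:
            return g
        # Smooth power-law attenuation instead of a hard cut: the output
        # norm is tau * (norm / tau) ** p, which is continuous at norm = tau
        # and grows sublinearly for p < 1 instead of saturating at tau.
        return g * (tau / norm) * (norm / tau) ** self.p
```

Unlike hard clipping, the shaped norm still depends (sublinearly) on the raw norm, so ordering information among large gradients is preserved.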
A Proofs for Fat-Tailed Federated Learning
A.1 Proof of FAT-Clipping-PR. For notational clarity, we have the following update: Local update: x ...
The first inequality follows from the strongly-convex property, i.e., Assumption 4 (Bounded Stochastic Gradient Variance): there exists a constant ... Assumption 5 (Bounded Gradient): there exists a constant ... We remark that the above inequalities hold for any stochastic estimator that satisfies these conditions. The proof is exactly the same as the original proof [18].
Theorem 6. Suppose f is ...
We run a convolutional neural network (CNN) model on the CIFAR-10 dataset using FedAvg; the CNN architecture is shown in Table 2. To simulate data heterogeneity across clients, we manually ... The dataset and model are taken from [45]. This implies that the gradient noise is fat-tailed.
Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning
In recent years, federated learning (FL) has emerged as an important distributed machine learning paradigm to collaboratively learn a global model with multiple clients, while keeping data local and private. However, a key assumption in most existing works on FL algorithms' convergence analysis is that the noise in stochastic first-order information has a finite variance. Although this assumption covers all light-tailed (i.e., sub-exponential) and some heavy-tailed noise distributions (e.g., log-normal, Weibull, and some Pareto distributions), it fails for many fat-tailed noise distributions (i.e., ``heavier-tailed'' with potentially infinite variance) that have been empirically observed in the FL literature. To date, it remains unclear whether one can design convergent algorithms for FL systems that experience fat-tailed noise. Specifically, for the largest $\alpha \in (1,2]$ such that the fat-tailed noise in FL still has a bounded $\alpha$-moment, we show that both variants achieve $\mathcal{O}((mT)^{\frac{2-\alpha}{\alpha}})$ and $\mathcal{O}((mT)^{\frac{1-\alpha}{3\alpha-2}})$ convergence rates in the strongly-convex and general non-convex settings, respectively, where $m$ and $T$ are the numbers of clients and communication rounds.
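A per-round clipped FedAvg update of the kind analyzed in the appendix (FAT-Clipping-PR) can be sketched as follows. This is a heavy simplification with one local step per client; the function name, the threshold parameter `lam`, and the fixed (unscheduled) clipping are our assumptions, not the paper's algorithm.

```python
import numpy as np

def fat_clipping_pr_round(x, client_grad_fns, lr, lam):
    """One communication round in the spirit of per-round clipping:
    each client computes a local update, and the whole accumulated
    update is clipped to L2 norm at most lam before server averaging."""
    updates = []
    for grad_fn in client_grad_fns:
        g = grad_fn(x)      # stochastic gradient, possibly fat-tailed
        delta = -lr * g     # a single local step, for simplicity
        norm = np.linalg.norm(delta)
        # Clipping the accumulated update bounds the influence of any
        # single heavy-tailed sample on the averaged server model.
        updates.append(delta * min(1.0, lam / max(norm, 1e-12)))
    return x + np.mean(updates, axis=0)
```

The point of the construction is that the server step per round moves at most `lam`, regardless of how heavy the noise tail is, which is what makes a convergence analysis possible without a finite-variance assumption.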
Private and Communication-Efficient Federated Learning based on Differentially Private Sketches
Zhang, Meifan, Xie, Zhanhong, Yin, Lihua
Federated learning (FL) faces two primary challenges: the risk of privacy leakage due to parameter sharing and communication inefficiencies. To address these challenges, we propose DPSFL, a federated learning method that utilizes differentially private sketches. DPSFL compresses the local gradients of each client using a count sketch, thereby improving communication efficiency, while adding noise to the sketches to ensure differential privacy (DP). We provide a theoretical analysis of privacy and convergence for the proposed method. Gradient clipping is essential in DP learning to limit sensitivity and constrain the addition of noise. However, clipping introduces bias into the gradients, negatively impacting FL performance. To mitigate the impact of clipping, we propose an enhanced method, DPSFL-AC, which employs an adaptive clipping strategy. Experimental comparisons with existing techniques demonstrate the superiority of our methods concerning privacy preservation, communication efficiency, and model accuracy.
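The count-sketch compression at the heart of this approach can be sketched as follows. The hash construction, dimensions, and the place where noise is injected are illustrative assumptions rather than the paper's implementation; only the generic count-sketch encode/decode is standard.

```python
import numpy as np

def make_hashes(r, w, d, seed=0):
    """Per-row bucket indices and random signs for a d-dim vector."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, w, size=(r, d))        # bucket per (row, coord)
    sign = rng.choice([-1.0, 1.0], size=(r, d))  # random sign per (row, coord)
    return idx, sign

def sketch(g, idx, sign, w, noise_std=0.0, rng=None):
    """Compress gradient g into an r x w count sketch; optionally add
    Gaussian noise to the sketch itself, as the DP variant would."""
    r, d = idx.shape
    S = np.zeros((r, w))
    for i in range(r):
        np.add.at(S[i], idx[i], sign[i] * g)     # scatter-add into buckets
    if noise_std > 0:
        S += (rng or np.random.default_rng()).normal(0.0, noise_std, S.shape)
    return S

def unsketch(S, idx, sign):
    """Median-of-rows estimate of each coordinate from the sketch."""
    r, d = idx.shape
    est = sign * S[np.arange(r)[:, None], idx]   # r estimates per coordinate
    return np.median(est, axis=0)
```

Communication cost drops from `d` to `r * w` values per client, and because the noise is added to the low-dimensional sketch rather than the full gradient, sensitivity analysis is done in sketch space.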
Random Gradient Masking as a Defensive Measure to Deep Leakage in Federated Learning
Federated Learning (FL)[1][2] emerged as an artificial intelligence training method that does not require sending data from peripheral devices (clients) to a central server. Rather, each client downloads the central model from the server, trains it over their private data, and sends the resulting gradients of the private training back to the server, all of which are aggregated by a server-side algorithm to produce the next iteration of the central model. Ideally, mutually distrustful clients never communicate their private data, yet they produce a central model that encompasses all clients' data. Extensive research is being conducted on optimizing the learning efficiency of FL in various aspects such as incentive mechanisms[3], communication speed[4], non-IID training[5], and client selection[6]. However, recent research reveals that sending the gradients of private training does not ensure complete data privacy, especially in a wide cross-device environment[7]. Moreover, as a federated system, FL has to protect itself against Byzantine failure[8], backdoor injection[9], model poisoning[10], and data poisoning[11].
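A minimal form of random gradient masking can be sketched as follows. The paper's exact masking and aggregation scheme may differ; in particular, the unbiased `1/keep_prob` rescaling shown here is our own assumption, added so the server-side average still estimates the true gradient in expectation.

```python
import numpy as np

def mask_gradient(g, keep_prob, rng):
    """Zero out each gradient entry independently with probability
    1 - keep_prob before upload, rescaling kept entries by 1/keep_prob.
    A leakage attacker then sees only a random subset of coordinates,
    while the masked gradient remains unbiased: E[masked] = g."""
    mask = rng.random(g.shape) < keep_prob
    return np.where(mask, g / keep_prob, 0.0)
```

Averaging over many clients (or rounds) washes out the masking noise, which is why such a defense can trade reconstruction-attack difficulty against only a modest hit to convergence.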
Boosting Soft Q-Learning by Bounding
Adamczyk, Jacob, Makarenko, Volodymyr, Tiomkin, Stas, Kulkarni, Rahul V.
An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to boosted performance.
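How double-sided bounds can be enforced during soft Q-learning updates can be sketched as follows. How the paper derives its bounds `lo`/`hi` from a prior value estimate is its actual contribution and is not reproduced here; clamping the soft Bellman target to given bounds is our illustrative use of such bounds.

```python
import numpy as np

def soft_backup(Q, r, s_next, gamma, beta):
    """Soft Bellman target: r + gamma * (1/beta) * log sum_a exp(beta * Q[s',a])."""
    v = np.log(np.sum(np.exp(beta * Q[s_next]))) / beta
    return r + gamma * v

def bounded_update(Q, s, a, r, s_next, alpha, gamma, beta, lo, hi):
    """Tabular soft Q-learning step whose target is clipped into
    [lo[s,a], hi[s,a]] before the usual TD update, so the learned
    Q-values can never leave the trusted interval."""
    target = np.clip(soft_backup(Q, r, s_next, gamma, beta), lo[s, a], hi[s, a])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

If the bounds are valid (i.e., they really do bracket the optimal soft value function), clipping can only reduce target error, which is one intuition for the boosted training performance reported.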