AITopics

2507.0944

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Park, Eun-Ji, Yun, Sangwon

Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes

arXiv.org Artificial IntelligenceJul-15-2025

Recent studies have proposed interpreting the training process from an ergodic perspective. Building on this foundation, we present a unified framework for understanding and accelerating the training of deep neural networks via stochastic gradient descent (SGD). By analyzing the geometric landscape of the objective function we introduce a practical diagnostic, the running estimate of the largest Lyapunov exponent, which provably distinguishes genuine convergence toward stable minimizers from mere statistical stabilization near saddle points. We then propose a ghost category extension for standard classifiers that adds auxiliary ghost output nodes so the model gains extra descent directions that open a lateral corridor around narrow loss barriers and enable the optimizer to bypass poor basins during the early training phase. We show that this extension strictly reduces the approximation error and that after sufficient convergence the ghost dimensions collapse so that the extended model coincides with the original one and there exists a path in the enlarged parameter space along which the total loss does not increase. Taken together, these results provide a principled architecture level intervention that accelerates early stage trainability while preserving asymptotic behavior and simultaneously serves as an architecture-friendly regularizer.

artificial intelligence, convergence, machine learning, (15 more...)

2507.01003

Country: North America > United States (0.14)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Notsawo, Pascal Jr Tikeng, Dumas, Guillaume, Rabusseau, Guillaume

Grokking Beyond the Euclidean Norm of Model Parameters

arXiv.org Machine LearningJul-14-2025

Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. In this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property $P$ (e.g., sparse or low-rank weights) that generalizes on the problem of interest, gradient descent with a small but non-zero regularization of $P$ (e.g., $\ell_1$ or nuclear norm regularization) results in grokking. This extends previous work showing that small non-zero weight decay induces grokking. Moreover, our analysis shows that over-parameterization by adding depth makes it possible to grok or ungrok without explicitly using regularization, which is impossible in shallow cases. We further show that the $\ell_2$ norm is not a reliable proxy for generalization when the model is regularized toward a different property $P$, as the $\ell_2$ norm grows in many cases where no weight decay is used, but the model generalizes anyway. We also show that grokking can be amplified solely through data selection, with any other hyperparameter fixed.

artificial intelligence, grokking, machine learning, (19 more...)

2506.05718

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(5 more...)

Genre: Research Report > New Finding (0.92)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

Mashiach, Idan, Glickman, Oren, Tirer, Tom

Catastrophic Forgetting Mitigation Through Plateau Phase Activity Profiling

arXiv.org Artificial IntelligenceJul-14-2025

Catastrophic forgetting in deep neural networks occurs when learning new tasks degrades performance on previously learned tasks due to knowledge overwriting. Among the approaches to mitigate this issue, regularization techniques aim to identify and constrain "important" parameters to preserve previous knowledge. In the highly nonconvex optimization landscape of deep learning, we propose a novel perspective: tracking parameters during the final training plateau is more effective than monitoring them throughout the entire training process. We argue that parameters that exhibit higher activity (movement and variability) during this plateau reveal directions in the loss landscape that are relatively flat, making them suitable for adaptation to new tasks while preserving knowledge from previous ones. Our comprehensive experiments demonstrate that this approach achieves superior performance in balancing catastrophic forgetting mitigation with strong performance on newly learned tasks.

artificial intelligence, deep learning, machine learning, (14 more...)

2507.08736

Country: North America (0.46)

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

arXiv.org Artificial IntelligenceJul-14-2025

An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis

Wang, Ming, Duan, Zhaoyang, Xue, Dong, Liu, Fangzhou, Zhang, Zhongheng

The labor-intensive nature of medical data annotation presents a significant challenge for respiratory disease diagnosis, resulting in a scarcity of high-quality labeled datasets in resource-constrained settings. Moreover, patient privacy concerns complicate the direct sharing of local medical data across institutions, and existing centralized data-driven approaches, which rely on amounts of available data, often compromise data privacy. This study proposes a federated few-shot learning framework with privacy-preserving mechanisms to address the issues of limited labeled data and privacy protection in diagnosing respiratory diseases. In particular, a meta-stochastic gradient descent algorithm is proposed to mitigate the overfitting problem that arises from insufficient data when employing traditional gradient descent methods for neural network training. Furthermore, to ensure data privacy against gradient leakage, differential privacy noise from a standard Gaussian distribution is integrated into the gradients during the training of private models with local data, thereby preventing the reconstruction of medical images. Given the impracticality of centralizing respiratory disease data dispersed across various medical institutions, a weighted average algorithm is employed to aggregate local diagnostic models from different clients, enhancing the adaptability of a model across diverse scenarios. Experimental results show that the proposed method yields compelling results with the implementation of differential privacy, while effectively diagnosing respiratory diseases using data from different structures, categories, and distributions.

artificial intelligence, machine learning, respiratory disease, (16 more...)

2507.0805

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

Kumar, Navish, Möllenhoff, Thomas, Khan, Mohammad Emtiyaz, Lucchi, Aurelien

Optimization Guarantees for Square-Root Natural-Gradient Variational Inference

arXiv.org Machine LearningJul-11-2025

Variational inference with natural-gradient descent often shows fast convergence in practice, but its theoretical convergence guarantees have been challenging to establish. This is true even for the simplest cases that involve concave log-likelihoods and use a Gaussian approximation. We show that the challenge can be circumvented for such cases using a square-root parameterization for the Gaussian covariance. This approach establishes novel convergence guarantees for natural-gradient variational-Gaussian inference and its continuous-time gradient flow. Our experiments demonstrate the effectiveness of natural gradient methods and highlight their advantages over algorithms that use Euclidean or Wasserstein geometries.

artificial intelligence, machine learning research, optimization problem, (15 more...)

2507.07853

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Switzerland > Basel-City > Basel (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Gupta, Devansh, Razaviyayn, Meisam, Sharan, Vatsal

On the Inherent Privacy of Zeroth Order Projected Gradient Descent

arXiv.org Machine LearningJul-10-2025

The fine-tuning of pretrained large language models (LLMs) has demonstrated state-of-the-art performance across a range of downstream applications. However, two main challenges hinder the wide adoption of these models: the substantial memory requirements of gradient-based optimizers used for fine-tuning and the critical need to protect the privacy of domain-specific fine-tuning data. As fine-tuning LLMs grows increasingly memory-intensive, a range of strategies has emerged to address this issue. In particular, zeroth-order (ZO) optimization methods recently have gained traction due to their memory efficiency, as they do not require explicit gradient computations. Instead, the zeroth-order gradients can be computed using forward step only, significantly reducing memory use compared to gradient computation. In a pioneering approach, Malladi et al. (2023) introduced a memory-efficient technique for fine-tuning LLMs using two-point Simultaneous Perturbation Stochastic Approximation (SPSA) estimators (Spall, 1992), enabling large model fine-tuning on memory-limited devices. Since then, zeroth-order methods have gained popularity in dealing with large machine learning models due to their memory efficiency and favorable upper bounds on gap from optimality under certain conditions on the Hessian of the objective function (Zhang et al., 2024b,a; Guo et al., 2024). Another major concern in training LLMs is privacy . As large parameterized models are increasingly used in sensitive data applications, these models must protect sensitive information, especially given privacy regulations like the E.U.

large language model, machine learning, natural language, (18 more...)

2507.0561

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)

arXiv.org Machine LearningJul-10-2025

Non-Asymptotic Analysis of Online Local Private Learning with SGD

Shi, Enze, Xie, Jinhan, Jiang, Bei, Kong, Linglong, He, Xuming

Differentially Private Stochastic Gradient Descent (DP-SGD) has been widely used for solving optimization problems with privacy guarantees in machine learning and statistics. Despite this, a systematic non-asymptotic convergence analysis for DP-SGD, particularly in the context of online problems and local differential privacy (LDP) models, remains largely elusive. Existing non-asymptotic analyses have focused on non-private optimization methods, and hence are not applicable to privacy-preserving optimization problems. This work initiates the analysis to bridge this gap and opens the door to non-asymptotic convergence analysis of private optimization problems. A general framework is investigated for the online LDP model in stochastic optimization problems. We assume that sensitive information from individuals is collected sequentially and aim to estimate, in real-time, a static parameter that pertains to the population of interest. Most importantly, we conduct a comprehensive non-asymptotic convergence analysis of the proposed estimators in finite-sample situations, which gives their users practical guidelines regarding the effect of various hyperparameters, such as step size, parameter dimensions, and privacy budgets, on convergence rates. Our proposed estimators are validated in the theoretical and practical realms by rigorous mathematical derivations and carefully constructed numerical experiments.

artificial intelligence, estimator, machine learning, (18 more...)

2507.07041

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report (0.66)
Overview (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

arXiv.org Machine LearningJul-10-2025

AdaDPIGU: Differentially Private SGD with Adaptive Clipping and Importance-Based Gradient Updates for Deep Neural Networks

Zhang, Huiqi, Xie, Fang

Differential privacy has been proven effective for stochastic gradient descent; however, existing methods often suffer from performance degradation in high-dimensional settings, as the scale of injected noise increases with dimensionality. To tackle this challenge, we propose AdaDPIGU--a new differentially private SGD framework with importance-based gradient updates tailored for deep neural networks. In the pretraining stage, we apply a differentially private Gaussian mechanism to estimate the importance of each parameter while preserving privacy. During the gradient update phase, we prune low-importance coordinates and introduce a coordinate-wise adaptive clipping mechanism, enabling sparse and noise-efficient gradient updates. Theoretically, we prove that AdaDPIGU satisfies $(\varepsilon, δ)$-differential privacy and retains convergence guarantees. Extensive experiments on standard benchmarks validate the effectiveness of AdaDPIGU. All results are reported under a fixed retention ratio of 60%. On MNIST, our method achieves a test accuracy of 99.12% under a privacy budget of $ε= 8$, nearly matching the non-private model. Remarkably, on CIFAR-10, it attains 73.21% accuracy at $ε= 4$, outperforming the non-private baseline of 71.12%, demonstrating that adaptive sparsification can enhance both privacy and utility.

artificial intelligence, deep learning, machine learning, (17 more...)

2507.06525

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Zhuhai (0.04)
Asia > China > Beijing > Beijing (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJul-10-2025

Simple Convergence Proof of Adam From a Sign-like Descent Perspective

Peng, Hanyang, Qin, Shuang, Yu, Yue, Jiang, Fangqing, Wang, Hui, Lin, Zhouchen

Adam is widely recognized as one of the most effective optimizers for training deep neural networks (DNNs). Despite its remarkable empirical success, its theoretical convergence analysis remains unsatisfactory. Existing works predominantly interpret Adam as a preconditioned stochastic gradient descent with momentum (SGDM), formulated as $\bm{x}_{t+1} = \bm{x}_t - \frac{γ_t}{{\sqrt{\bm{v}_t}+ε}} \circ \bm{m}_t$. This perspective necessitates strong assumptions and intricate techniques, resulting in lengthy and opaque convergence proofs that are difficult to verify and extend. In contrast, we propose a novel interpretation by treating Adam as a sign-like optimizer, expressed as $\bm{x}_{t+1} = \bm{x}_t - γ_t \frac{|\bm{m}_t|}{{\sqrt{\bm{v}_t}+ε}} \circ {\rm Sign}(\bm{m}_t)$. This reformulation significantly simplifies the convergence analysis. For the first time, with some mild conditions, we prove that Adam achieves the optimal rate of ${\cal O}(\frac{1}{T^{\sfrac{1}{4}}})$ rather than the previous ${\cal O} \left(\frac{\ln T}{T^{\sfrac{1}{4}}}\right)$ under weak assumptions of the generalized $p$-affine variance and $(L_0, L_1, q)$-smoothness, without dependence on the model dimensionality or the numerical stability parameter $ε$. Additionally, our theoretical analysis provides new insights into the role of momentum as a key factor ensuring convergence and offers practical guidelines for tuning learning rates in Adam, further bridging the gap between theory and practice.

artificial intelligence, deep learning, machine learning, (15 more...)

2507.05966

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)