AITopics

2504.19259

Country:

North America > United States (0.46)
Europe > Germany (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Pasechnyuk-Vilensky, Dmitry, Doroshenko, Daniil

Feed-anywhere ANN (I) Steady Discrete $\to$ Diffusing on Graph Hidden States

arXiv.org Machine LearningJul-29-2025

We propose a novel framework for learning hidden graph structures from data using geometric analysis and nonlinear dynamics. Our approach: (1) Defines discrete Sobolev spaces on graphs for scalar/vector fields, establishing key functional properties; (2) Introduces gauge-equivalent nonlinear Schrödinger and Landau--Lifshitz dynamics with provable stable stationary solutions smoothly dependent on input data and graph weights; (3) Develops a stochastic gradient algorithm over graph moduli spaces with sparsity regularization. Theoretically, we guarantee: topological correctness (homology recovery), metric convergence (Gromov--Hausdorff), and efficient search space utilization. Our dynamics-based model achieves stronger generalization bounds than standard neural networks, with complexity dependent on the data manifold's topology.

artificial intelligence, equation, machine learning, (17 more...)

2507.20088

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Xia, Xintao, Zhang, Linjun, Cai, Zhanrui

Statistical Inference for Differentially Private Stochastic Gradient Descent

arXiv.org Machine LearningJul-29-2025

Privacy preservation in machine learning, particularly through Differentially Private Stochastic Gradient Descent (DP-SGD), is critical for sensitive data analysis. However, existing statistical inference methods for SGD predominantly focus on cyclic subsampling, while DP-SGD requires randomized subsampling. This paper first bridges this gap by establishing the asymptotic properties of SGD under the randomized rule and extending these results to DP-SGD. For the output of DP-SGD, we show that the asymptotic variance decomposes into statistical, sampling, and privacy-induced components. Two methods are proposed for constructing valid confidence intervals: the plug-in method and the random scaling method. We also perform extensive numerical analysis, which shows that the proposed confidence intervals achieve nominal coverage rates while maintaining privacy.

artificial intelligence, confidence interval, machine learning, (17 more...)

2507.2056

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

arXiv.org Machine LearningJul-29-2025

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Attia, Amit, Schliserman, Matan, Sherman, Uri, Koren, Tomer

We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting -- particularly with large (constant) stepsizes -- has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $β$-smooth convex loss functions with stepsize $0 < η< 2/β$, the last iterate exhibits expected excess risk $\widetilde{O}(\frac{1}{η(2-βη) T^{1-βη/2}} + \fracη{(2-βη)^2} T^{βη/2} σ_\star^2)$, where $σ_\star^2$ denotes the variance of the stochastic gradients at the optimum. In particular, for a well-tuned stepsize we obtain a near optimal $\widetilde{O}(1/T + σ_\star/\sqrt{T})$ rate for the last iterate, extending the results of Varre et al. (2021) beyond least squares regression; and when $σ_\star=0$ we obtain a rate of $\smash{O(1/\sqrt T)}$ with $η=1/β$, improving upon the best-known $\smash{O(T^{-1/4})}$ rate recently established by Evron et al. (2025) in the special case of realizable linear regression.

artificial intelligence, convergence, machine learning, (17 more...)

2507.11274

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

arXiv.org Artificial IntelligenceJul-29-2025

Who Owns This Sample: Cross-Client Membership Inference Attack in Federated Graph Neural Networks

Li, Kunhao, Wu, Di, Bai, Jun, Xu, Jing, Yang, Lei, Zhang, Ziyi, Song, Yiliao, Yang, Wencheng, Cai, Taotao, Li, Yan

Graph-structured data is prevalent in many real-world applications, including social networks, financial systems, and molecular biology. Graph Neural Networks (GNNs) have become the de facto standard for learning from such data due to their strong representation capabilities. As GNNs are increasingly deployed in federated learning (FL) settings to preserve data locality and privacy, new privacy threats arise from the interaction between graph structures and decentralized training. In this paper, we present the first systematic study of cross-client membership inference attacks (CC-MIA) against node classification tasks of federated GNNs (FedGNNs), where a malicious client aims to infer which client owns the given data. Unlike prior centralized-focused work that focuses on whether a sample was included in training, our attack targets sample-to-client attribution, a finer-grained privacy risk unique to federated settings. We design a general attack framework that exploits FedGNNs' aggregation behaviors, gradient updates, and embedding proximity to link samples to their source clients across training rounds. We evaluate our attack across multiple graph datasets under realistic FL setups. Results show that our method achieves high performance on both membership inference and ownership identification. Our findings highlight a new privacy threat in federated graph learning-client identity leakage through structural and model-level cues, motivating the need for attribution-robust GNN design.

artificial intelligence, inference attack, machine learning, (13 more...)

2507.19964

Country: Oceania > Australia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Artificial IntelligenceJul-25-2025

On the Convergence of Gradient Descent on Learning Transformers with Residual Connections

Qin, Zhen, Zhou, Jinxin, Zhu, Zhihui

Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their training dynamics. Existing research predominantly examines isolated components--such as self-attention mechanisms and feedforward networks--without thoroughly investigating the interdependencies between these components, especially when residual connections are present. In this paper, we aim to bridge this gap by analyzing the convergence behavior of a structurally complete yet single-layer Transformer, comprising self-attention, a feedforward network, and residual connections. We demonstrate that, under appropriate initialization, gradient descent exhibits a linear convergence rate, where the convergence speed is determined by the minimum and maximum singular values of the output matrix from the attention layer. Moreover, our analysis reveals that residual connections serve to ameliorate the ill-conditioning of this output matrix, an issue stemming from the low-rank structure imposed by the softmax operation, thereby promoting enhanced optimization stability. We also extend our theoretical findings to a multi-layer Transformer architecture, confirming the linear convergence rate of gradient descent under suitable initialization. Empirical results corroborate our theoretical insights, illustrating the beneficial role of residual connections in promoting convergence stability.

artificial intelligence, deep learning, machine learning, (13 more...)

2506.05249

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)

Pesme, Scott, Meanti, Giacomo, Arbel, Michael, Mairal, Julien

MAP Estimation with Denoisers: Convergence Rates and Guarantees

arXiv.org Machine LearningJul-22-2025

Denoiser models have become powerful tools for inverse problems, enabling the use of pretrained networks to approximate the score of a smoothed prior distribution. These models are often used in heuristic iterative schemes aimed at solving Maximum a Posteriori (MAP) optimisation problems, where the proximal operator of the negative log-prior plays a central role. In practice, this operator is intractable, and practitioners plug in a pretrained denoiser as a surrogate-despite the lack of general theoretical justification for this substitution. In this work, we show that a simple algorithm, closely related to several used in practice, provably converges to the proximal operator under a log-concavity assumption on the prior $p$. We show that this algorithm can be interpreted as a gradient descent on smoothed proximal objectives. Our analysis thus provides a theoretical foundation for a class of empirically successful but previously heuristic methods.

artificial intelligence, inequality, machine learning, (17 more...)

2507.15397

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
North America > United States > Indiana (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

arXiv.org Machine LearningJul-22-2025

Statistical and Algorithmic Foundations of Reinforcement Learning

Chi, Yuejie, Chen, Yuxin, Wei, Yuting

As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). How to understand and enhance the sample and computational efficacies of RL algorithms is thus of great interest. In this tutorial, we aim to introduce several important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov Decision Processes as the central mathematical model, we cover several distinctive RL scenarios (i.e., RL with a simulator, online RL, offline RL, robust RL, and RL with human feedback), and present several mainstream RL approaches (i.e., model-based approach, value-based approach, and policy optimization). Our discussions gravitate around the issues of sample complexity, computational efficiency, as well as algorithm-dependent and information-theoretic lower bounds from a non-asymptotic viewpoint.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2507.14444

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.45)

Bañales, Isaías, Jaramillo, Arturo, Ricalde-Guerrero, Joshué Helí

Branching Stein Variational Gradient Descent for sampling multimodal distributions

arXiv.org Machine LearningJul-18-2025

We propose a novel particle-based variational inference method designed to work with multimodal distributions. Our approach, referred to as Branched Stein Variational Gradient Descent (BSVGD), extends the classical Stein Variational Gradient Descent (SVGD) algorithm by incorporating a random branching mechanism that encourages the exploration of the state space. In this work, a theoretical guarantee for the convergence in distribution is presented, as well as numerical experiments to validate the suitability of our algorithm. Performance comparisons between the BSVGD and the SVGD are presented using the Wasserstein distance between samples and the corresponding computational times.

algorithm, artificial intelligence, machine learning, (10 more...)

2506.13916

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)

arXiv.org Artificial IntelligenceJul-18-2025

On the Effectiveness of the z-Transform Method in Quadratic Optimization

Bach, Francis

Characterizing the convergence of real-valued or vector-v alued sequences is a key theoretical problem in data science, where the sequence index typically correspon ds to the number of iterations of an iterative algorithm (such as in optimization and signal processing) o r the number of observations (as in statistics and machine learning). This characterization can be done in mostly two ways, asymptotically or non-asymptotically. In an asymptotic analysis, an asymptotic e quivalent of the sequence is identified, which readily allows comparisons with other algorithms; however, without further analysis, the behavior at any finite time cannot be controlled. This is exactly what non-as ymptotic analysis aims to achieve, by providing bounds that are valid even for a finite index, but then only pro viding bounds that cannot always be compared. While the two approaches have their own merits, in this paper, we focus on asymptotic analysis and sequences that tend to their limit at a sub-exponential r ate that is a power of the sequence index. The main goal of this paper is to show how a classical tool from signal processing, control theory, and electrical engineering ( Oppenheim et al., 1996), the z -transform method ( Jury, 1964), can be used in this context with a striking efficiency at obtaining asymptotic eq uivalents for the class of algorithms that can be seen as iterations of potentially random linear operators i n a Hilbert space. This includes gradient descent for quadratic optimization problems as well as its accelera ted and stochastic variants ( Nesterov, 2018), 1 Landweber iterations in inverse problems ( Benning and Burger, 2018), or gossip algorithms in distributed computing ( Boyd et al., 2006).

artificial intelligence, machine learning, sequence, (20 more...)

2507.03404

Country: Europe (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)