AITopics

2509.05865

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.87)
(2 more...)

Zhang, Minxin, Liu, Yuxuan, Schaeffer, Hayden

AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates

arXiv.org Artificial IntelligenceSep-9-2025

The recently proposed Muon optimizer updates weight matrices via orthogonalized momentum and has demonstrated strong empirical success in large language model training. However, it remains unclear how to determine the learning rates for such orthogonalized updates. AdaGrad, by contrast, is a widely used adaptive method that scales stochastic gradients by accumulated past gradients. We propose a new algorithm, AdaGO, which combines a norm-based AdaGrad-type stepsize with an orthogonalized update direction, bringing together the benefits of both approaches. Unlike other adaptive variants of Muon, AdaGO preserves the orthogonality of the update direction, which can be interpreted as a spectral descent direction, while adapting the stepsizes to the optimization landscape by scaling the direction with accumulated past gradient norms. The implementation of AdaGO requires only minimal modification to Muon, with a single additional scalar variable, the accumulated squared gradient norms, to be computed, making it computationally and memory efficient. Optimal theoretical convergence rates are established for nonconvex functions in both stochastic and deterministic settings under standard smoothness and unbiased bounded-variance noise assumptions. Empirical results on CIFAR-10 classification and function regression demonstrate that AdaGO outperforms Muon and Adam.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

2509.02981

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

arXiv.org Artificial IntelligenceSep-9-2025

DCPO: Dynamic Clipping Policy Optimization

Yang, Shihui, Dou, Chengfeng, Guo, Peidong, Lu, Kai, Ju, Qiang, Deng, Fei, Xin, Rihui

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a promising framework for enhancing the reasoning capabilities of large language models. However, existing approaches such as GRPO often suffer from zero gradients. This problem arises primarily due to fixed clipping bounds for token-level probability ratios and the standardization of identical rewards, which can lead to ineffective gradient updates and underutilization of generated responses. In this work, we propose Dynamic Clipping Policy Optimization(DCPO), which introduces a dynamic clipping strategy that adaptively adjusts clipping bounds based on token-specific prior probabilities to enhance token-level exploration, and a smooth advantage standardization technique that standardizes rewards across cumulative training steps to improve the response-level effective utilization of generated responses. DCPO achieved state-of-the-art performance on four benchmarks based on four different models. In particular, DCPO achieved an Avg@1 of 46.7 under greedy decoding and an Avg@32 of 38.8 under 32 times sampling on the AIME24 benchmark, surpassing DAPO (36.7/31.6), GRPO (36.7/32.1) and GSPO (40.0/34.9) on the Qwen2.5-Math-7B model. On the AIME25 benchmark based on Qwen2.5-14B, DCPO achieves a performance of (23.3/19.0), surpassing GRPO (13.3/10.5), DAPO (20.0/15.3) and GSPO (16.7/9.9). Furthermore, DCPO achieved an average 28% improvement in the nonzero advantage over GRPO in four models, doubled the training efficiency over DAPO, and significantly reduced the token clipping ratio by an order of magnitude compared to both GRPO and DAPO, while achieving superior performance. These results highlight DCPO's effectiveness in leveraging generated data more efficiently for reinforcement learning in large language models.

large language model, machine learning, reinforcement learning, (19 more...)

2509.02333

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
(2 more...)

arXiv.org Artificial IntelligenceSep-5-2025

Integrating Intermediate Layer Optimization and Projected Gradient Descent for Solving Inverse Problems with Diffusion Models

Zheng, Yang, Li, Wen, Liu, Zhaoqiang

Inverse problems (IPs) involve reconstructing signals from noisy observations. Recently, diffusion models (DMs) have emerged as a powerful framework for solving IPs, achieving remarkable reconstruction performance. However, existing DM-based methods frequently encounter issues such as heavy computational demands and suboptimal convergence. In this work, building upon the idea of the recent work DMPlug, we propose two novel methods, DMILO and DMILO-PGD, to address these challenges. Our first method, DMILO, employs intermediate layer optimization (ILO) to alleviate the memory burden inherent in DMPlug. Additionally, by introducing sparse deviations, we expand the range of DMs, enabling the exploration of underlying signals that may lie outside the range of the diffusion model. We further propose DMILO-PGD, which integrates ILO with projected gradient descent (PGD), thereby reducing the risk of suboptimal convergence. We provide an intuitive theoretical analysis of our approaches under appropriate conditions and validate their superiority through extensive experiments on diverse image datasets, encompassing both linear and nonlinear IPs. Our results demonstrate significant performance gains over state-of-the-art methods, highlighting the effectiveness of DMILO and DMILO-PGD in addressing common challenges in DM-based IP solvers.

artificial intelligence, integrating intermediate layer optimization, machine learning, (12 more...)

2505.20789

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)

arXiv.org Machine LearningSep-3-2025

Mo' Memory, Mo' Problems: Stream-Native Machine Unlearning

Stewart, Kennon

Machine unlearning work assumes a static, i.i.d training environment that doesn't truly exist. Modern ML pipelines need to learn, unlearn, and predict continuously on production streams of data. We translate batch unlearning to the online setting using notions of regret, sample complexity, and deletion capacity. We tighten regret bounds to a logarithmic $\mathcal{O}(\ln{T})$, a first for a certified unlearning algorithm. When fitted with an online variant of L-BFGS optimization, the algorithm achieves state of the art regret with a constant memory footprint. Such changes extend the lifespan of an ML model before expensive retraining, making for a more efficient unlearning process.

artificial intelligence, deletion, machine learning, (19 more...)

2508.10193

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > New York (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (0.67)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

arXiv.org Machine LearningSep-3-2025

Asynchronous and Stochastic Distributed Resource Allocation

Li, Qiang, Yemini, Michal, Wai, Hoi-To

This work proposes and studies the distributed resource allocation problem in asynchronous and stochastic settings. We consider a distributed system with multiple workers and a coordinating server with heterogeneous computation and communication times. We explore an approximate stochastic primal-dual approach with the aim of 1) adhering to the resource budget constraints, 2) allowing for the asynchronicity between the workers and the server, and 3) relying on the locally available stochastic gradients. We analyze our Asynchronous stochastic Primal-Dual (Asyn-PD) algorithm and prove its convergence in the second moment to the saddle point solution of the approximate problem at the rate of $O(1/t)$, where $t$ is the iteration number. Furthermore, we verify our algorithm numerically to validate the analytically derived convergence results, and demonstrate the advantages of utilizing our asynchronous algorithm rather than deploying a synchronous algorithm where the server must wait until it gets update from all workers.

algorithm, artificial intelligence, machine learning, (15 more...)

2509.01172

Country:

Asia > Middle East > Israel (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Machine LearningSep-3-2025

Federated learning over physical channels: adaptive algorithms with near-optimal guarantees

Zhang, Rui, Mou, Wenlong

In federated learning, communication cost can be significantly reduced by transmitting the information over the air through physical channels. In this paper, we propose a new class of adaptive federated stochastic gradient descent (SGD) algorithms that can be implemented over physical channels, taking into account both channel noise and hardware constraints. We establish theoretical guarantees for the proposed algorithms, demonstrating convergence rates that are adaptive to the stochastic gradient noise level. We also demonstrate the practical effectiveness of our algorithms through simulation studies with deep learning models.

artificial intelligence, federated learning, machine learning, (15 more...)

2509.02538

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

arXiv.org Artificial IntelligenceSep-3-2025

Globally aware optimization with resurgence

Bu, Wei

Modern optimization faces a fundamental challenge: local gradient-based methods provide no global information about the objective function $L$ landscape, often leading to suboptimal convergence and sensitivity to initialization. We introduce a novel optimization framework that leverages resurgence theory from complex analysis to extract global structural information from divergent asymptotic series. Our key insight is that the factorially divergent perturbative expansions of parameter space partition functions encode precise information about all critical objective function value in the landscape through their Borel transform singularities. The algorithm works by computing the statistical mechanical partition function $Z(g) = \int e^{-L(θ)/g} dθ$ for small coupling $g\ll 1$, extracting its asymptotic series coefficients, and identifying Borel plane singularities that correspond one-to-one with critical objective function values. These target values provide global guidance to local optimizers, enabling principled learning rate adaptation and escape from suboptimal regions. Unlike heuristic adaptive methods, targets are theoretically grounded in the geometry of the optimization landscape.

artificial intelligence, machine learning, singularity, (17 more...)

2509.01329

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Lu, Binghang, Mou, Changhong, Lin, Guang

An Evolutionary Multi-objective Optimization for Replica-Exchange-based Physics-informed Operator Learning Network

arXiv.org Artificial IntelligenceSep-3-2025

In this paper, we propose an evolutionary Multi-objective Optimization for Replica-Exchange-based Physics-informed Operator learning Network, which is a novel operator learning network to efficiently solve parametric partial differential equations. In forward and inverse settings, this operator learning network only admits minimum requirement of noisy observational data. While physics-informed neural networks and operator learning approaches such as Deep Operator Networks and Fourier Neural Operators offer promising alternatives to traditional numerical solvers, they struggle with balancing operator and physics losses, maintaining robustness under noisy or sparse data, and providing uncertainty quantification. The proposed framework addresses these limitations by integrating: (i) evolutionary multi-objective optimization to adaptively balance operator and physics-based losses in the Pareto front; (ii) replica exchange stochastic gradient Langevin dynamics to improve global parameter-space exploration and accelerate convergence; and (iii) built-in Bayesian uncertainty quantification from stochastic sampling. The proposed operator learning method is tested numerically on several different problems including one-dimensional Burgers equation and the time-fractional mixed diffusion-wave equation. The results indicate that our framework consistently outperforms the general operator learning methods in accuracy, noise robustness, and the ability to quantify uncertainty.

artificial intelligence, evolutionary algorithm, machine learning, (14 more...)

2509.00663

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Energy (1.00)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

arXiv.org Artificial IntelligenceSep-3-2025

A Fluid Antenna Enabled Physical Layer Key Generation for Next-G Wireless Networks

Guo, Jiacheng, Gao, Ning, Zuo, Yiping, Xu, Hao, Jin, Shi, Wong, Kai Kit

As a promising physical layer security technique, physical layer key generation (PLKG) enables legitimate users to obtain secret keys from wireless channel without security infrastructures. However, in harsh propagation environments, the channel characteristic becomes unsatisfactory, the key generation rate (KGR) is significantly deteriorated. In this paper, we propose a novel fluid antenna (FA) enabled PLKG system to address this challenge. Specifically, we first derive the closed-form expression of the KGR for FA array, and then jointly optimize the precoding matrix and the antenna positions via a particle swarm optimization (PSO) algorithm. Next, to further reduce the computational complexity of the optimization procedure, we develop an alternating optimization (AO) algorithm, which combines the projected gradient descent (PGD) and the PSO. Simulation results demonstrate that by exploiting the additional spatial degree of freedom (DoF), our FA enabled PLKG system is superior to the benchmarks, such as the conventional fixed-position antenna (FPA) array and the reconfigurable intelligent surface (RIS). It is worth highlighting that compared to the conventional uniform planar antenna (UPA), the FA enabled PLKG achieves a 35.42\% KGR performance improvement under PSO algorithm and a 67.73\% KGR performance improvement under AO algorithm, respectively.

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

2509.00018

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)