AITopics

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Kim, Gyu Yeol, Oh, Min-hwan

Convergence of Muon with Newton-Schulz

arXiv.org Machine LearningJan-28-2026

We analyze Muon as originally proposed and used in practice -- using the momentum orthogonalization with a few Newton-Schulz steps. The prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a stationary point at the same rate as the SVD-polar idealization, up to a constant factor for a given number $q$ of Newton-Schulz steps. We further analyze this constant factor and prove that it converges to 1 doubly exponentially in $q$ and improves with the degree of the polynomial used in Newton-Schulz for approximating the orthogonalization direction. We also prove that Muon removes the typical square-root-of-rank loss compared to its vector-based counterpart, SGD with momentum. Our results explain why Muon with a few low-degree Newton-Schulz steps matches exact-polar (SVD) behavior at a much faster wall-clock time and explain how much momentum matrix orthogonalization via Newton-Schulz benefits over the vector-based optimizer. Overall, our theory justifies the practical Newton-Schulz design of Muon, narrowing its practice-theory gap.

artificial intelligence, conference paper, machine learning, (18 more...)

arXiv.org Machine Learning

2601.19156

Country:

Asia > Middle East > Jordan (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceDec-2-2025

Consistency Flow Model Achieves One-step Denoising Error Correction Codes

Lei, Haoyu, Lau, Chin Wa, Zhou, Kaiwen, Guo, Nian, Farnia, Farzan

Error Correction Codes (ECC) are fundamental to reliable digital communication, yet designing neural decoders that are both accurate and computationally efficient remains challenging. Recent denoising diffusion decoders with transformer backbones achieve state-of-the-art performance, but their iterative sampling limits practicality in low-latency settings. We introduce the Error Correction Consistency Flow Model (ECCFM), an architecture-agnostic training framework for high-fidelity one-step decoding. By casting the reverse denoising process as a Probability Flow Ordinary Differential Equation (PF-ODE) and enforcing smoothness through a differential time regularization, ECCFM learns to map noisy signals along the decoding trajectory directly to the original codeword in a single inference step. Across multiple decoding benchmarks, ECCFM attains lower bit-error rates (BER) than autoregressive and diffusion-based baselines, with notable improvements on longer codes, while delivering inference speeds up from 30x to 100x faster than denoising diffusion decoders.

decoder, machine learning, natural language, (19 more...)

2512.01389

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (0.92)
(2 more...)

Kumar, Punit, Imran, Asif, Kosar, Tevfik

Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines:A Comparative Analysis

arXiv.org Artificial IntelligenceNov-19-2025

This paper presents a detailed comparative analysis of the performance of three major Python data manipulation libraries - Pandas, Polars, and Dask - specifically when embedded within complete deep learning (DL) training and inference pipelines. The research bridges a gap in existing literature by studying how these libraries interact with substantial GPU workloads during critical phases like data loading, preprocessing, and batch feeding. The authors measured key performance indicators including runtime, memory usage, disk usage, and energy consumption (both CPU and GPU) across various machine learning models and datasets.

artificial intelligence, machine learning, polar, (15 more...)

2511.08644

Country: North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Energy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceNov-3-2025

PoLAR: Polar-Decomposed Low-Rank Adapter Representation

Lion, Kai, Zhang, Liang, Li, Bingcong, He, Niao

We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition that factorizes the low-rank update into two direction matrices constrained to Stiefel manifolds and an unconstrained scale matrix. Our theory shows that PoLAR yields an exponentially faster convergence rate on a canonical low-rank adaptation problem. Pairing the parameterization with Riemannian optimization leads to consistent gains on three different benchmarks testing general language understanding, commonsense reasoning, and mathematical problem solving with base model sizes ranging from 350M to 27B.

large language model, machine learning, proc, (17 more...)

2506.03133

Country:

Europe > Switzerland (0.28)
North America > United States (0.27)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

arXiv.org Artificial IntelligenceOct-23-2025

POLAR: Policy-based Layerwise Reinforcement Learning Method for Stealthy Backdoor Attacks in Federated Learning

Yu, Kuai, Wu, Xiaoyu, Yan, Peishen, Yang, Qingqian, Jiang, Linshan, Wang, Hao, Hua, Yang, Song, Tao, Guan, Haibing

Federated Learning (FL) enables decentralized model training across multiple clients without exposing local data, but its distributed feature makes it vulnerable to backdoor attacks. Despite early FL backdoor attacks modifying entire models, recent studies have explored the concept of backdoor-critical (BC) layers, which poison the chosen influential layers to maintain stealthiness while achieving high effectiveness. However, existing BC layers approaches rely on rule-based selection without consideration of the interrelations between layers, making them ineffective and prone to detection by advanced defenses. In this paper, we propose POLAR (POlicy-based LAyerwise Reinforcement learning), the first pipeline to creatively adopt RL to solve the BC layer selection problem in layer-wise backdoor attack. Different from other commonly used RL paradigm, POLAR is lightweight with Bernoulli sampling. POLAR dynamically learns an attack strategy, optimizing layer selection using policy gradient updates based on backdoor success rate (BSR) improvements. To ensure stealthiness, we introduce a regularization constraint that limits the number of modified layers by penalizing large attack footprints. Extensive experiments demonstrate that POLAR outperforms the latest attack methods by up to 40% against six state-of-the-art (SOTA) defenses.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2510.19056

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Neural Information Processing SystemsSep-27-2025, 20:41:53 GMT

fcd3909db30887ce1da519c4468db668-Supplemental-Conference.pdf

artificial intelligence, data quality, machine learning, (15 more...)

Industry: Energy > Oil & Gas (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Eliya Nachmani, Lior Wolf

Hyper-Graph-Network Decoders for Block Codes

Neural Information Processing SystemsAug-19-2025, 23:02:39 GMT

Decoding algebraic block codes is an open problem and learning techniques have recently been introduced to this field.

block code, hypernetwork, iteration, (16 more...)

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Neural Information Processing SystemsAug-19-2025, 21:58:31 GMT

Error Correction Code Transformer

BCH(7,4) code with N = 6, d = 32, similarly to the results presented in Section 6.1. Interestingly, in the early stage of the decoding, ECCT seems to focus its processing of the syndrome. Once the network corrects the bit (last layers(s)), the values return to normal. Using multiple heads is clearly beneficial for the model's performance. The proposed shallow ECCTs are able to compete and even surpass the SCL for some of the codes and SNRs.

artificial intelligence, data quality, machine learning, (15 more...)

Industry: Energy > Oil & Gas (0.48)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (0.40)
Information Technology > Artificial Intelligence > Machine Learning (0.30)

arXiv.org Machine LearningJun-26-2025

POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes

Zhang, Ruijia, Qi, Zhengling, Wu, Yue, Zhang, Xiangyu, Xu, Yanxun

Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on strong positivity assumptions and lack robustness under partial data coverage, while offline reinforcement learning approaches typically focus on average training performance, lack statistical guarantees, and require solving complex optimization problems. To address these challenges, we propose POLAR, a novel pessimistic model-based policy learning algorithm for offline DTR optimization. POLAR estimates the transition dynamics from offline data and quantifies uncertainty for each history-action pair. A pessimistic penalty is then incorporated into the reward function to discourage actions with high uncertainty. Unlike many existing methods that focus on average training performance, POLAR directly targets the suboptimality of the final learned policy and offers theoretical guarantees, without relying on computationally intensive minimax or constrained optimization procedures. To the best of our knowledge, POLAR is the 1 first model-based DTR method to provide both statistical and computational guarantees, including finite-sample bounds on policy suboptimality. Empirical results on both synthetic data and the MIMIC-III dataset demonstrate that POLAR outperforms state-of-the-art methods and yields near-optimal, history-aware treatment strategies.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2506.20406

Country:

Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)