AITopics

2509.21653

Genre:

Research Report (0.63)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Neural Information Processing Systems, Sep-28-2025, 22:01:28 GMT

c48fe446e651cd49fb58a6833e015103-Paper-Conference.pdf

artificial intelligence, evolutionary algorithm, machine learning

Country:

North America > United States (0.46)
Europe > Poland (0.14)
Europe > Middle East > Malta (0.14)
Europe > Germany (0.14)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Oil & Gas (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Neural Information Processing Systems, Sep-28-2025, 20:16:09 GMT

907a9fb75a408f6c3a2ae1bf84c39e44-Paper-Conference.pdf

artificial intelligence, experiment, machine learning

Country: North America > United States > California (0.46)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Neural Information Processing Systems, Sep-27-2025, 15:28:35 GMT

f242c4cba2467637256722cb679642bd-Paper-Conference.pdf

algorithm, artificial intelligence, machine learning

Country: Europe (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Energy > Oil & Gas (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Neural Information Processing Systems, Sep-27-2025, 04:42:30 GMT

6244b2ba957c48bc64582cf2bcec3d04-Paper.pdf

data mining, detection, machine learning

Industry: Energy > Oil & Gas > Upstream (0.41)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Ayme, Alexis, Loureiro, Bruno

Breaking the curse of dimensionality for linear rules: optimal predictors over the ellipsoid

arXiv.org Machine Learning, Sep-26-2025

In this work, we address the following question: What minimal structural assumptions are needed to prevent the degradation of statistical learning bounds with increasing dimensionality? We investigate this question in the classical statistical setting of signal estimation from $n$ independent linear observations $Y_i = X_i^{\top}\theta + \varepsilon_i$. Our focus is on the generalization properties of a broad family of predictors that can be expressed as linear combinations of the training labels, $f(X) = \sum_{i=1}^{n} l_{i}(X) Y_i$. This class, commonly referred to as linear prediction rules, encompasses a wide range of popular parametric and non-parametric estimators, including ridge regression, gradient descent, and kernel methods. Our contributions are twofold. First, we derive non-asymptotic upper and lower bounds on the generalization error for this class under the assumption that the Bayes predictor $\theta$ lies in an ellipsoid. Second, we establish a lower bound for the subclass of rotationally invariant linear prediction rules when the Bayes predictor is fixed. Our analysis highlights two fundamental contributions to the risk: (a) a variance-like term that captures the intrinsic dimensionality of the data; (b) the noiseless error, a term that arises specifically in the high-dimensional regime. These findings shed light on the role of structural assumptions in mitigating the curse of dimensionality.
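To make the notion of a linear prediction rule concrete, the sketch below checks numerically that ridge regression can be written as $f(x) = \sum_i l_i(x) Y_i$. It is an illustration only, not code from the paper: it assumes the standard unnormalized ridge estimator $\hat\theta = (X^{\top}X + \lambda I)^{-1} X^{\top} Y$, restricts to two features so the matrix inverse has a closed form, and the function names (`ridge_weights`, `ridge_theta`) are hypothetical.

```python
def _inverse_gram(X, lam):
    """Closed-form inverse of the 2x2 matrix A = X^T X + lam * I."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    det = a * d - b * b  # strictly positive whenever lam > 0
    return [[d / det, -b / det], [-b / det, a / det]]


def ridge_weights(X, x_new, lam):
    """Label weights l_i(x_new) = X_i^T A^{-1} x_new (A is symmetric)."""
    inv = _inverse_gram(X, lam)
    v = [inv[0][0] * x_new[0] + inv[0][1] * x_new[1],
         inv[1][0] * x_new[0] + inv[1][1] * x_new[1]]
    return [r[0] * v[0] + r[1] * v[1] for r in X]


def ridge_theta(X, Y, lam):
    """Ridge coefficients theta_hat = A^{-1} X^T Y."""
    inv = _inverse_gram(X, lam)
    c = [sum(r[0] * y for r, y in zip(X, Y)),
         sum(r[1] * y for r, y in zip(X, Y))]
    return [inv[0][0] * c[0] + inv[0][1] * c[1],
            inv[1][0] * c[0] + inv[1][1] * c[1]]


# Toy data (made up for illustration).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [1.0, 2.0, 3.5]
x_new = [2.0, -1.0]
lam = 0.5

weights = ridge_weights(X, x_new, lam)
pred_from_labels = sum(l * y for l, y in zip(weights, Y))
theta = ridge_theta(X, Y, lam)
pred_from_theta = x_new[0] * theta[0] + x_new[1] * theta[1]
```

The two predictions agree up to floating-point error, and the weights depend only on the inputs, not on the labels. That is the defining property of the class studied in the abstract.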

artificial intelligence, machine learning, theorem 4

2509.21174

Country:

Europe > France (0.14)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Blagoev, Nikolay, Cox, Bart, Decouchant, Jérémie, Chen, Lydia Y.

Go With The Flow: Churn-Tolerant Decentralized Training of Large Language Models

arXiv.org Artificial Intelligence, Sep-26-2025

Motivated by the emergence of large language models (LLMs) and the importance of democratizing their training, we propose GWTF, the first crash-tolerant, practical decentralized training framework for LLMs. Unlike existing distributed and federated training frameworks, GWTF enables the efficient collaborative training of an LLM on heterogeneous clients that volunteer their resources. In addition, GWTF addresses node churn, i.e., clients joining or leaving the system at any time, and network instabilities, i.e., network links becoming unstable or unreliable. The core of GWTF is a novel decentralized flow algorithm that finds the most effective routing, maximizing the number of microbatches trained with the lowest possible delay. We extensively evaluate GWTF on GPT-like and LLaMa-like models and compare it against the prior art. Our results indicate that GWTF reduces the training time by up to 45% in realistic and challenging scenarios that involve heterogeneous client nodes distributed over 10 different geographic locations with a high node churn rate.

large language model, machine learning, node

2509.21221

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Hotegni, Sedjro Salomon, Peitz, Sebastian

SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion

arXiv.org Artificial Intelligence, Sep-26-2025

Developing efficient multi-objective optimization methods to compute the Pareto set of optimal compromises between conflicting objectives remains a key challenge, especially for large-scale and expensive problems. To bridge this gap, we introduce SPREAD, a generative framework based on Denoising Diffusion Probabilistic Models (DDPMs). SPREAD first learns a conditional diffusion process over points sampled from the decision space and then, at each reverse diffusion step, refines candidates via a sampling scheme that uses an adaptive multiple gradient descent-inspired update for fast convergence alongside a Gaussian RBF-based repulsion term for diversity. Empirical results on multi-objective optimization benchmarks, including offline and Bayesian surrogate-based settings, show that SPREAD matches or exceeds leading baselines in efficiency, scalability, and Pareto front coverage.
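The two refinement ingredients named in the abstract, a multiple-gradient-descent-style update and a Gaussian RBF repulsion term, can be sketched in isolation as follows. This is not the SPREAD algorithm: there is no diffusion model and no adaptive step, and the function names, the two-objective restriction, and the parameters `lr`, `gamma`, and `sigma` are all assumptions made for illustration. The descent direction is the classical min-norm convex combination of two gradients; the repulsion is the negative gradient of a summed Gaussian kernel.

```python
import math


def min_norm_direction(g1, g2):
    """Min-norm point in the convex hull of two gradients (two-objective MGDA)."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every convex weight gives the same vector
    else:
        # Closed-form minimizer of ||alpha*g1 + (1-alpha)*g2||^2, clipped to [0, 1].
        alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
        alpha = min(1.0, max(0.0, alpha))
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]


def rbf_repulsion(points, i, sigma):
    """Direction that lowers sum_j exp(-||x_i - x_j||^2 / (2 sigma^2)) for point i."""
    xi = points[i]
    force = [0.0] * len(xi)
    for j, xj in enumerate(points):
        if j == i:
            continue
        sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
        k = math.exp(-sq / (2.0 * sigma ** 2))
        for c in range(len(xi)):
            force[c] += k * (xi[c] - xj[c]) / sigma ** 2
    return force


def refine(points, grad_f1, grad_f2, lr=0.1, gamma=0.05, sigma=0.5):
    """One sweep: common-descent step plus repulsion for every candidate point."""
    new_points = []
    for i, x in enumerate(points):
        d = min_norm_direction(grad_f1(x), grad_f2(x))
        r = rbf_repulsion(points, i, sigma)
        new_points.append([xc - lr * dc + gamma * rc
                           for xc, dc, rc in zip(x, d, r)])
    return new_points


# Toy problem (made up): f1 = ||x - (0,0)||^2, f2 = ||x - (1,0)||^2.
grad_f1 = lambda x: [2.0 * x[0], 2.0 * x[1]]
grad_f2 = lambda x: [2.0 * (x[0] - 1.0), 2.0 * x[1]]
candidates = [[0.4, 0.8], [0.5, 0.9]]
for _ in range(50):
    candidates = refine(candidates, grad_f1, grad_f2)
```

In this toy problem the Pareto set is the segment between (0, 0) and (1, 0). The descent term pulls candidates toward that segment, while the repulsion term discourages them from collapsing onto the same point.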

artificial intelligence, machine learning, optimization

2509.21058

Country: Europe (0.28)

Genre:

Research Report > New Finding (0.93)
Overview (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Moturu, Abhishek, Goldenberg, Anna, Taati, Babak

LiLAW: Lightweight Learnable Adaptive Weighting to Meta-Learn Sample Difficulty and Improve Noisy Training

arXiv.org Artificial Intelligence, Sep-26-2025

Training deep neural networks in the presence of noisy labels and data heterogeneity is a major challenge. We introduce Lightweight Learnable Adaptive Weighting (LiLAW), a novel method that dynamically adjusts the loss weight of each training sample based on its evolving difficulty level, categorized as easy, moderate, or hard. Using only three learnable parameters, LiLAW adaptively prioritizes informative samples throughout training by updating these weights using a single mini-batch gradient descent step on the validation set after each training mini-batch, without requiring excessive hyperparameter tuning or a clean validation set. Extensive experiments across multiple general and medical imaging datasets, noise levels and types, loss functions, and architectures with and without pretraining demonstrate that LiLAW consistently enhances performance, even in high-noise environments. It is effective without heavy reliance on data augmentation or advanced regularization, highlighting its practicality. It offers a computationally efficient solution to boost model generalization and robustness in any neural network training setup.
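The general idea of learning a handful of loss weights from a validation gradient step can be sketched as below. The abstract does not give LiLAW's weighting function or its difficulty criterion, so everything specific here is an assumption: the scalar linear model, the loss thresholds in `bucket`, the one-step lookahead meta-gradient, and all names are hypothetical stand-ins chosen only to show the mechanics with three learnable weights.

```python
def sq_loss_grad(w, x, y):
    """Loss and gradient w.r.t. w of (w*x - y)^2 for a scalar linear model."""
    r = w * x - y
    return r * r, 2.0 * r * x


def bucket(loss, lo=0.1, hi=1.0):
    """Hypothetical difficulty rule: 0 = easy, 1 = moderate, 2 = hard."""
    if loss < lo:
        return 0
    return 1 if loss < hi else 2


def train_then_meta_step(w, p, train, val, lr=0.1, meta_lr=0.1):
    """One weighted training step, then one validation step on the 3 weights p."""
    n = len(train)
    # Per-bucket averages of the training gradients at the current w.
    g_bucket = [0.0, 0.0, 0.0]
    for x, y in train:
        loss, g = sq_loss_grad(w, x, y)
        g_bucket[bucket(loss)] += g / n
    # Weighted training step: w_new is linear in the weights p.
    w_new = w - lr * sum(p[k] * g_bucket[k] for k in range(3))
    # Validation gradient evaluated at the updated model.
    g_val = sum(sq_loss_grad(w_new, x, y)[1] for x, y in val) / len(val)
    # Chain rule: dL_val/dp_k = g_val * d(w_new)/dp_k = g_val * (-lr * g_bucket[k]).
    p_new = [p[k] - meta_lr * g_val * (-lr * g_bucket[k]) for k in range(3)]
    return w_new, p_new


# Toy run (made-up data): one hard training sample that agrees with validation.
w_new, p_new = train_then_meta_step(
    w=0.0, p=[1.0, 1.0, 1.0], train=[(1.0, 2.0)], val=[(1.0, 2.0)]
)
```

In the toy run the single training sample falls in the hard bucket and its gradient points the same way as the validation gradient, so only the hard weight increases. A sample whose gradient opposed the validation gradient would have its bucket weight reduced instead.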

artificial intelligence, machine learning, validation