AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Distributed Optimization for Overparameterized Problems: Achieving Optimal Dimension Independent Communication Complexity

Neural Information Processing SystemsApr-25-2026, 04:39:12 GMT

Decentralized optimization are playing an important role in applications such as training large machine learning models, among others. Despite its superior practical performance, there has been some lack of fundamental understanding about its theoretical properties. In this work, we address the following open research question: To train an overparameterized model over a set of distributed nodes, what is the minimum communication overhead (in terms of the bits got exchanged) that the system needs to sustain, while still achieving (near) zero training loss? We show that for a class of overparameterized models where the number of parameters D is much larger than the total data samples N, the best possible communication complexity is (N), which is independent of the problem dimension D. Further, for a few specific overparameterized models (i.e., the linear regression, and certain multi-layer neural network with one wide layer), we develop a set of algorithms which uses certain linear compression followed by adaptive quantization, and show that they achieve dimension independent, near-optimal communication complexity. To our knowledge, this is the first time that dimension independent communication complexity has been shown for distributed optimization.

artificial intelligence, machine learning, optimization problem, (12 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

FLSL: Feature-level Self-supervised Learning

Neural Information Processing SystemsApr-25-2026, 04:38:57 GMT

Current self-supervised learning (SSL) methods (e.g., SimCLR, DINO, VICReg, MOCOv3) target primarily on representations at instance level and do not generalize well to dense prediction tasks, such as object detection and segmentation. Towards aligning SSL with dense predictions, this paper demonstrates for the first time the underlying mean-shift clustering process of Vision Transformers (ViT), which aligns well with natural image semantics (e.g., a world of objects and stuffs). By employing transformer for joint embedding and clustering, we propose a bi-level feature clustering SSL method, coined Feature-Level Self-supervised Learning (FLSL). We present the formal definition of the FLSL problem and construct the objectives from the mean-shift and k-means perspectives. We show that FLSL promotes remarkable semantic cluster representations and learns an encoding scheme amenable to intra-view and inter-view feature clustering. Experiments show that FLSL yields significant improvements in dense prediction tasks, achieving 44.9 (+2.8)% AP and 46.5% AP in object detection, as well as 40.8 (+2.3)%

artificial intelligence, machine learning, representation, (15 more...)

Neural Information Processing Systems

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

One-Line-of-Code Data Mollification Improves Optimization of Likelihood-based Generative Models

Neural Information Processing SystemsApr-25-2026, 04:37:58 GMT

Generative Models (GMs) have attracted considerable attention due to their tremendous success in various domains, such as computer vision where they are capable to generate impressive realistic-looking images. Likelihood-based GMs are attractive due to the possibility to generate new data by a single model evaluation. However, they typically achieve lower sample quality compared to state-of-the-art score-based Diffusion Models (DMs). This paper provides a significant step in the direction of addressing this limitation. The idea is to borrow one of the strengths of score-based DMs, which is the ability to perform accurate density estimation in low-density regions and to address manifold overfitting by means of data mollification. We propose a view of data mollification within likelihood-based GMs as a continuation method, whereby the optimization objective smoothly transitions from simple-to-optimize to the original target. Crucially, data mollification can be implemented by adding one line of code in the optimization loop, and we demonstrate that this provides a boost in generation quality of likelihood-based GMs, without computational overheads. We report results on real-world image data sets and UCI benchmarks with popular likelihood-based GMs, including variants of variational autoencoders and normalizing flows, showing large improvements in FID score and density estimation.

artificial intelligence, international conference, machine learning, (11 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

26d4b4313a7e5828856bc0791fca39a2-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 04:37:50 GMT

artificial intelligence, machine learning, probability, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.69)

Genre: Research Report > Experimental Study (0.46)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.49)

Add feedback

Support vector machines and linear regression coincide with very high-dimensional features

Neural Information Processing SystemsApr-25-2026, 04:37:45 GMT

The support vector machine (SVM) and minimum Euclidean norm least squares regression are two fundamentally different approaches to fitting linear models, but they have recently been connected in models for very high-dimensional data through a phenomenon of support vector proliferation, where every training example used to fit an SVM becomes a support vector. In this paper, we explore the generality of this phenomenon and make the following contributions. First, we prove a super-linear lower bound on the dimension (in terms of sample size) required for support vector proliferation in independent feature models, matching the upper bounds from previous works. We further identify a sharp phase transition in Gaussian feature models, bound the width of this transition, and give experimental support for its universality. Finally, we hypothesize that this phase transition occurs only in much higher-dimensional settings in the ℓ1 variant of the SVM, and we present a new geometric characterization of the problem that may elucidate this phenomenon for the general ℓp case.

artificial intelligence, machine learning, transition, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Add feedback

285b06e0dd856f20591b0a5beb954151-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 04:37:35 GMT

artificial intelligence, data mining, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)

Add feedback

ACommunication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

Neural Information Processing SystemsApr-25-2026, 04:20:59 GMT

In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication per iteration or lack performance guarantees with the exception of Local Stochastic Gradient Descent Ascent (SGDA), a multiple-local-update descent ascent algorithm which guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that generally it cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT).

artificial intelligence, machine learning, neural information processing system, (12 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)

Add feedback

Supplementary Material Dynamic Results a)b)c)d)e)f)g)

Neural Information Processing SystemsApr-25-2026, 04:19:26 GMT

The different cases represent various material property configurations. In Figure 8 we show the temporal evolution of different scenarios (a) to (d) for the initially straight bending rod, and (e) to (f) for the elastic helix. The default parameters of the initially straight bending rod are 0 =0, N = 30, ` =4 .0 In (b), we modify N 2{ 10,20,40,60}. The default parameters of the elastic helix are HR =0 .5 m (helix radius), HH =0 .5 m (helix height), HW =2 .5 (winding number), T =1 .0

artificial intelligence, machine learning, temporal evolution, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.31)

Add feedback

Efficient Sampling on Riemannian Manifolds via Langevin MCMC

Neural Information Processing SystemsApr-25-2026, 04:02:33 GMT

We study the task of efficiently sampling from a Gibbs distribution dπ = e hdvolg over a Riemannian manifold M via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice. The key to our analysis of Langevin MCMC is a bound on the discretization error of the geometric Euler-Murayama scheme, assuming his Lipschitz and M has bounded sectional curvature. Our error bound matches the error of Euclidean Euler-Murayama in terms of its stepsize dependence. Combined with a contraction guarantee for the geometric Langevin Diffusion under Kendall-Cranston coupling, we prove that the Langevin MCMC iterates lie within ε-Wasserstein distance of π after O(ε 2)steps, which matches the iteration complexity for Euclidean Langevin MCMC. Our results apply in general settings where hcan be nonconvex and M can have negative Ricci curvature. Under additional assumptions that the Riemannian curvature tensor has bounded derivatives, and that π satisfies a CD(,) condition, we analyze the stochastic gradient version of Langevin MCMC, and bound its iteration complexity by O(ε 2)as well.

artificial intelligence, machine learning, theorem 1, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Filters

Collaborating Authors

Statistical Learning

Distributed Optimization for Overparameterized Problems: Achieving Optimal Dimension Independent Communication Complexity

15212bd2265c4a3ab0dbc1b1982c1b69-Supplemental-Conference.pdf

FLSL: Feature-level Self-supervised Learning

One-Line-of-Code Data Mollification Improves Optimization of Likelihood-based Generative Models

26d4b4313a7e5828856bc0791fca39a2-Supplemental.pdf

Support vector machines and linear regression coincide with very high-dimensional features

285b06e0dd856f20591b0a5beb954151-Paper-Conference.pdf

ACommunication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

Supplementary Material Dynamic Results a)b)c)d)e)f)g)

Efficient Sampling on Riemannian Manifolds via Langevin MCMC