sparsification
Supplementary Materials A Complexity Analysis
Our proposed method significantly reduces communication overhead in federated learning. This method poses a trade-off between time and memory complexity. We also provide detailed information about the optimization hyperparameters e.g. In this section, we explore the effect of fitness sparsification i.e. selecting top-k fitness values from the To enable a fair and insightful comparison between the two population sizes, our focus was on assessing performance based on the number of members remaining post-sparsification rather than directly contrasting sparsification rates. Our results underline the crucial role that population size plays in exploring optimal solutions, overshadowing even the significance of compression rate.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
ATOMO: Communication-efficient Learning via Atomic Sparsification
Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions.
Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks using the Marginal Likelihood
Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naively deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a sparsification framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors for making neural networks more sparsifiable. Our approach implements an automatic Occam's razor that selects the most sparsifiable model that still explains the data well, both for structured and unstructured sparsification. In addition, we demonstrate that the pre-computed posterior precision from the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches. We demonstrate the effectiveness of our framework, especially at high sparsity levels, across a range of different neural network architectures and datasets.
Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques
To address the high communication costs of distributed machine learning, a large body of work has been devoted in recent years to designing various compression strategies, such as sparsification and quantization, and optimization algorithms capable of using them. Recently, Safaryan et al. (2021) pioneered a dramatically different compression design approach: they first use the local training data to form local smoothness matrices and then propose to design a compressor capable of exploiting the smoothness information contained therein. While this novel approach leads to substantial savings in communication, it is limited to sparsification as it crucially depends on the linearity of the compression operator. In this work, we generalize their smoothness-aware compression strategy to arbitrary unbiased compression operators, which also include sparsification. Specializing our results to stochastic quantization, we guarantee significant savings in communication complexity compared to standard quantization. In particular, we prove that block quantization with $n$ blocks theoretically outperforms single block quantization, leading to a reduction in communication complexity by an $\mathcal{O}(n)$ factor, where $n$ is the number of nodes in the distributed system. Finally, we provide extensive numerical evidence with convex optimization problems that our smoothness-aware quantization strategies outperform existing quantization schemes as well as the aforementioned smoothness-aware sparsification strategies with respect to three evaluation metrics: the number of iterations, the total amount of bits communicated, and wall-clock time.
Optimized Architectures for Kolmogorov-Arnold Networks
Efforts to improve Kolmogorov-Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined with sparsification to learn compact, interpretable KANs without sacrificing accuracy. Crucially, we focus on differentiable sparsification, turning architecture search into an end-to-end optimization problem. Across function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks, we demonstrate competitive or superior accuracy while discovering substantially smaller models. Overprovisioning and sparsification are synergistic, with the combination outperforming either alone. The result is a principled path toward models that are both more expressive and more interpretable, addressing a key tension in scientific machine learning.
- North America > United States > Vermont > Chittenden County > Burlington (0.14)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Asia > Japan (0.04)
Efficient Covariance Estimation for Sparsified Functional Data
Zheng, Sijie, Meng, Fandong, Zhou, Jie
Motivated by recent work involving the analysis of leveraging spatial correlations in sparsified mean estimation, we present a novel procedure for constructing covariance estimator. The proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient. Asymptotic pointwise of the covariance are obtained for sparsified individual trajectories under some regularity conditions. Our proposed nonparametric method well perform the functional principal components analysis for the case of sparsified data, where the number of repeated measurements available per subject is small. In contrast, classical functional data analysis requires a large number of regularly spaced measurements per subject. Model selection techniques, such as the Akaike information criterion, are used to choose the model dimension corresponding to the number of eigenfunctions in the model. Theoretical results are illustrated with Monte Carlo simulation experiments. Finally, we cluster multi-domain data by replacing the covariance function with our proposed covariance estimator during PCA.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China (0.04)
Generative Myopia: Why Diffusion Models Fail at Structure
Graph Diffusion Models (GDMs) optimize for statistical likelihood, implicitly acting as \textbf{frequency filters} that favor abundant substructures over spectrally critical ones. We term this phenomenon \textbf{Generative Myopia}. In combinatorial tasks like graph sparsification, this leads to the catastrophic removal of ``rare bridges,'' edges that are structurally mandatory ($R_{\text{eff}} \approx 1$) but statistically scarce. We prove theoretically and empirically that this failure is driven by \textbf{Gradient Starvation}: the optimization landscape itself suppresses rare structural signals, rendering them unlearnable regardless of model capacity. To resolve this, we introduce \textbf{Spectrally-Weighted Diffusion}, which re-aligns the variational objective using Effective Resistance. We demonstrate that spectral priors can be amortized into the training phase with zero inference overhead. Our method eliminates myopia, matching the performance of an optimal Spectral Oracle and achieving \textbf{100\% connectivity} on adversarial benchmarks where standard diffusion fails completely (0\%).