AITopics | sparsity level

Collaborating Authors

sparsity level

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Lua-LLM: Learning Unstructured-Sparsity Allocation for Large Language Models

Neural Information Processing SystemsJun-22-2026, 00:45:08 GMT

Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their extensive parameter scales pose significant challenges for practical deployment. Unstructured pruning has emerged as an effective model compression strategy with minimal performance loss, which introduces fine-grained sparsity for weight parameters. While existing methods employ a layer-wise pruning strategy to avoid the complexity of global pruning for billion-scale LLMs, they require appropriate sparsity allocation for the layer-wise pruning objectives and often lead to suboptimal solutions for the overall model. In this paper, we propose Lua-LLM (Learning unstructured-sparsity allocation in LLMs), a learning-based global pruning framework that explores the optimal unstructured sparsity allocation. Unlike existing pruning methods, which primarily focus on allocating per-layer sparsity, Lua-LLM achieves flexible allocation for both layer-wise and intra-layer sparsity.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe > Austria > Vienna (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis

Neural Information Processing SystemsJun-17-2026, 03:32:55 GMT

Sparse neural networks promise efficiency, yet training them effectively remains a fundamental challenge. Despite advances in pruning methods that create sparse architectures, understanding why some sparse structures are better trainable than others with the same level of sparsity remains poorly understood. Aiming to develop a systematic approach to this fundamental problem, we propose a novel theoretical framework based on the theory of graph limits, particularly graphons, that characterizes sparse neural networks in the infinite-width regime. Our key insight is that connectivity patterns of sparse neural networks induced by pruning methods converge to specific graphons as networks' width tends to infinity, which encodes implicit structural biases of different pruning methods. We postulate the Graphon Limit Hypothesis and provide empirical evidence to support it. Leveraging this graphon representation, we derive a Graphon Neural Tangent Kernel (Graphon NTK) to study the training dynamics of sparse networks in the infinite width limit. Graphon NTK provides a general framework for the theoretical analysis of sparse networks. We empirically show that the spectral analysis of Graphon NTK correlates with observed training dynamics of sparse networks, explaining the varying convergence behaviours of different pruning methods. Our framework provides theoretical insights into the impact of connectivity patterns on the trainability of various sparse network architectures.

artificial intelligence, machine learning, neural network, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected

Neural Information Processing SystemsJun-16-2026, 22:01:01 GMT

This study aims to enlarge our current knowledge on the application of braininspired network science principles for training artificial neural networks (ANNs) with sparse connectivity.

machine learning, natural language, sparsity, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected

Neural Information Processing SystemsJun-11-2026, 23:31:37 GMT

This study aims to enlarge our current knowledge on the application of brain-inspired network science principles for training artificial neural networks (ANNs) with sparse connectivity.

artificial intelligence, machine learning, proceedings, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.80)

Add feedback

Feature Starvation as Geometric Instability in Sparse Autoencoders

Chaudhry, Faris, Yano, Keisuke, Monod, Anthea

arXiv.org Machine LearningMay-8-2026

Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$-regularized SAEs suffer from feature starvation (dead neurons) and shrinkage bias, often requiring computationally expensive heuristic resampling and nondifferentiable hard-masking methods to bypass these challenges. We argue that feature starvation is not merely an empirical artifact of poor data diversity, but a fundamental optimization-geometric pathology of overcomplete dictionaries: the $\ell_1$-induced sparse coding map is unstable and fundamentally misaligned with shallow, amortized encoders. To address this structural instability, we introduce adaptive elastic net SAEs (AEN-SAEs), a fully differentiable architecture grounded in classical sparse regression. AEN-SAEs combine an $\ell_2$ structural term that enforces strong convexity and Lipschitz stability with adaptive $\ell_1$ reweighting that eliminates shrinkage bias and suppresses spurious features, thereby jointly controlling the curvature and interaction structure of the induced polyhedral geometry. Theoretically, we show that AEN-SAEs yield a Lipschitz-continuous sparse coding map and recover the global feature support under mild assumptions. Empirically, across synthetic settings and LLMs (Pythia 70M, Llama 3.1 8B), AEN-SAEs mitigate feature starvation without auxiliary heuristics while maintaining competitive reconstruction abilities.

feature starvation, large language model, machine learning, (18 more...)

arXiv.org Machine Learning

2605.05341

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Normalization Layers Are All That Sharpness-Aware Minimization Needs

Neural Information Processing SystemsApr-29-2026, 23:47:51 GMT

Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters.

machine learning, natural language, sam-on, (18 more...)

Neural Information Processing Systems

Country:

Europe > Germany (0.28)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

BayesTune: Bayesian Sparse Deep Model Fine-tuning

Neural Information Processing SystemsApr-29-2026, 19:48:41 GMT

Deep learning practice is increasingly driven by powerful foundation models (FM), pre-trained at scale and then fine-tuned for specific tasks of interest. A key property of this workflow is the efficacy of performing sparse or parameter-efficient finetuning, meaning that by updating only a tiny fraction of the whole FM parameters on a downstream task can lead to surprisingly good performance, often even superior to a full model update. However, it is not clear what is the optimal and principled way to select which parameters to update. Although a growing number of sparse fine-tuning ideas have been proposed, they are mostly not satisfactory, relying on hand-crafted heuristics or heavy approximation. In this paper we propose a novel Bayesian sparse fine-tuning algorithm: we place a (sparse) Laplace prior for each parameter of the FM, with the mean equal to the initial value and the scale parameter having a hyper-prior that encourages small scale.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Workflow (0.48)
Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Neighbor Embedding for High-Dimensional Sparse Poisson Data

Mudrik, Noga, Charles, Adam S.

arXiv.org Machine LearningApr-21-2026

Across many scientific fields, measurements often represent the number of times an event occurs. For example, a document can be represented by word occurrence counts, neural activity by spike counts per time window, or online communication by daily email counts. These measurements yield high-dimensional count data that often approximate a Poisson distribution, frequently with low rates that produce substantial sparsity and complicate downstream analysis. A useful approach is to embed the data into a low-dimensional space that preserves meaningful structure, commonly termed dimensionality reduction. Yet existing dimensionality reduction methods, including both linear (e.g., PCA) and nonlinear approaches (e.g., t-SNE), often assume continuous Euclidean geometry, thereby misaligning with the discrete, sparse nature of low-rate count data. Here, we propose p-SNE (Poisson Stochastic Neighbor Embedding), a nonlinear neighbor embedding method designed around the Poisson structure of count data, using KL divergence between Poisson distributions to measure pairwise dissimilarity and Hellinger distance to optimize the embedding. We test p-SNE on synthetic Poisson data and demonstrate its ability to recover meaningful structure in real-world count datasets, including weekday patterns in email communication, research area clusters in OpenReview papers, and temporal drift and stimulus gradients in neural spike recordings.

artificial intelligence, embedding, machine learning, (15 more...)

arXiv.org Machine Learning

2604.16932

Country: North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Matrix Decomposition

Hanie Sedghi, Anima Anandkumar, Edmond Jonckheere

Neural Information Processing SystemsFeb-18-2026, 21:24:45 GMT

In this paper, we consider a multi-step version of the stochastic ADMM method with efficient guarantees for high-dimensional problems. We first analyze the simple setting, where the optimization problem consists of a loss function and asingleregularizer(e.g.

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

Add feedback

Navigating Extremes: Dynamic Sparsity in Large Output Spaces

Neural Information Processing SystemsFeb-18-2026, 07:00:54 GMT

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this in practice. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: