AITopics | theoretical analysis

MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE

Neural Information Processing Systems, Jun-22-2026, 04:36:40 GMT

Large Language Models (LLMs) have achieved remarkable success across many applications, and Mixture of Experts (MoE) models have shown great potential. Compared with traditional dense models, MoEs achieve better performance with less computation. Speculative decoding (SD) is a widely used technique for accelerating LLM inference without accuracy loss, but it has been considered efficient only for dense models. In this work, we first demonstrate that, at medium batch sizes, MoEs surprisingly benefit more from SD than dense models do. Furthermore, as MoEs become sparser (the prevailing trend in MoE design), the range of batch sizes over which SD acceleration is expected to be effective becomes broader. To quantitatively understand the tradeoffs involved in SD, we develop a reliable model based on theoretical analysis. While current SD research focuses primarily on improving the acceptance rates of algorithms, changes in workload and model architecture can still degrade SD acceleration even when acceptance rates are high. To address this limitation, we introduce a new metric, target efficiency, that characterizes these effects, helping researchers identify system bottlenecks and understand SD acceleration more comprehensively. For scenarios such as private serving, where existing solutions struggle, this work offers a new perspective on speeding up MoE inference.
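The tradeoff described above (high acceptance rate but degraded speedup) can be sketched with a toy cost model. This is a minimal illustration under stated assumptions, not the MoESD model or its target-efficiency metric: it uses the closed form for expected tokens per verification step from Leviathan et al. (2023), which assumes each drafted token is accepted independently with rate `alpha`, and a hypothetical `moe_verify_cost` proxy that assumes uniform, independent top-k routing and a memory-bound target whose verification cost is proportional to the expert weights it must load (attention and shared weights are ignored). All function names are illustrative.

```python
"""Toy speculative-decoding (SD) cost model with a hypothetical MoE proxy.

Assumptions (not taken from MoESD): i.i.d. token acceptance with rate alpha,
uniform independent top-k routing, and verification cost proportional to the
expert weights that must be loaded (a memory-bound regime).
"""


def expected_accepted_tokens(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification step with gamma drafted tokens."""
    if not 0.0 <= alpha <= 1.0 or gamma < 0:
        raise ValueError("need 0 <= alpha <= 1 and gamma >= 0")
    if alpha == 1.0:
        return float(gamma + 1)  # all drafts accepted plus one token from the target
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def sd_speedup(alpha: float, gamma: int, draft_cost: float, verify_cost: float) -> float:
    """Speedup over plain decoding.

    draft_cost:  one draft forward pass, relative to one target pass on 1 token.
    verify_cost: one target pass on gamma + 1 tokens, relative to one on 1 token.
    """
    return expected_accepted_tokens(alpha, gamma) / (gamma * draft_cost + verify_cost)


def expected_active_experts(num_experts: int, top_k: int, num_tokens: int) -> float:
    """Expected distinct experts hit when each token picks top_k experts uniformly."""
    p_not_chosen = 1.0 - top_k / num_experts  # chance one token skips a given expert
    return num_experts * (1.0 - p_not_chosen ** num_tokens)


def moe_verify_cost(num_experts: int, top_k: int, batch: int, gamma: int) -> float:
    """Hypothetical verify_cost proxy: expert weights loaded for batch * (gamma + 1)
    tokens divided by those loaded for batch tokens."""
    return (expected_active_experts(num_experts, top_k, batch * (gamma + 1))
            / expected_active_experts(num_experts, top_k, batch))


alpha, gamma, draft_cost = 0.8, 3, 0.05
for num_experts in (8, 64):  # 64 experts with top-2 routing is the sparser config
    for batch in (1, 16, 128):
        vc = moe_verify_cost(num_experts, 2, batch, gamma)
        s = sd_speedup(alpha, gamma, draft_cost, vc)
        print(f"E={num_experts:3d} batch={batch:4d} verify_cost={vc:5.2f} speedup={s:4.2f}")
```

Holding `alpha` fixed, the proxy `verify_cost` approaches 1 only once the batch is large enough that almost every expert is already loaded, and the sparser configuration needs a larger batch to get there. The proxy ignores the compute-bound regime, so it captures only one of the effects the abstract refers to.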

artificial intelligence, large language model, natural language

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

A Learnability Analysis on Neuro-Symbolic Learning

Neural Information Processing Systems, Jun-18-2026, 19:34:44 GMT

This paper presents a comprehensive theoretical analysis of the learnability of neuro-symbolic (NeSy) tasks within hybrid systems. We characterize the learnability of NeSy tasks by their derived constraint satisfaction problems (DCSPs), demonstrating that a task is learnable if and only if its corresponding DCSP admits a unique solution. Under mild assumptions, we establish the sample complexity for learnable tasks and show that, for general tasks, the asymptotic expected concept error is controlled by the degree of disagreement among DCSP solutions. Our findings unify the characterization of learnability and the phenomenon of reasoning shortcuts, providing theoretical guarantees and actionable guidance for the principled design of NeSy systems.
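The "unique solution" criterion can be made concrete with a toy, brute-force example. This is a sketch under assumptions, not the paper's formal DCSP construction: it imagines an addition-style NeSy task in which three hypothetical perceptual classes `s0`, `s1`, `s2` must each be mapped to a digit in {0, 1, 2}, while supervision only reveals the sum of pairs. Every mapping consistent with all observed sums is a DCSP solution. `max_disagreement` is an illustrative proxy for "disagreement among solutions" (the largest fraction of symbols on which two solutions differ), not the quantity defined in the paper.

```python
from itertools import combinations, product


def solve_dcsp(symbols, values, constraints):
    """Enumerate every symbol -> value mapping satisfying all (a, b, observed_sum)."""
    values = list(values)
    solutions = []
    for assignment in product(values, repeat=len(symbols)):
        mapping = dict(zip(symbols, assignment))
        if all(mapping[a] + mapping[b] == total for a, b, total in constraints):
            solutions.append(mapping)
    return solutions


def is_learnable(solutions):
    """Toy reading of the criterion: learnable iff exactly one solution exists."""
    return len(solutions) == 1


def max_disagreement(solutions):
    """Largest fraction of symbols on which any two solutions disagree."""
    if len(solutions) < 2:
        return 0.0
    symbols = list(solutions[0])
    return max(
        sum(s[x] != t[x] for x in symbols) / len(symbols)
        for s, t in combinations(solutions, 2)
    )


symbols = ["s0", "s1", "s2"]
observed = [("s0", "s0", 0), ("s0", "s1", 1), ("s1", "s2", 3)]
sols = solve_dcsp(symbols, range(3), observed)
print(len(sols), is_learnable(sols))  # 1 True

ambiguous = solve_dcsp(symbols, range(3), [("s0", "s1", 2)])
print(len(ambiguous), is_learnable(ambiguous), max_disagreement(ambiguous))  # 9 False 1.0
```

With the richer set of observations the mapping is pinned down; with a single sum constraint, many mappings fit the data equally well, which is the situation the abstract associates with reasoning shortcuts.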

artificial intelligence, log scale, machine learning

Neural Information Processing Systems

Country:

Asia (0.93)
North America > Canada (0.28)
Europe > Austria (0.28)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.89)

EvoBrain: Dynamic Multi-Channel EEG Graph Modeling for Time-Evolving Brain Networks

Neural Information Processing Systems, Jun-14-2026, 05:16:57 GMT

Dynamic GNNs, which integrate temporal and spatial features of electroencephalography (EEG) data, have shown great potential for automating seizure detection. However, fully capturing the underlying dynamics needed to represent brain states, such as seizure and non-seizure, remains non-trivial and presents two fundamental challenges. First, most existing dynamic GNN methods are built on temporally fixed static graphs, which fail to reflect the evolving nature of brain connectivity during seizure progression. Second, current efforts to jointly model temporal signals and graph structures, and more importantly their interactions, remain nascent and often yield inconsistent performance. To address these challenges, we present the first theoretical analysis of these two problems, demonstrating the effectiveness and necessity of explicit dynamic modeling and of the time-then-graph dynamic GNN approach. Building on these insights, we propose EvoBrain, a novel seizure detection model whose design follows neurological insights: it integrates a two-stream Mamba architecture with a GCN enhanced by Laplacian Positional Encoding. Moreover, EvoBrain incorporates explicitly dynamic graph structures, allowing both nodes and edges to evolve over time. Our contributions include (a) a theoretical analysis proving the expressivity advantage of explicit dynamic modeling and time-then-graph over other approaches, (b) a novel and efficient model that significantly improves AUROC by 23% and F1 score by 30% compared with the dynamic GNN baseline, and (c) a broad evaluation of our method on the challenging early seizure prediction task.
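To illustrate what "explicitly dynamic graph structures" and "Laplacian Positional Encoding" mean as inputs, here is a minimal NumPy sketch. The abstract does not say how EvoBrain defines edges, so the correlation-threshold rule in `window_adjacency` is a hypothetical choice; `laplacian_pe` is the common encoding built from eigenvectors of the symmetric normalized Laplacian. The Mamba and GCN components are not shown, and none of these function names come from the paper.

```python
import numpy as np


def window_adjacency(window: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical edge rule: connect two channels whose absolute Pearson
    correlation within this window is at least `threshold`."""
    corr = np.corrcoef(window)  # (channels, channels)
    adj = (np.abs(corr) >= threshold).astype(float)  # NaN (flat channel) -> no edge
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """k eigenvectors of L = I - D^{-1/2} A D^{-1/2} after the first one
    (eigenvalue 0). Eigenvector signs are arbitrary."""
    deg = adj.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]


def dynamic_graphs(eeg: np.ndarray, window: int, threshold: float = 0.5, k: int = 2):
    """Split (channels, samples) EEG into non-overlapping windows and build one
    graph plus positional encoding per window, so edges can change over time."""
    graphs = []
    for start in range(0, eeg.shape[1] - window + 1, window):
        adj = window_adjacency(eeg[:, start:start + window], threshold)
        graphs.append((adj, laplacian_pe(adj, k)))
    return graphs


# Synthetic 4-channel signal whose channel pairings switch halfway through.
t = np.arange(50)
a, b = np.sin(2 * np.pi * t / 50), np.cos(2 * np.pi * t / 50)
eeg = np.concatenate([np.stack([a, a, b, b]), np.stack([a, b, a, b])], axis=1)
for adj, pe in dynamic_graphs(eeg, window=50):
    print(adj.astype(int).tolist(), pe.shape)
```

In the synthetic signal, channels 0 and 1 are linked in the first window, while channels 0 and 2 are linked in the second, so the edge set evolves instead of being fixed once for the whole recording.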

artificial intelligence, name change, proceedings

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.39)

Technology: Information Technology > Artificial Intelligence (0.76)

Spectral Learning for Infinite-Horizon Average-Reward POMDPs

Neural Information Processing Systems, Jun-12-2026, 01:57:41 GMT

We address the learning problem in the context of infinite-horizon average-reward POMDPs. Traditionally, this problem has been approached using Spectral Decomposition (SD) methods applied to samples collected under non-adaptive policies, such as uniform or round-robin policies. Recently, SD techniques have been extended to accommodate a restricted class of adaptive policies, such as memoryless policies. However, the use of adaptive policies has introduced data-inefficiency challenges, because SD methods typically require all samples to be drawn from a single policy. In this work, we propose Mixed Spectral Estimation, which generalizes spectral estimation techniques to support a broader class of belief-based policies. We resolve the open question of whether spectral methods can be applied to samples collected from multiple policies, and we provide finite-sample guarantees for our approach under standard observability and ergodicity assumptions. Building on this data-efficient estimation method, we introduce the Mixed Spectral UCRL algorithm. Through a refined theoretical analysis, we show that it achieves a regret bound of $\widetilde{\mathcal{O}}(\sqrt{T})$ relative to the optimal policy, without requiring full knowledge of either the transition or the observation model. Finally, we present numerical simulations that validate the theoretical analysis of both the proposed estimation procedure and the Mixed Spectral UCRL algorithm.
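For reference, regret in the infinite-horizon average-reward setting is commonly written as below, where $\rho^{*}$ is the long-run average reward of the optimal policy and $r_t$ is the reward the learner collects at step $t$ (these symbols are introduced here; the paper's exact definition, for example in expectation or against a specific policy class, may differ). A bound of order $\sqrt{T}$ is sublinear, so the learner's average reward converges to the optimum:

```latex
\mathcal{R}(T) \;=\; T\,\rho^{*} \;-\; \sum_{t=1}^{T} r_t ,
\qquad
\mathcal{R}(T) = \widetilde{\mathcal{O}}\big(\sqrt{T}\big)
\;\Longrightarrow\;
\rho^{*} - \frac{1}{T}\sum_{t=1}^{T} r_t
\;=\; \frac{\mathcal{R}(T)}{T}
\;=\; \widetilde{\mathcal{O}}\big(T^{-1/2}\big)
\;\xrightarrow[T\to\infty]{}\; 0 .
```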

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

ec52572b9e16b91edff5dc70e2642240-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 05:08:32 GMT

artificial intelligence, machine learning, optimization problem

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Random Noise Defense Against Query-Based Black-Box Attacks

Neural Information Processing Systems, Apr-25-2026, 14:05:44 GMT

Query-based black-box attacks pose serious threats to machine learning models in many real-world applications. In this work, we study a lightweight defense method, dubbed Random Noise Defense (RND), which adds proper Gaussian noise to each query. We theoretically analyze the effectiveness of RND against query-based black-box attacks and the corresponding adaptive attacks. Our theoretical results reveal that the defense performance of RND is determined by the magnitude ratio between the noise induced by RND and the noise added by the attackers for gradient estimation or local search. A larger magnitude ratio leads to stronger defense performance, and it is also critical for mitigating adaptive attacks. Based on our analysis, we further propose combining RND with Gaussian augmentation fine-tuning (RND-GF), which enables RND to add larger noise to each query while maintaining clean accuracy, yielding a better trade-off between clean accuracy and defense performance. Additionally, RND can be flexibly combined with existing defense methods, such as adversarial training (AT), to further boost adversarial robustness. Extensive experiments on CIFAR-10 and ImageNet verify our theoretical findings and the effectiveness of RND and RND-GF.
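The role of the magnitude ratio can be seen in a small simulation. This is a toy sketch, not the paper's experimental setup: it assumes RND adds N(0, nu^2) noise to each coordinate of the query input, uses a quadratic `toy_loss` as a stand-in for the defended model, and models the attacker with a random-direction finite-difference gradient estimator using step size `mu`. All names are hypothetical. The cosine between the estimated and true gradients measures how useful the attacker's queries remain.

```python
import math
import random


def toy_loss(x):
    """Stand-in for the defended model's loss; the attacker only sees its values."""
    return sum(v * v for v in x)


def rnd_query(x, nu, rng):
    """RND (as assumed here): perturb the query input with N(0, nu^2) noise per coordinate."""
    return toy_loss([v + nu * rng.gauss(0.0, 1.0) for v in x])


def estimate_gradient(x, mu, nu, n_dirs, rng):
    """Attacker's random-direction finite-difference gradient estimate with step mu."""
    grad = [0.0] * len(x)
    for _ in range(n_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        f_shifted = rnd_query([xi + mu * ui for xi, ui in zip(x, u)], nu, rng)
        f_base = rnd_query(x, nu, rng)  # fresh defender noise on every query
        coeff = (f_shifted - f_base) / (mu * n_dirs)
        grad = [g + coeff * ui for g, ui in zip(grad, u)]
    return grad


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


rng = random.Random(0)
x = [1.0] * 50
true_grad = [2.0 * v for v in x]  # exact gradient of toy_loss
mu = 0.01
for nu in (0.0, 0.001, 0.1, 1.0):
    g = estimate_gradient(x, mu, nu, 400, rng)
    print(f"nu/mu = {nu / mu:6.1f}   cosine(estimate, true) = {cosine(g, true_grad):.3f}")
```

As `nu / mu` grows, the defender's noise dominates each finite difference and the cosine should fall toward 0, which matches the qualitative claim that a larger magnitude ratio gives stronger defense.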

artificial intelligence, machine learning, query

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report (0.46)

Industry: