AITopics

2607.02101

Genre: Research Report (0.82)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing Systems Jun-22-2026, 17:26:50 GMT

Discovering Important Experts for Mixture-of-Experts Models Pruning Through a Theoretical Perspective

Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models but face prohibitive memory demands because of their massive parameter counts. Existing pruning methods rely on heuristic metrics or on impractical enumeration of expert subsets, leading to suboptimal performance or poor scalability. In this paper, we propose Shapley-MoE, an efficient pruning method for MoE models inspired by cooperative game theory. By quantifying each expert's contribution via its Shapley value, our method identifies important experts without exhaustively evaluating expert combinations. To overcome the NP-hard complexity of exact Shapley computation, we introduce a Monte Carlo sampling strategy for efficient approximation that reduces the complexity to quadratic time. However, vanilla Monte Carlo sampling still suffers from insufficient estimation accuracy and low sampling efficiency.
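The permutation-sampling idea behind Monte Carlo Shapley estimation can be sketched as follows. This is a generic estimator, not Shapley-MoE itself (the abstract notes that the vanilla version has accuracy and efficiency problems the paper addresses). The `value` function, standing in for something like model quality when only a given subset of experts is kept, and the toy expert scores are hypothetical.

```python
import random
from typing import Callable, FrozenSet, Hashable, List, Sequence


def mc_shapley(
    players: Sequence[Hashable],
    value: Callable[[FrozenSet[Hashable]], float],
    num_permutations: int,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * len(players)
    index = {p: i for i, p in enumerate(players)}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset(coalition))
        for p in order:
            coalition.add(p)                  # add players one at a time in this order
            cur = value(frozenset(coalition))
            totals[index[p]] += cur - prev    # marginal contribution of p given its predecessors
            prev = cur
    return [t / num_permutations for t in totals]


# Hypothetical toy game: experts 0 and 1 are only extra-useful together (bonus 2.0).
weights = {0: 3.0, 1: 1.0, 2: 2.0}


def toy_value(kept):
    bonus = 2.0 if {0, 1} <= kept else 0.0
    return sum(weights[e] for e in kept) + bonus


estimates = mc_shapley([0, 1, 2], toy_value, num_permutations=2000)
# Exact Shapley values are [4.0, 2.0, 2.0]: the bonus is split between experts 0 and 1.
```

Each sampled permutation costs one evaluation of `value` per player, and the marginal contributions along a permutation telescope, so the estimates always sum exactly to value(all) minus value(empty) even when individual estimates are noisy.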

large language model, machine learning, natural language

Country: Asia > China (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Energy (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)

Neural Information Processing Systems Jun-20-2026, 03:58:41 GMT

Restricted Spectral Gap Decomposition for Simulated Tempering Targeting Mixture Distributions

Simulated tempering is a widely used strategy for sampling from multimodal distributions. In this paper, we consider simulated tempering combined with an arbitrary local Markov chain Monte Carlo sampler and present a new decomposition theorem that provides a lower bound on the restricted spectral gap of the algorithm when sampling from mixture distributions. Working with the restricted spectral gap extends our results to broader settings, such as when the usual spectral gap is difficult to bound or becomes degenerate. We demonstrate the application of our theoretical results by analyzing simulated tempering combined with random walk Metropolis-Hastings for sampling from mixtures of Gaussian distributions. Our complexity bound scales polynomially with the separation between modes, logarithmically with 1/ε, where ε denotes the target accuracy in total variation distance, and exponentially with the dimension d.
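A minimal sketch of simulated tempering with a random-walk Metropolis local move, on a hypothetical one-dimensional mixture of two Gaussians centred at ±5. The inverse temperatures, step sizes, and the choice to set the level weights to 1/Z_k by numerical integration on a grid (practical only in low dimension) are assumptions for illustration. The sketch shows the kind of sampler the paper analyzes, not its spectral-gap bound.

```python
import numpy as np

MU, SIGMA = 5.0, 1.0  # hypothetical target: 0.5*N(-MU, SIGMA^2) + 0.5*N(MU, SIGMA^2)


def log_target(x):
    """Log density of the two-component Gaussian mixture (works on scalars and arrays)."""
    log_norm = np.log(0.5) - 0.5 * np.log(2 * np.pi * SIGMA**2)
    left = log_norm - 0.5 * ((x + MU) / SIGMA) ** 2
    right = log_norm - 0.5 * ((x - MU) / SIGMA) ** 2
    return np.logaddexp(left, right)


def log_partition(betas, lo=-40.0, hi=40.0, n=8001):
    """log Z_k = log of the integral of pi(x)^beta_k, via a Riemann sum on a grid (1-D only)."""
    xs = np.linspace(lo, hi, n)
    dx = xs[1] - xs[0]
    lp = log_target(xs)
    out = []
    for beta in betas:
        s = beta * lp
        m = s.max()
        out.append(m + np.log(np.exp(s - m).sum() * dx))
    return np.array(out)


def simulated_tempering(betas, n_iter, seed=0):
    """Return the x-values recorded while the chain sits at level 0 (assumed beta = 1)."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    log_c = -log_partition(betas)  # level weights c_k = 1/Z_k give roughly uniform time per level
    x, k = 0.0, 0
    lx = log_target(x)
    cold = []
    for _ in range(n_iter):
        # Local move: random-walk Metropolis targeting pi(x)^beta_k.
        y = x + 2.4 / np.sqrt(betas[k]) * rng.standard_normal()
        ly = log_target(y)
        if np.log(rng.uniform()) < betas[k] * (ly - lx):
            x, lx = y, ly
        # Temperature move: propose a neighbouring level (symmetric; out-of-range means stay).
        j = k + (1 if rng.uniform() < 0.5 else -1)
        if 0 <= j < len(betas):
            log_acc = (betas[j] - betas[k]) * lx + log_c[j] - log_c[k]
            if np.log(rng.uniform()) < log_acc:
                k = j
        if k == 0:
            cold.append(x)
    return np.array(cold)


cold = simulated_tempering([1.0, 0.3, 0.1, 0.03], n_iter=30000)
print(len(cold), (cold > 0).mean())  # fraction in the right-hand mode should be near 0.5
```

With weights c_k = 1/Z_k the joint chain spends roughly equal time at each temperature, and the hot levels let the cold chain move between modes that plain random-walk Metropolis at β = 1 would rarely cross.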

artificial intelligence, machine learning, spectral gap

Country:

North America > United States > Texas (0.28)
North America > United States > California (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Neural Information Processing Systems Jun-17-2026, 15:19:42 GMT

Reverse-Annealed Sequential Monte Carlo for Efficient Bayesian Optimal Experiment Design

Expected information gain (EIG) is a crucial quantity in Bayesian optimal experimental design (BOED), quantifying how useful an experiment is by the amount we expect the posterior to differ from the prior. However, evaluating the EIG can be computationally expensive, since it generally requires estimating the posterior normalizing constant. In this work, we leverage two idiosyncrasies of BOED to improve the efficiency of EIG estimation via sequential Monte Carlo (SMC). First, in BOED we simulate the data and thus know the true underlying parameters. Second, we ultimately care about the EIG, not the individual normalizing constants. We often observe that the Monte Carlo variance of standard SMC estimators of the normalizing constant for a single dataset is significantly lower than the variance of the normalizing constants across datasets; the latter thus contributes the majority of the variance of EIG estimates. This suggests that we can accept a slight increase in variance in exchange for a drastic reduction in computation time by shrinking the SMC population size, which leads us to an EIG-specific SMC estimator that starts with only a single sample from the posterior and tempers backwards towards the prior. Using this single-sample estimator, which we call reverse-annealed SMC (RA-SMC), we show that it is possible to estimate the EIG with orders of magnitude fewer likelihood evaluations in three models: a four-dimensional spring-mass model, a six-dimensional Johnson-Cook model, and a four-dimensional source-finding problem.
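To make the role of the per-dataset normalizing constants concrete, here is the standard nested Monte Carlo (NMC) EIG estimator, which estimates one evidence p(y | d) per simulated dataset. It is the kind of baseline RA-SMC is meant to improve on, not RA-SMC itself. The linear-Gaussian model y = θd + ε, the prior and noise scales, and the sample sizes are hypothetical, chosen because the exact EIG, ½ log(1 + σ_θ² d² / σ²), is available for checking.

```python
import numpy as np


def log_lik(y, theta, d, noise_sd):
    """log N(y; theta * d, noise_sd^2) for the hypothetical linear-Gaussian model."""
    return -0.5 * np.log(2 * np.pi * noise_sd**2) - 0.5 * ((y - theta * d) / noise_sd) ** 2


def nmc_eig(d, n_outer=4000, n_inner=500, prior_sd=1.0, noise_sd=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Outer level: simulate (theta_n, y_n) pairs, so each dataset's true parameter is known.
    theta = prior_sd * rng.standard_normal(n_outer)
    y = theta * d + noise_sd * rng.standard_normal(n_outer)
    # Inner level: one evidence (normalizing constant) estimate per dataset, from fresh prior draws.
    theta_in = prior_sd * rng.standard_normal((n_outer, n_inner))
    inner = log_lik(y[:, None], theta_in, d, noise_sd)
    m = inner.max(axis=1)
    log_evidence = m + np.log(np.exp(inner - m[:, None]).mean(axis=1))
    return float(np.mean(log_lik(y, theta, d, noise_sd) - log_evidence))


def exact_eig(d, prior_sd=1.0, noise_sd=0.5):
    """Closed-form EIG (mutual information between theta and y) for y = theta * d + eps."""
    return 0.5 * np.log(1.0 + (prior_sd * d / noise_sd) ** 2)


print(nmc_eig(1.0), exact_eig(1.0))
```

The outer average runs over simulated datasets and the inner average estimates each dataset's log-evidence. The abstract's observation is that the spread across datasets often dominates the inner Monte Carlo error, which is what makes spending fewer samples per evidence estimate attractive.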

artificial intelligence, estimator, machine learning

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.84)

arXiv.org Machine Learning Jun-17-2026

Accelerated Convex Optimization via Hamiltonian Dynamics with Deterministic Integration Time

Wang, Xiuyuan, Srinivasan, Vishwak, Fu, Qiang, Mitra, Siddharth, Wilson, Ashia, Wibisono, Andre

We develop Hamiltonian dynamics-based algorithms for smooth convex optimization that achieve accelerated rates of convergence. By exploiting contraction of averaged Hamiltonian flow trajectories rather than requiring contraction at trajectory endpoints, we show that Hamiltonian dynamics-based optimization methods admit deterministic and accelerated convergence guarantees, extending prior work that is limited to quadratic objectives or holds only in expectation. We analyze an idealized continuous-time algorithm and derive practical discrete-time implementations with optimal first-order complexity, thereby establishing Hamiltonian dynamics as a useful algorithmic primitive for deterministic accelerated convex optimization.
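A minimal sketch of the basic Hamiltonian dynamics-based optimizer this line of work builds on: repeatedly reset the velocity to zero, then follow the Hamiltonian flow of H(x, v) = f(x) + ‖v‖²/2 for a fixed, deterministic integration time T, discretized here with leapfrog. It is not the paper's averaged-trajectory method or its discrete-time algorithms; T, the step size h, and the quadratic test objective are hypothetical. Because the velocity starts at zero and the exact flow conserves H, f cannot increase over one exact outer step.

```python
import numpy as np


def hamiltonian_descent(grad, x0, T=1.0, h=0.05, n_outer=50):
    """Velocity-refresh Hamiltonian descent with a fixed integration time T per outer step."""
    x = np.asarray(x0, dtype=float).copy()
    n_steps = int(round(T / h))
    for _ in range(n_outer):
        v = np.zeros_like(x)          # refresh: start each flow segment at rest
        # Leapfrog for dx/dt = v, dv/dt = -grad f(x): initial half kick, then drift/kick pairs.
        v -= 0.5 * h * grad(x)
        for i in range(n_steps):
            x += h * v
            if i < n_steps - 1:
                v -= h * grad(x)
        # The final half kick is skipped because v is discarded at the next refresh.
    return x


curv = np.array([1.0, 100.0])  # hypothetical ill-conditioned quadratic f(x) = 0.5 * sum(curv * x**2)


def f(x):
    return 0.5 * np.sum(curv * x**2)


x_final = hamiltonian_descent(lambda x: curv * x, x0=[1.0, 1.0])
print(f(np.array([1.0, 1.0])), f(x_final))
```

On a quadratic, each eigendirection with curvature λ oscillates like cos(√λ t) under the exact flow, so measuring contraction only at the trajectory endpoint can stall on any direction where √λ T is close to a multiple of π. The values T = 1 and h = 0.05 avoid that for the two curvatures used here.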

artificial intelligence, integrator, machine learning

2606.1726

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Neural Information Processing Systems Jun-16-2026, 12:40:43 GMT

Sequential Monte Carlo for Policy Optimization in Continuous POMDPs

Optimal decision-making under partial observability requires agents to balance reducing uncertainty (exploration) against pursuing immediate objectives (exploitation). In this paper, we introduce a novel policy optimization framework for continuous partially observable Markov decision processes (POMDPs) that explicitly addresses this challenge. Our method casts policy learning as probabilistic inference in a non-Markovian Feynman-Kac model that inherently captures the value of information gathering by anticipating future observations, without requiring suboptimal approximations or handcrafted heuristics. To optimize policies under this model, we develop a nested sequential Monte Carlo (SMC) algorithm that efficiently estimates a history-dependent policy gradient under samples from the optimal trajectory distribution induced by the POMDP. We demonstrate the effectiveness of our algorithm across standard continuous POMDP benchmarks, where existing methods struggle to act under uncertainty.
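The basic building block here is sequential Monte Carlo over beliefs. As a minimal sketch (a standard bootstrap particle filter, not the paper's nested SMC policy-gradient algorithm), the filter below tracks the belief over a hidden state in a hypothetical linear-Gaussian model x_t = a x_{t-1} + w_t, y_t = x_t + v_t. That model is chosen only because the exact belief mean is available from a Kalman filter for checking; the parameter values are assumptions.

```python
import numpy as np


def bootstrap_pf(ys, n_particles=2000, a=0.9, proc_var=1.0, obs_sd=0.5, seed=0):
    """Bootstrap particle filter for x_t = a*x_{t-1} + N(0, proc_var), y_t = x_t + N(0, obs_sd^2).
    Returns the filtered belief mean E[x_t | y_1..y_t] for each t."""
    rng = np.random.default_rng(seed)
    # Initial belief: the stationary distribution of the AR(1) state.
    x = rng.standard_normal(n_particles) * np.sqrt(proc_var / (1 - a**2))
    means = []
    for y in ys:
        x = a * x + np.sqrt(proc_var) * rng.standard_normal(n_particles)  # propagate particles
        logw = -0.5 * ((y - x) / obs_sd) ** 2                             # observation log-likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(float(np.sum(w * x)))
        x = x[rng.choice(n_particles, size=n_particles, p=w)]            # multinomial resampling
    return np.array(means)


ys = np.sin(np.arange(20.0))  # hypothetical observation sequence
means = bootstrap_pf(ys)
```

In a POMDP the propagation step would also depend on the chosen action, and a policy would act on the particle belief; both are omitted here to keep the filtering step itself visible.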

artificial intelligence, machine learning, pomdp

Country: Europe (0.46)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine Learning Jun-16-2026

Bridging data-driven priors via the score function for posterior sampling -- Comparative review and experimental study

Faye, Elhadji Cisse, Fall, Mame Diarra, Delchini, Sylvain, Dobigeon, Nicolas

This paper reviews how a diverse set of popular data-driven priors commonly used in Bayesian inverse problems can be unified through their respective score functions. By framing these priors under this common perspective, we show that they can all be integrated straightforwardly and effectively into a recently proposed sampling algorithm. The applicability of this common framework is illustrated with several data-driven priors, namely regularization-by-denoising, normalizing flow-based priors, score-based generative models, and convex-ridge regularizers. For these four priors, the performance of the method is evaluated on image inpainting and single-image super-resolution. These results, together with those obtained when restoring real images acquired in a geological context, demonstrate the efficiency of the method. This unified framework is versatile enough to handle any posterior distribution defined by a broad class of score function-based priors, beyond the specific cases considered in this paper.
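Because every prior reviewed here enters the sampler only through its score ∇ log p(x), a gradient-based posterior sampler can treat the prior as a plug-in function. The sketch below uses the unadjusted Langevin algorithm as a stand-in (the abstract does not name the recently proposed sampler), an inpainting-style masked Gaussian likelihood, and a standard Gaussian prior whose score is -x so that the exact posterior is known. `prior_score`, the mask, the noise level, and the step size are all hypothetical.

```python
import numpy as np


def langevin_posterior(prior_score, y, mask, noise_sd, n_chains=4000, n_steps=2000,
                       step=0.01, seed=0):
    """Unadjusted Langevin sampling of p(x | y) for y = mask * x + N(0, noise_sd^2 I).
    The prior is used only through prior_score(x) = grad log p(x), evaluated row-wise."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_chains, y.size))
    for _ in range(n_steps):
        lik_grad = mask * (y - mask * x) / noise_sd**2  # grad of the masked Gaussian log-likelihood
        drift = prior_score(x) + lik_grad               # score of the (unnormalized) posterior
        x = x + step * drift + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x


y = np.array([1.0, 0.0])     # second entry is ignored: that "pixel" is unobserved
mask = np.array([1.0, 0.0])  # hypothetical inpainting mask
samples = langevin_posterior(lambda x: -x, y, mask, noise_sd=0.5)
# Exact posterior: coordinate 0 ~ N(0.8, 0.2), coordinate 1 stays at the prior N(0, 1);
# ULA adds a small step-size bias to the variances.
```

A learned score model, such as a denoiser-based or score-based generative prior, could be passed as `prior_score` in the same way without changing the sampler.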

artificial intelligence, machine learning, score function

2606.148

Country: Europe > France (0.68)

Genre:

Overview (1.00)
Research Report > New Finding (0.64)
Research Report > Experimental Study (0.40)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning Jun-16-2026

Event Generation with Parallel Langevin Sampling and Learned Stein Diagnostics

Verheyen, Rob

Efficient event generation is a major computational challenge for precision collider phenomenology, especially for high-multiplicity final states where matrix-element evaluations are expensive and rejection-sampling efficiencies are low. We study an alternative approach based on many parallel underdamped Langevin chains, retaining one terminal state from each chain to obtain unweighted events while avoiding within-chain autocorrelation. A learned Stein discrepancy is used as a convergence diagnostic, providing a data-driven estimate of the relaxation time. We apply the method to tree-level $u\bar u\to Z+n g$ event generation and find that relaxation requires only a modest number of exact-target Langevin steps, with mild growth over the multiplicities studied. Finally, we show that simple neural-network surrogate initialization can substantially reduce the required number of exact matrix-element and gradient evaluations.
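The core sampling scheme, many independent underdamped Langevin chains with one terminal state kept per chain, can be sketched with a BAOAB splitting integrator. The integrator, friction γ, step size, chain length, initial distribution, and the Gaussian stand-in for the matrix-element target are all assumptions; the learned Stein diagnostic and the neural-network surrogate initialization are not shown.

```python
import numpy as np


def parallel_underdamped_langevin(grad_log_p, dim, n_chains=5000, n_steps=500, h=0.1,
                                  gamma=1.0, seed=0):
    """Run n_chains independent underdamped Langevin chains (unit mass) with BAOAB
    and return only the final position of each chain."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_chains, dim))  # hypothetical initial distribution
    v = rng.standard_normal((n_chains, dim))
    c = np.exp(-gamma * h)                    # exact Ornstein-Uhlenbeck decay over one step
    for _ in range(n_steps):
        v += 0.5 * h * grad_log_p(x)                                    # B: half kick
        x += 0.5 * h * v                                                # A: half drift
        v = c * v + np.sqrt(1 - c**2) * rng.standard_normal(v.shape)    # O: friction + noise
        x += 0.5 * h * v                                                # A: half drift
        v += 0.5 * h * grad_log_p(x)                                    # B: half kick
    return x  # one terminal state per chain, so no within-chain autocorrelation


target_mean = np.array([1.0, -1.0])  # hypothetical Gaussian stand-in for the physics target
events = parallel_underdamped_langevin(lambda x: -(x - target_mean), dim=2)
```

Because each chain contributes exactly one state, the outputs are unweighted and mutually independent given independent initializations. The chain length plays the role of the relaxation time that the paper estimates with its learned Stein diagnostic.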

artificial intelligence, evaluation, machine learning

2606.14854

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

arXiv.org Machine Learning Jun-16-2026

Dynestyx: A Probabilistic Programming Library for Dynamical Systems

Waxman, Daniel, Batenkov, Dmitry, Feser, John, Zane, Andy, Bingham, Eli, Marzouk, Youssef, Levine, Matthew E.

State-space models (SSMs) are the standard formalism for Bayesian treatment of dynamical systems, with natural applications in statistics, signal processing, and machine learning. Despite their importance in both theory and application, dynamical systems have proven difficult to incorporate in modern probabilistic programming languages (PPLs), making state-of-the-art methods less accessible to practitioners and introducing friction in following the "Bayesian workflow." We introduce dynestyx, a probabilistic programming library with first-class support for SSMs, including state-of-the-art methods in the estimation of both states and parameters. Through a single, unified interface, users may specify arbitrary priors for discrete-time or continuous-time dynamical systems, perform inference over mixed-effect data, and make state and parameter estimates with principled uncertainty quantification.
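For the linear-Gaussian special case of an SSM, state estimation and the marginal likelihood used for parameter inference are both exact via the Kalman filter. The abstract does not describe the library's API, so the NumPy sketch below is a generic filter, not dynestyx's interface; the matrix names A, Q, H, R and the prior (m0, P0) are our own notation.

```python
import numpy as np


def kalman_filter(ys, A, Q, H, R, m0, P0):
    """Filter x_t = A x_{t-1} + N(0, Q), y_t = H x_t + N(0, R), with x_0 ~ N(m0, P0).
    Returns filtered means (T, state_dim) and log p(y_1, ..., y_T)."""
    m, P = m0.copy(), P0.copy()
    means, loglik = [], 0.0
    for y in ys:
        # Predict.
        m = A @ m
        P = A @ P @ A.T + Q
        # Update.
        S = H @ P @ H.T + R                # innovation covariance
        resid = y - H @ m                  # innovation
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        m = m + K @ resid
        P = P - K @ S @ K.T
        # Prediction-error decomposition of the marginal likelihood.
        _, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (resid @ np.linalg.solve(S, resid) + logdet + len(y) * np.log(2 * np.pi))
        means.append(m)
    return np.array(means), loglik


a, q, r = 0.8, 1.0, 0.5  # hypothetical scalar AR(1) state observed with noise
ys = np.array([[0.3], [-1.2], [0.7], [2.0]])
means, ll = kalman_filter(ys, np.array([[a]]), np.array([[q]]), np.array([[1.0]]),
                          np.array([[r**2]]), np.zeros(1), np.array([[q / (1 - a**2)]]))
```

The returned log-likelihood is exactly log p(y_1, ..., y_T) for the given parameters, which is the quantity a parameter-estimation routine would maximize or combine with a parameter prior.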

artificial intelligence, machine learning, programming language

2606.16985

Genre: Research Report (0.90)

Industry:

Health & Medicine (0.47)
Energy (0.36)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Software > Programming Languages (0.91)

Neural Information Processing Systems Jun-15-2026, 13:37:36 GMT

Parallelizing MCMC Across the Sequence Length

Markov chain Monte Carlo (MCMC) methods are foundational algorithms for Bayesian inference and probabilistic modeling. However, most MCMC algorithms are inherently sequential, and their time complexity scales linearly with the sequence length. Previous work on adapting MCMC to modern hardware has therefore focused on running many independent chains in parallel. Here, we take an alternative approach: we propose algorithms to evaluate MCMC samplers in parallel across the chain length. To do this, we build on recent methods for parallel evaluation of nonlinear recursions that formulate the state sequence as the solution to a fixed-point problem and solve for the fixed point using a parallel form of Newton's method. We show how this approach can be used to parallelize Gibbs, Metropolis-adjusted Langevin, and Hamiltonian Monte Carlo sampling across the sequence length. In several examples, we demonstrate the simulation of up to hundreds of thousands of MCMC samples with only tens of parallel Newton iterations. Additionally, we develop two new parallel quasi-Newton methods to evaluate nonlinear recursions with lower memory costs and reduced runtime. We find that the proposed parallel algorithms accelerate MCMC sampling across multiple examples, in some cases by more than an order of magnitude compared to sequential evaluation.
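The fixed-point idea can be sketched for a scalar recursion s_t = f(s_{t-1}, ξ_t) with pre-drawn noise ξ_t. Each Newton step linearizes f around the current guess of the whole trajectory, which leaves a linear recursion that a prefix scan solves in O(log T) vectorized rounds. As an assumption for illustration, the recursion is an unadjusted Langevin chain (smooth, unlike the Gibbs, MALA, and HMC samplers the abstract covers) on a hypothetical target π(s) ∝ 1/cosh(s); the scan is a plain Hillis-Steele scan, not the paper's quasi-Newton variants.

```python
import numpy as np


def linear_recursion_scan(J, b, s0):
    """Solve s_t = J_t * s_{t-1} + b_t (t = 1..T, scalar state) with a Hillis-Steele
    prefix scan: O(log T) rounds, each fully vectorized."""
    A = np.array(J, dtype=float)
    c = np.array(b, dtype=float)
    c[0] = A[0] * s0 + c[0]  # fold the known initial state into the first element
    A[0] = 0.0
    offset = 1
    while offset < len(c):
        A_old, c_old = A.copy(), c.copy()
        # Compose each affine map with the one `offset` positions earlier (earlier applied first).
        A[offset:] = A_old[offset:] * A_old[:-offset]
        c[offset:] = A_old[offset:] * c_old[:-offset] + c_old[offset:]
        offset *= 2
    return c


def parallel_newton(f, df, s0, noise, max_iters=None, tol=1e-10):
    """Evaluate s_t = f(s_{t-1}, noise_t) for all t at once by Newton's method on the
    fixed-point system; after k iterations the first k states are exact (up to rounding)."""
    T = len(noise)
    s = np.full(T, s0, dtype=float)  # initial guess for the whole trajectory
    max_iters = T if max_iters is None else max_iters
    for it in range(1, max_iters + 1):
        prev = np.concatenate(([s0], s[:-1]))  # current guesses for s_{t-1}
        J = df(prev)                            # derivative of f w.r.t. its state argument
        b = f(prev, noise) - J * prev           # offset of the linearization
        s_new = linear_recursion_scan(J, b, s0)
        converged = np.max(np.abs(s_new - s)) < tol
        s = s_new
        if converged:
            break
    return s, it


delta = 0.1  # hypothetical ULA step size; grad log pi(s) = -tanh(s)
f = lambda s, xi: s - delta * np.tanh(s) + np.sqrt(2 * delta) * xi
df = lambda s: 1.0 - delta * (1.0 - np.tanh(s) ** 2)
noise = np.random.default_rng(0).standard_normal(500)
chain, iters = parallel_newton(f, df, 0.0, noise)
print(iters)  # number of Newton iterations used
```

Setting `max_iters` to T guarantees the exact sequential trajectory, because each iteration fixes at least one more leading state; the practical appeal is that convergence typically arrives far sooner.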

artificial intelligence, iteration, machine learning

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)