AITopics

Generative artificial intelligence (AI) has made unprecedented advances in vision language models over the past two years. During the generative process, new samples (images) are generated from an unknown high-dimensional distribution. Markov Chain Monte Carlo (MCMC) methods are particularly effective in drawing samples from such complex, high-dimensional distributions. This makes MCMC methods an integral component for models like EBMs, ensuring accurate sample generation. Gradient-based optimization is at the core of modern generative models. The update step during the optimization forms a Markov chain where the new update depends only on the current state. This allows exploration of the parameter space in a memoryless manner, thus combining the benefits of gradient-based optimization and MCMC sampling. MCMC methods have shown an equally important role in physically based rendering where complex light paths are otherwise quite challenging to sample from simple importance sampling techniques. A lot of research is dedicated towards bringing physical realism to samples (images) generated from diffusion-based generative models in a data-driven manner, however, a unified framework connecting these techniques is still missing. In this course, we take the first steps toward understanding each of these components and exploring how MCMC could potentially serve as a bridge, linking these closely related areas of research. Our course aims to provide necessary theoretical and practical tools to guide students, researchers and practitioners towards the common goal of generative physically based rendering. All Jupyter notebooks with demonstrations associated to this tutorial can be found on the project webpage: https://sinbag.github.io/mcmc/

artificial intelligence, bayesian inference, machine learning, (17 more...)

doi: 10.1145/3680532.368959

2510.09078

Country:

Europe (1.00)
North America > United States (0.28)

Genre:

Instructional Material (0.66)
Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

$\mathsf{P} \neq \mathsf{NP}$: A Non-Relativizing Proof via Quantale Weakness and Geometric Complexity

Goertzel, Ben

We give a compositional, information-theoretic framework that turns short programs into locality on many independent blocks, and combine it with symmetry and sparsity of masked random Unique-SAT to obtain distributional lower bounds that contradict the self-reduction upper bound under $\mathsf{P}=\mathsf{NP}$. We work in the weakness quantale $w_Q=K_{\mathrm{poly}}(\cdot\mid\cdot)$. For an efficiently samplable ensemble $D_m$ made by masking random $3$-CNFs with fresh $S_m\ltimes(\mathbb{Z}_2)^m$ symmetries and a small-seed Valiant--Vazirani isolation layer, we prove a Switching-by-Weakness normal form: for any polytime decoder $P$ of description length $\le δt$ (with $t=Θ(m)$ blocks), a short wrapper $W$ makes $(P\circ W)$ per-bit local on a $γ$-fraction of blocks. Two ingredients then force near-randomness on $Ω(t)$ blocks for every short decoder: (a) a sign-invariant neutrality lemma giving $\Pr[X_i=1\mid \mathcal{I}]=1/2$ for any sign-invariant view $\mathcal{I}$; and (b) a template sparsification theorem at logarithmic radius showing that any fixed local rule appears with probability $m^{-Ω(1)}$. Combined with single-block bounds for tiny $\mathrm{ACC}^0$/streaming decoders, this yields a success bound $2^{-Ω(t)}$ and, by Compression-from-Success, $K_{\mathrm{poly}}\big((X_1,\ldots,X_t)\mid(Φ_1,\ldots,Φ_t)\big)\ge ηt$. If $\mathsf{P}=\mathsf{NP}$, a uniform constant-length program maps any on-promise instance to its unique witness in polytime (bit fixing via a $\mathrm{USAT}$ decider), so $K_{\mathrm{poly}}(X\midΦ)\le O(1)$ and the tuple complexity is $O(1)$, contradicting the linear bound. The proof is non-relativizing and non-natural; symmetry, sparsification, and switching yield a quantale upper-lower clash, hence $\mathsf{P}\ne\mathsf{NP}$.

artificial intelligence, logm, machine learning, (18 more...)

2510.08814

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Weights initialization of neural networks for function approximation

Hu, Xinwen, Huang, Yunqing, Yi, Nianyu, Yin, Peimeng

Neural network-based function approximation plays a pivotal role in the advancement of scientific computing and machine learning. Yet, training such models faces several challenges: (i) each target function often requires training a new model from scratch; (ii) performance is highly sensitive to architectural and hyperparameter choices; and (iii) models frequently generalize poorly beyond the training domain. To overcome these challenges, we propose a reusable initialization framework based on basis function pretraining. In this approach, basis neural networks are first trained to approximate families of polynomials on a reference domain. Their learned parameters are then used to initialize networks for more complex target functions. To enhance adaptability across arbitrary domains, we further introduce a domain mapping mechanism that transforms inputs into the reference domain, thereby preserving structural correspondence with the pretrained models. Extensive numerical experiments in one- and two-dimensional settings demonstrate substantial improvements in training efficiency, generalization, and model transferability, highlighting the promise of initialization-based strategies for scalable and modular neural function approximation. The full code is made publicly available on Gitee.

artificial intelligence, fuzzy logic, machine learning, (18 more...)

2510.0878

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.84)

Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting

Feng, Yunzhen, Jain, Parag, Hartshorn, Anthony, Duan, Yaqi, Kempe, Julia

Reinforcement learning with verifiable rewards (RLVR) has become a standard recipe for improving large language models (LLMs) on reasoning tasks, with Group Relative Policy Optimization (GRPO) widely used in practice. Yet GRPO wastes substantial compute on negative groups: groups in which no sampled response is correct yield zero advantage and thus no gradient. We ask whether negative groups can be leveraged without extra supervision. Starting from a maximum-likelihood (MLE) objective in reward modeling, we show that the MLE gradient is equivalent to a policy gradient for a modified value function. This value function adds a confidence-weighted penalty on incorrect responses, imposing larger penalties on more confident mistakes. We refer to this as \textbf{L}ikelihood \textbf{E}stimation with \textbf{N}egative \textbf{S}amples (\textbf{LENS}). LENS modifies GRPO to assign non-zero, confidence-dependent rewards to incorrect generations, making negative groups informative and converting previously wasted samples into useful gradient updates. On the MATH benchmark with Llama-3.1-8B and Qwen-2.5-3B, the proposed variant consistently outperforms GRPO baseline, with significant gains on harder items. These results demonstrate a principled and practical way to "rescue" negative groups, improving efficiency and performance in RLVR.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

2510.08696

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Miandashti, Hanieh Shojaei, Brenner, Claus

Out-of-Distribution Detection in LiDAR Semantic Segmentation Using Epistemic Uncertainty from Hierarchical GMMs

In addition to accurate scene understanding through precise semantic segmentation of LiDAR point clouds, detecting out-of-distribution (OOD) objects, instances not encountered during training, is essential to prevent the incorrect assignment of unknown objects to known classes. While supervised OOD detection methods depend on auxiliary OOD datasets, unsupervised methods avoid this requirement but typically rely on predictive entropy, the entropy of the predictive distribution obtained by averaging over an ensemble or multiple posterior weight samples. However, these methods often conflate epistemic (model) and aleatoric (data) uncertainties, misclassifying ambiguous in distribution regions as OOD. To address this issue, we present an unsupervised OOD detection approach that employs epistemic uncertainty derived from hierarchical Bayesian modeling of Gaussian Mixture Model (GMM) parameters in the feature space of a deep neural network. Without requiring auxiliary data or additional training stages, our approach outperforms existing uncertainty-based methods on the SemanticKITTI dataset, achieving an 18\% improvement in AUROC, 22\% increase in AUPRC, and 36\% reduction in FPR95 (from 76\% to 40\%), compared to the predictive entropy approach used in prior works.

artificial intelligence, epistemic uncertainty, machine learning, (15 more...)

2510.08631

Country: Europe > Germany (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(3 more...)

Neural Information Processing SystemsOct-11-2025, 00:48:13 GMT

Enhancing Diversity in Bayesian Deep Learning via Hyperspherical Energy Minimization of CKA

Bayesian deep learning has always garnered substantial interest in the machine learning community.

ensemble, he-cka, kernel, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

Neural Information Processing SystemsOct-11-2025, 00:46:59 GMT

PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation

Phylogenetic trees elucidate evolutionary relationships among species, but phylogenetic inference remains challenging due to the complexity of combining continuous (branch lengths) and discrete parameters (tree topology).

branch length, topology, tree topology, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
(2 more...)