MMGP: a Mesh Morphing Gaussian Process-based machine learning method for regression of physical problems under non-parameterized geometrical variability

Neural Information Processing Systems

When learning simulations for modeling physical phenomena in industrial designs, geometrical variabilities are of prime interest. While classical regression techniques prove effective for parameterized geometries, practical scenarios often involve the absence of shape parametrization during the inference stage, leaving us with only mesh discretizations as available data. Learning simulations from such mesh-based representations poses significant challenges, with recent advances relying heavily on deep graph neural networks to overcome the limitations of conventional machine learning approaches.
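For intuition, here is a minimal sketch of the kind of pipeline this setting suggests, assuming each sample's output field has already been transported onto a common reference mesh by a morphing step: the fields are compressed with PCA and each retained mode is regressed with a Gaussian process over low-dimensional shape embeddings. All shapes and names are illustrative, not the authors' implementation.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
n_train, n_nodes, n_modes = 50, 2000, 8

# X: low-dimensional shape embeddings (assumed given, e.g. compressed morphed
# node coordinates); Y: output fields sampled on the shared reference mesh.
X = rng.normal(size=(n_train, 16))
Y = rng.normal(size=(n_train, n_nodes))

# Compress the high-dimensional fields, then fit one GP per retained mode.
pca = PCA(n_components=n_modes).fit(Y)
coeffs = pca.transform(Y)
gps = [GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, coeffs[:, j])
       for j in range(n_modes)]

def predict_field(x_new):
    # Predict modal coefficients, then decode back to a full mesh field.
    c = np.column_stack([gp.predict(x_new) for gp in gps])
    return pca.inverse_transform(c)

print(predict_field(X[:2]).shape)  # (2, 2000)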


Fearless Stochasticity in Expectation Propagation

Neural Information Processing Systems

Expectation propagation (EP) is a family of algorithms for performing approximate inference in probabilistic models. The updates of EP involve the evaluation of moments (expectations of certain functions), which can be estimated from Monte Carlo (MC) samples. However, the updates are not robust to MC noise when performed naively, and various prior works have attempted to address this issue in different ways. In this work, we provide a novel perspective on the moment-matching updates of EP; namely, that they perform natural-gradient-based optimisation of a variational objective. We use this insight to motivate two new EP variants, with updates that are particularly well-suited to MC estimation. They remain stable and are most sample-efficient when estimated with just a single sample. These new variants combine the benefits of their predecessors and address key weaknesses. In particular, they are easier to tune, offer an improved speed-accuracy trade-off, and do not rely on the use of debiasing estimators. We demonstrate their efficacy on a variety of probabilistic inference tasks.
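To make the single-sample stability claim concrete, here is a toy sketch (not the paper's algorithm) of a damped natural-gradient update for a Gaussian approximation q = N(m, v), driven by one Monte Carlo sample per iteration; the target is a known Gaussian so convergence is easy to verify.

import numpy as np

rng = np.random.default_rng(1)

def grad_logp(x):   # score of the target N(3, 0.25)
    return -(x - 3.0) / 0.25

def hess_logp(x):   # Hessian of the target log-density (constant here)
    return -1.0 / 0.25

m, s = 0.0, 1.0     # mean and precision of q
rho = 0.05          # damping / step size
for _ in range(2000):
    x = m + rng.normal() / np.sqrt(s)          # one reparameterized sample
    s = (1 - rho) * s + rho * (-hess_logp(x))  # damped precision update
    m = m + rho * grad_logp(x) / s             # natural-gradient mean update
print(m, 1 / s)     # approaches the target's (3.0, 0.25)

With small damping rho, the iteration remains stable even though every step uses a single sample, which is the qualitative behaviour the abstract highlights.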



Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis

Neural Information Processing Systems

We propose a novel unsupervised method to learn pose and part-segmentation of articulated objects with rigid parts. Given two observations of an object in different articulation states, our method learns the geometry and appearance of object parts by fitting an implicit model on the first observation and renders the latter observation by distilling the part segmentation and articulation. Additionally, to tackle the challenging joint optimization of part segmentation and articulation, we propose a voxel-grid-based initialization strategy and a decoupled optimization procedure. Compared to prior unsupervised work, our model obtains significantly better performance, generalizes to objects with an arbitrary number of parts, and can be learned efficiently from only a few views of the latter observation.
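The core warping step implied by this setup can be sketched as follows (hypothetical shapes and names, not the authors' code): points from the first observation are moved into the second articulation state by per-part rigid transforms, blended with a soft part segmentation.

import numpy as np

def warp_points(pts, seg, rotations, translations):
    # pts: (N, 3) points; seg: (N, P) soft part weights summing to 1 per point;
    # rotations: (P, 3, 3); translations: (P, 3). Returns warped (N, 3) points.
    per_part = np.einsum('pij,nj->pni', rotations, pts) + translations[:, None, :]
    return np.einsum('np,pni->ni', seg, per_part)

A hard (one-hot) segmentation recovers exact piecewise-rigid motion as a special case, matching the rigid-part assumption above.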


Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games

Neural Information Processing Systems

It is shown that, under mild technical assumptions and the introduction of the suboptimality gap, the independent natural policy gradient (NPG) method with an oracle providing exact policy evaluation asymptotically reaches an ϵ-Nash equilibrium (NE) within O(1/ϵ) iterations.
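For a tabular softmax policy, one agent's independent NPG step with exact oracle Q-values reduces to a multiplicative-weights update; a hedged sketch (illustrative, not the paper's code):

import numpy as np

def npg_step(policy, q_values, eta):
    # policy: (S, A) row-stochastic with positive entries; q_values: (S, A)
    # exact oracle evaluation for this agent; eta: step size.
    logits = np.log(policy) + eta * q_values
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

Each agent applies this update to its own policy independently, with the oracle evaluating Q under the current joint policy.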


Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models

Neural Information Processing Systems

As Large Language Models (LLMs) are increasingly deployed in specialized domains with continuously evolving knowledge, the need for timely and precise knowledge injection has become essential. Fine-tuning with paraphrased data is a common approach to enhance knowledge injection, yet it faces two significant challenges: high computational costs due to repetitive external model usage and limited sample diversity. To address these challenges, we introduce LaPael, a latent-level paraphrasing method that applies input-dependent noise to early LLM layers. This approach enables diverse and semantically consistent augmentations directly within the model. Furthermore, it eliminates the recurring costs of paraphrase generation for each knowledge update. Our extensive experiments on question-answering benchmarks demonstrate that LaPael improves knowledge injection over standard fine-tuning and existing noise-based approaches. Additionally, combining LaPael with data-level paraphrasing further enhances performance.
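The general mechanism described above can be sketched as follows in PyTorch (illustrative only, not the authors' code; the noise network and hook wiring are assumptions): a small module predicts an input-dependent noise scale and perturbs the hidden states emitted by an early layer.

import torch
import torch.nn as nn

class InputDependentNoise(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        # Predicts a per-token noise scale from the hidden state itself.
        self.scale = nn.Linear(hidden_size, hidden_size)

    def forward(self, h):
        return h + torch.randn_like(h) * torch.sigmoid(self.scale(h))

def attach_noise(layer, noise_module):
    # Perturb the layer's output hidden states during fine-tuning.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        noisy = noise_module(hidden)
        return (noisy,) + output[1:] if isinstance(output, tuple) else noisy
    return layer.register_forward_hook(hook)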


Appendices

Neural Information Processing Systems

In Appendix A, we provide proofs of Proposition 1, Proposition 2, and Theorem 1 in the main text. In Appendix B, we provide more details of the Bayesian variable selection (BVS) and stochastic block model (SBM) in Section 4, as well as a detailed simulation study on the spatial clustering model (SCM). In addition, we study the performance of multiple-try Metropolis (MTM) in the case of multimodal target distributions, following the BVS simulation setting of [54]. In Appendix D, we add a more detailed discussion on parallelization, the state space of interest, and the behavior of MTM on continuous state spaces. Finally, we provide additional tables on the real data analysis results in Appendix E.

A.1 Proof of Proposition 1

This section aims to provide a summary of the existing results on proving mixing time bounds via path methods.
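As background for the MTM experiments referenced above, one multiple-try Metropolis step can be sketched as follows, in the standard textbook variant with a symmetric Gaussian proposal and weight function w(y) = π(y) (parameters are illustrative, not taken from the paper).

import numpy as np

def mtm_step(x, log_pi, k=5, sigma=1.0, rng=np.random.default_rng()):
    # log_pi must accept NumPy arrays elementwise.
    ys = x + sigma * rng.normal(size=k)      # k candidate proposals
    wy = np.exp(log_pi(ys))
    y = rng.choice(ys, p=wy / wy.sum())      # select one candidate ∝ weight
    xs = y + sigma * rng.normal(size=k - 1)  # reference points drawn from y
    wx = np.exp(log_pi(np.append(xs, x)))    # current state joins the set
    accept = min(1.0, wy.sum() / wx.sum())   # generalized MH acceptance
    return y if rng.random() < accept else x

Iterating x = mtm_step(x, lambda v: -0.5 * v ** 2) draws approximate samples from a standard normal.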


ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models

Neural Information Processing Systems

Planning is a crucial element of both human intelligence and contemporary large language models (LLMs). In this paper, we initiate a theoretical investigation into the emergence of planning capabilities in Transformer-based LLMs via their next-word prediction mechanisms. We model planning as a network path-finding task, where the objective is to generate a valid path from a specified source node to a designated target node. Our mathematical characterization shows that Transformer architectures can execute path-finding by embedding the adjacency and reachability matrices within their weights. Furthermore, our theoretical analysis of gradient-based learning dynamics reveals that LLMs can learn both the adjacency matrix and a limited form of the reachability matrix. These theoretical insights are then validated through experiments, which demonstrate that Transformer architectures indeed learn the adjacency matrix and an incomplete reachability matrix, consistent with our theoretical predictions. When applying our methodology to the real-world planning benchmark Blocksworld, our observations remain consistent. Additionally, our analyses uncover a fundamental limitation of current Transformer architectures in path-finding: these architectures cannot identify reachability relationships through transitivity, which leads to failures in generating paths when path concatenation is required. These findings provide new insights into how the internal mechanisms of autoregressive learning facilitate intelligent planning and deepen our understanding of how future LLMs might achieve more advanced and general planning-and-reasoning capabilities across diverse applications.
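The path-finding mechanism and its transitivity failure can be made concrete with a toy sketch (illustrative, not the paper's model): at each step the generator picks a successor that is adjacent to the current node and from which the target is believed reachable, so an entry missing from the learned reachability matrix blocks exactly the paths that require concatenation.

import numpy as np

def generate_path(adj, reach, source, target, max_len=10):
    # adj, reach: (N, N) 0/1 matrices; returns a node list, or None on failure.
    path = [source]
    while path[-1] != target and len(path) < max_len:
        step = np.flatnonzero(adj[path[-1]] * reach[:, target])
        if step.size == 0:
            return None        # no successor is believed to reach the target
        path.append(int(step[0]))
    return path if path[-1] == target else None

# Chain 0 -> 1 -> 2 -> 3; reachability learned only from observed pairs.
adj = np.zeros((4, 4), int)
adj[0, 1] = adj[1, 2] = adj[2, 3] = 1
reach = np.zeros((4, 4), int)
np.fill_diagonal(reach, 1)                   # each node trivially reaches itself
reach[0, 1] = reach[0, 2] = reach[1, 2] = 1  # from a training path 0-1-2
reach[2, 3] = 1                              # from a training edge 2-3
print(generate_path(adj, reach, 0, 2))       # [0, 1, 2]
print(generate_path(adj, reach, 0, 3))       # None: reach[1, 3] needs transitivity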


Appendix to Weakly Coupled Deep Q-Networks

Neural Information Processing Systems

We prove the first part of the proposition (weak duality) by induction. It is well-known that, by the convergence of the value iteration algorithm, Q [...]. Consider a state s ∈ S and a feasible action a ∈ A(s); we proceed by induction. This can be established by shifting the origin of the coordinate system. We use the following lemma from [6] to bound the accumulated noise.