Bayesian Inference
Diffusion Bridge Mixture Transports, Schr\"odinger Bridge Problems and Generative Modeling
The dynamic Schr\"odinger bridge problem seeks a stochastic process that defines a transport between two target probability measures, while optimally satisfying the criteria of being closest, in terms of Kullback-Leibler divergence, to a reference process. We propose a novel sampling-based iterative algorithm, the iterated diffusion bridge mixture (IDBM) procedure, aimed at solving the dynamic Schr\"odinger bridge problem. The IDBM procedure exhibits the attractive property of realizing a valid transport between the target probability measures at each iteration. We perform an initial theoretical investigation of the IDBM procedure, establishing its convergence properties. The theoretical findings are complemented by numerical experiments illustrating the competitive performance of the IDBM procedure. Recent advancements in generative modeling employ the time-reversal of a diffusion process to define a generative process that approximately transports a simple distribution to the data distribution. As an alternative, we propose utilizing the first iteration of the IDBM procedure as an approximation-free method for realizing this transport. This approach offers greater flexibility in selecting the generative process dynamics and exhibits accelerated training and superior sample quality over larger discretization intervals. In terms of implementation, the necessary modifications are minimally intrusive, being limited to the training loss definition.
Exact Selective Inference with Randomization
Panigrahi, Snigdha, Fry, Kevin, Taylor, Jonathan
The polyhedral method by Lee et al. (2016) introduced confidence intervals for exact selective inference in Gaussian regression models. This method provides valid inferences for selected parameters by conditioning on the outcome of selection. A pivot is obtained for each selected parameter from a truncated Gaussian distribution, provided the outcome of selection can be described by linear constraints, also known as polyhedral constraints. However, as shown by Kivaranovic and Leeb (2021), confidence intervals based on this pivot can have infinite length in expectation. Randomizing data at the time of selection and conditioning on the outcome of randomized selection produces narrower confidence intervals than the polyhedral method.
Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice
Rothfuss, Jonas, Josifoski, Martin, Fortuin, Vincent, Krause, Andreas
Meta-Learning aims to speed up the learning process on new tasks by acquiring useful inductive biases from datasets of related learning tasks. While, in practice, the number of related tasks available is often small, most of the existing approaches assume an abundance of tasks; making them unrealistic and prone to overfitting. A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks. In this work, we provide a theoretical analysis using the PAC-Bayesian theory and present a generalization bound for meta-learning, which was first derived by Rothfuss et al. (2021a). Crucially, the bound allows us to derive the closed form of the optimal hyper-posterior, referred to as PACOH, which leads to the best performance guarantees. We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds. The closed-form PACOH inspires a practical meta-learning approach that avoids the reliance on bi-level optimization, giving rise to a stochastic optimization problem that is amenable to standard variational methods that scale well. Our experiments show that, when instantiating the PACOH with Gaussian processes and Bayesian Neural Networks models, the resulting methods are more scalable, and yield state-of-the-art performance, both in terms of predictive accuracy and the quality of uncertainty estimates.
Minimizing low-rank models of high-order tensors: Hardness, span, tight relaxation, and applications
Sidiropoulos, Nicholas D., Karakasis, Paris, Konar, Aritra
We consider the problem of finding the smallest or largest entry of a tensor of order N that is specified via its rank decomposition. Stated in a different way, we are given N sets of R-dimensional vectors and we wish to select one vector from each set such that the sum of the Hadamard product of the selected vectors is minimized or maximized. We show that this fundamental tensor problem is NP-hard for any tensor rank higher than one, and polynomial-time solvable in the rank-one case. We also propose a continuous relaxation and prove that it is tight for any rank. For low-enough ranks, the proposed continuous reformulation is amenable to low-complexity gradient-based optimization, and we propose a suite of gradient-based optimization algorithms drawing from projected gradient descent, Frank-Wolfe, or explicit parametrization of the relaxed constraints. We also show that our core results remain valid no matter what kind of polyadic tensor model is used to represent the tensor of interest, including Tucker, HOSVD/MLSVD, tensor train, or tensor ring. Next, we consider the class of problems that can be posed as special instances of the problem of interest. We show that this class includes the partition problem (and thus all NP-complete problems via polynomial-time transformation), integer least squares, integer linear programming, integer quadratic programming, sign retrieval (a special kind of mixed integer programming / restricted version of phase retrieval), and maximum likelihood decoding of parity check codes. We demonstrate promising experimental results on a number of hard problems, including state-of-art performance in decoding low density parity check codes and general parity check codes.
Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing
Abdulsamad, Hany, Iqbal, Sahel, Corenflos, Adrien, Sรคrkkรค, Simo
Stochastic optimal control of dynamical systems is a crucial challenge in sequential decision-making. Recently, control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma. Nonetheless, a majority of these techniques only invoke the inference-control duality to derive a modified risk objective that is then addressed within a reinforcement learning framework. This paper introduces a novel perspective by framing risk-sensitive stochastic control as Markovian score climbing under samples drawn from a conditional particle filter. Our approach, while purely inference-centric, provides asymptotically unbiased estimates for gradient-based policy optimization with optimal importance weighting and no explicit value function learning. To validate our methodology, we apply it to the task of learning neural non-Gaussian feedback policies, showcasing its efficacy on numerical benchmarks of stochastic dynamical systems.
RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation
Xia, Yu, Arian, Ali, Narayanamoorthy, Sriram, Mabry, Joshua
Significant research effort has been devoted in recent years to developing personalized pricing, promotions, and product recommendation algorithms that can leverage rich customer data to learn and earn. Systematic benchmarking and evaluation of these causal learning systems remains a critical challenge, due to the lack of suitable datasets and simulation environments. In this work, we propose a multi-stage model for simulating customer shopping behavior that captures important sources of heterogeneity, including price sensitivity and past experiences. We embedded this model into a working simulation environment -- RetailSynth. RetailSynth was carefully calibrated on publicly available grocery data to create realistic synthetic shopping transactions. Multiple pricing policies were implemented within the simulator and analyzed for impact on revenue, category penetration, and customer retention. Applied researchers can use RetailSynth to validate causal demand models for multi-category retail and to incorporate realistic price sensitivity into emerging benchmarking suites for personalized pricing, promotions, and product recommendations.
nbi: the Astronomer's Package for Neural Posterior Estimation
Zhang, Keming, Bloom, Joshua S., van der Walt, Stรฉfan, Hernitschek, Nina
Despite the promise of Neural Posterior Estimation (NPE) methods in astronomy, the adaptation of NPE into the routine inference workflow has been slow. We identify three critical issues: the need for custom featurizer networks tailored to the observed data, the inference inexactness, and the under-specification of physical forward models. To address the first two issues, we introduce a new framework and open-source software nbi (Neural Bayesian Inference), which supports both amortized and sequential NPE. First, nbi provides built-in "featurizer" networks with demonstrated efficacy on sequential data, such as light curve and spectra, thus obviating the need for this customization on the user end. Second, we introduce a modified algorithm SNPE-IS, which facilities asymptotically exact inference by using the surrogate posterior under NPE only as a proposal distribution for importance sampling. These features allow nbi to be applied off-the-shelf to astronomical inference problems involving light curves and spectra. We discuss how nbi may serve as an effective alternative to existing methods such as Nested Sampling. Our package is at https://github.com/kmzzhang/nbi.
Contingency Games for Multi-Agent Interaction
Peters, Lasse, Bajcsy, Andrea, Chiu, Chih-Yuan, Fridovich-Keil, David, Laine, Forrest, Ferranti, Laura, Alonso-Mora, Javier
Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work we take a game-theoretic perspective on contingency planning, tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contingency game allows the robot to efficiently interact with other agents by generating strategic motion plans conditioned on multiple possible intents for other actors in the scene. Contingency games are parameterized via a scalar variable which represents a future time when intent uncertainty will be resolved. By estimating this parameter online, we construct a game-theoretic motion planner that adapts to changing beliefs while anticipating future certainty. We show that existing variants of game-theoretic planning under uncertainty are readily obtained as special cases of contingency games. Through a series of simulated autonomous driving scenarios, we demonstrate that contingency games close the gap between certainty-equivalent games that commit to a single hypothesis and non-contingent multi-hypothesis games that do not account for future uncertainty reduction.
Learned reconstruction methods for inverse problems: sample error estimates
The mathematical treatment of inverse problems has proved to be a lively and attractive research field, driven and motivated by a wide variety of applications and by the theoretical challenges induced by their ill-posed nature. In order to provide more accurate and reliable strategies, especially for the reconstruction task, the scientific research in the field has shown a growing interest in the topic of learned reconstruction, or data-driven, methods, to combine classical, model-based approaches with valuable information of statistical nature. This has represented a natural outcome and development of the analysis of inverse problems, both on a numerical and on a theoretical side: indeed, the idea of leveraging prior knowledge on the solution has traditionally been considered to mitigate ill-posedness, as a regularization tool as much as a support for the reconstruction. We have now witnessed the emergence of several learning-based approaches to inverse problems, providing, in many cases, striking numerical results in terms of accuracy and efficiency. Moreover, significant interest has grown in the direction of theoretical guarantees for such techniques, spanning from the demand of interpretability and reliability, up to the issues of stability and convergence [8, 55]. Despite several distinct avenues have emerged, which are now well-established and are developing independently (to name a few: generative models, unrolled techniques, Plug and Play schemes), it is possible to provide a unifying overview of them from the point of view of statistical learning theory [20]. In this context, the goal pursued by this paper is twofold. On the one side, it aims to provide a general theoretical framework in statistical learning that is able to comprehend a large family of data-driven reconstruction methods.
Deep de Finetti: Recovering Topic Distributions from Large Language Models
Zhang, Liyi, McCoy, R. Thomas, Sumers, Theodore R., Zhu, Jian-Qiao, Griffiths, Thomas L.
Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document. Prior work has found that internal representations of LLMs encode one aspect of latent structure, namely syntax; here we investigate a complementary aspect, namely the document's topic structure. We motivate the hypothesis that LLMs capture topic structure by connecting LLM optimization to implicit Bayesian inference. De Finetti's theorem shows that exchangeable probability distributions can be represented as a mixture with respect to a latent generating distribution. Although text is not exchangeable at the level of syntax, exchangeability is a reasonable starting assumption for topic structure. We thus hypothesize that predicting the next token in text will lead LLMs to recover latent topic distributions. We examine this hypothesis using Latent Dirichlet Allocation (LDA), an exchangeable probabilistic topic model, as a target, and we show that the representations formed by LLMs encode both the topics used to generate synthetic data and those used to explain natural corpus data.