Conditional score-based diffusion models for Bayesian inference in infinite dimensions
Since their initial introduction, score-based diffusion models (SDMs) have been successfully applied to solve a variety of linear inverse problems in finite-dimensional vector spaces due to their ability to efficiently approximate the posterior distribution. However, using SDMs for inverse problems in infinite-dimensional function spaces has only been addressed recently, primarily through methods that learn the unconditional score. While this approach is advantageous for some inverse problems, it is mostly heuristic and involves numerous computationally costly forward operator evaluations during posterior sampling. To address these limitations, we propose a theoretically grounded method for sampling from the posterior of infinite-dimensional Bayesian linear inverse problems based on amortized conditional SDMs. In particular, we prove that one of the most successful approaches for estimating the conditional score in finite dimensions--the conditional denoising estimator--can also be applied in infinite dimensions. A significant part of our analysis is dedicated to demonstrating that extending infinite-dimensional SDMs to the conditional setting requires careful consideration, as the conditional score typically blows up for small times, contrarily to the unconditional score. We conclude by presenting stylized and large-scale numerical examples that validate our approach, offer additional insights, and demonstrate that our method enables large-scale, discretization-invariant Bayesian inference.
Improved Convergence in High Probability of Clipped Gradient Methods with Heavy Tailed Noise Thien Hang Nguyen Department of Computer Science Khoury College of Computer Sciences Boston University
In this work, we study the convergence in high probability of clipped gradient methods when the noise distribution has heavy tails, i.e., with bounded pth moments, for some 1 < p 2. Prior works in this setting follow the same recipe of using concentration inequalities and an inductive argument with union bound to bound the iterates across all iterations. This method results in an increase in the failure probability by a factor of T, where T is the number of iterations. We instead propose a new analysis approach based on bounding the moment generating function of a well chosen supermartingale sequence. We improve the dependency on T in the convergence guarantee for a wide range of algorithms with clipped gradients, including stochastic (accelerated) mirror descent for convex objectives and stochastic gradient descent for nonconvex objectives. Our high probability bounds achieve the optimal convergence rates and match the best currently known in-expectation bounds. Our approach naturally allows the algorithms to use time-varying step sizes and clipping parameters when the time horizon is unknown, which appears difficult or even impossible using existing techniques from prior works. Furthermore, we show that in the case of clipped stochastic mirror descent, several problem constants, including the initial distance to the optimum, are not required when setting step sizes and clipping parameters.
Assessing Social and Intersectional Biases in Contextualized Word Representations
Social bias in machine learning has drawn significant attention, with work ranging from demonstrations of bias in a multitude of applications, curating definitions of fairness for different contexts, to developing algorithms to mitigate bias. In natural language processing, gender bias has been shown to exist in context-free word embeddings. Recently, contextual word representations have outperformed word embeddings in several downstream NLP tasks. These word representations are conditioned on their context within a sentence, and can also be used to encode the entire sentence. In this paper, we analyze the extent to which state-of-the-art models for contextual word representations, such as BERT and GPT-2, encode biases with respect to gender, race, and intersectional identities. Towards this, we propose assessing bias at the contextual word level. This novel approach captures the contextual effects of bias missing in context-free word embeddings, yet avoids confounding effects that underestimate bias at the sentence encoding level. We demonstrate evidence of bias at the corpus level, find varying evidence of bias in embedding association tests, show in particular that racial bias is strongly encoded in contextual word models, and observe that bias effects for intersectional minorities are exacerbated beyond their constituent minority identities. Further, evaluating bias effects at the contextual word level captures biases that are not captured at the sentence level, confirming the need for our novel approach.
A Minimum-fuel cost (MF) derivation (t) A and 0 t T is H(s (t), a (t), p
The last two cases correspond to cost function gradients perpendicular to the faces of the feasible region. In linear optimization, these cases give is an infinite number of optimal solutions between two adjacent vertices. B.1 Baseline Algorithms We provide a brief discussion of the baseline algorithms below. The libraries our implementations are based off for PPO, SAC, and DreamerV2 are available under the MIT License, and the base MPO implementation under the Apache License 2.0. For MuJoCo [52], we used a Pro Lab license.
Adversarial Reprogramming Revisited
Adversarial reprogramming, introduced by Elsayed, Goodfellow, and Sohl-Dickstein, seeks to repurpose a neural network to perform a different task, by manipulating its input without modifying its weights. We prove that two-layer ReLU neural networks with random weights can be adversarially reprogrammed to achieve arbitrarily high accuracy on Bernoulli data models over hypercube vertices, provided the network width is no greater than its input dimension. We also substantially strengthen a recent result of Phuong and Lampert on directional convergence of gradient flow, and obtain as a corollary that training two-layer ReLU neural networks on orthogonally separable datasets can cause their adversarial reprogramming to fail. We support these theoretical results by experiments that demonstrate that, as long as batch normalisation layers are suitably initialised, even untrained networks with random weights are susceptible to adversarial reprogramming. This is in contrast to observations in several recent works that suggested that adversarial reprogramming is not possible for untrained networks to any degree of reliability.
Latent Neural Operator for Solving Forward and Inverse PDE Problems
Neural operators effectively solve PDE problems from data without knowing the explicit equations, which learn the map from the input sequences of observed samples to the predicted values. Most existing works build the model in the original geometric space, leading to high computational costs when the number of sample points is large. We present the Latent Neural Operator (LNO) solving PDEs in the latent space. In particular, we first propose Physics-Cross-Attention (PhCA) transforming representation from the geometric space to the latent space, then learn the operator in the latent space, and finally recover the real-world geometric space via the inverse PhCA map. Our model retains flexibility that can decode values in any position not limited to locations defined in the training set, and therefore can naturally perform interpolation and extrapolation tasks particularly useful for inverse problems. Moreover, the proposed LNO improves both prediction accuracy and computational efficiency. Experiments show that LNO reduces the GPU memory by 50%, speeds up training 1.8 times, and reaches state-of-the-art accuracy on four out of six benchmarks for forward problems and a benchmark for inverse problem.
Unpaired Image-to-Image Translation with Density Changing Regularization
Unpaired image-to-image translation aims to translate an input image to another domain such that the output image looks like an image from another domain while important semantic information are preserved. Inferring the optimal mapping with unpaired data is impossible without making any assumptions. In this paper, we make a density changing assumption where image patches of high probability density should be mapped to patches of high probability density in another domain. Then we propose an efficient way to enforce this assumption: we train the flows as density estimators and penalize the variance of density changes. Despite its simplicity, our method achieves the best performance on benchmark datasets and needs only 56 86% of training time of the existing state-of-the-art method. The training and evaluation code are avaliable at https://github.com/Mid-Push/