brownian bridge
Towards Continuous-Time Approximations for Stochastic Gradient Descent without Replacement
Gradient optimization algorithms using epochs, that is those based on stochastic gradient descent without replacement (SGDo), are predominantly used to train machine learning models in practice. However, the mathematical theory of SGDo and related algorithms remain underexplored compared to their "with replacement" and "one-pass" counterparts. In this article, we propose a stochastic, continuous-time approximation to SGDo with additive noise based on a Young differential equation driven by a stochastic process we call an "epoched Brownian motion". We show its usefulness by proving the almost sure convergence of the continuous-time approximation for strongly convex objectives and learning rate schedules of the form $u_t = \frac{1}{(1+t)^β}, β\in (0,1)$. Moreover, we compute an upper bound on the asymptotic rate of almost sure convergence, which is as good or better than previous results for SGDo.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany (0.04)
Sampling conditioned diffusions via Pathspace Projected Monte Carlo
We present an algorithm to sample stochastic differential equations conditioned on rather general constraints, including integral constraints, endpoint constraints, and stochastic integral constraints. The algorithm is a pathspace Metropolis-adjusted manifold sampling scheme, which samples stochastic paths on the submanifold of realizations that adhere to the conditioning constraint. We demonstrate the effectiveness of the algorithm by sampling a dynamical condensation phase transition, conditioning a random walk on a fixed Levy stochastic area, conditioning a stochastic nonlinear wave equation on high amplitude waves, and sampling a stochastic partial differential equation model of turbulent pipe flow conditioned on relaminarization events.
Winner-takes-all for Multivariate Probabilistic Time Series Forecasting
Cortés, Adrien, Rehm, Rémi, Letzelter, Victor
We introduce TimeMCL, a method leveraging the Multiple Choice Learning (MCL) paradigm to forecast multiple plausible time series futures. Our approach employs a neural network with multiple heads and utilizes the Winner-Takes-All (WTA) loss to promote diversity among predictions. MCL has recently gained attention due to its simplicity and ability to address ill-posed and ambiguous tasks. We propose an adaptation of this framework for time-series forecasting, presenting it as an efficient method to predict diverse futures, which we relate to its implicit quantization objective. We provide insights into our approach using synthetic data and evaluate it on real-world time series, demonstrating its promising performance at a light computational cost.
- South America > Paraguay > Asunción > Asunción (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
- Energy (0.69)
- Banking & Finance (0.68)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Machine-Learned Sampling of Conditioned Path Measures
Jiang, Qijia, Cohn-Gordon, Reuben
We propose algorithms for sampling from posterior path measures $P(C([0, T], \mathbb{R}^d))$ under a general prior process. This leverages ideas from (1) controlled equilibrium dynamics, which gradually transport between two path measures, and (2) optimization in $\infty$-dimensional probability space endowed with a Wasserstein metric, which can be used to evolve a density curve under the specified likelihood. The resulting algorithms are theoretically grounded and can be integrated seamlessly with neural networks for learning the target trajectory ensembles, without access to data.
Parabolic Continual Learning
Yang, Haoming, Hasan, Ali, Tarokh, Vahid
Regularizing continual learning techniques is important for anticipating algorithmic behavior under new realizations of data. We introduce a new approach to continual learning by imposing the properties of a parabolic partial differential equation (PDE) to regularize the expected behavior of the loss over time. This class of parabolic PDEs has a number of favorable properties that allow us to analyze the error incurred through forgetting and the error induced through generalization. Specifically, we do this through imposing boundary conditions where the boundary is given by a memory buffer. By using the memory buffer as a boundary, we can enforce long term dependencies by bounding the expected error by the boundary loss. Finally, we illustrate the empirical performance of the method on a series of continual learning tasks.
Elliptic Loss Regularization
Hasan, Ali, Yang, Haoming, Ng, Yuting, Tarokh, Vahid
Regularizing neural networks is important for anticipating model behavior in regions of the data space that are not well represented. In this work, we propose a regularization technique for enforcing a level of smoothness in the mapping between the data input space and the loss value. We specify the level of regularity by requiring that the loss of the network satisfies an elliptic operator over the data domain. To do this, we modify the usual empirical risk minimization objective such that we instead minimize a new objective that satisfies an elliptic operator over points within the domain. This allows us to use existing theory on elliptic operators to anticipate the behavior of the error for points outside the training set. We propose a tractable computational method that approximates the behavior of the elliptic operator while being computationally efficient. Finally, we analyze the properties of the proposed regularization to understand the performance on common problems of distribution shift and group imbalance. Numerical experiments confirm the utility of the proposed regularization technique.
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games (0.46)
An interpretation of the Brownian bridge as a physics-informed prior for the Poisson equation
Alberts, Alex, Bilionis, Ilias
Physics-informed machine learning is one of the most commonly used methods for fusing physical knowledge in the form of partial differential equations with experimental data. The idea is to construct a loss function where the physical laws take the place of a regularizer and minimize it to reconstruct the underlying physical fields and any missing parameters. However, there is a noticeable lack of a direct connection between physics-informed loss functions and an overarching Bayesian framework. In this work, we demonstrate that Brownian bridge Gaussian processes can be viewed as a softly-enforced physics-constrained prior for the Poisson equation. We first show equivalence between the variational form of the physics-informed loss function for the Poisson equation and a kernel ridge regression objective. Then, through the connection between Gaussian process regression and kernel methods, we identify a Gaussian process for which the posterior mean function and physics-informed loss function minimizer agree. This connection allows us to probe different theoretical questions, such as convergence and behavior of inverse problems. We also connect the method to the important problem of identifying model-form error in applications.
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
Infinite-Resolution Integral Noise Warping for Diffusion Models
Deng, Yitong, Lin, Winnie, Li, Lingxiao, Smirnov, Dmitriy, Burgert, Ryan, Yu, Ning, Dedun, Vincent, Taghavi, Mohammad H.
Adapting pretrained image-based diffusion models to generate temporally consistent videos has become an impactful generative modeling research direction. Training-free noise-space manipulation has proven to be an effective technique, where the challenge is to preserve the Gaussian white noise distribution while adding in temporal consistency. Recently, Chang et al. (2024) formulated this problem using an integral noise representation with distribution-preserving guarantees, and proposed an upsampling-based algorithm to compute it. However, while their mathematical formulation is advantageous, the algorithm incurs a high computational cost. Through analyzing the limiting-case behavior of their algorithm as the upsampling resolution goes to infinity, we develop an alternative algorithm that, by gathering increments of multiple Brownian bridges, achieves their infinite-resolution accuracy while simultaneously reducing the computational cost by orders of magnitude. We prove and experimentally validate our theoretical claims, and demonstrate our method's effectiveness in real-world applications. We further show that our method readily extends to the 3-dimensional space.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space
Feitelberg, Jacob, Choi, Kyuseong, Agarwal, Anish, Dwivedi, Raaz
We introduce the problem of distributional matrix completion: Given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This is a generalization of traditional matrix completion where the observations per matrix entry are scalar valued. To do so, we utilize tools from optimal transport to generalize the nearest neighbors method to the distributional setting. Under a suitable latent factor model on probability distributions, we establish that our method recovers the distributions in the Wasserstein norm. We demonstrate through simulations that our method is able to (i) provide better distributional estimates for an entry compared to using observed samples for that entry alone, (ii) yield accurate estimates of distributional quantities such as standard deviation and value-at-risk, and (iii) inherently support heteroscedastic noise. We also prove novel asymptotic results for Wasserstein barycenters over one-dimensional distributions.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Indiana (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- (3 more...)
Thunder : Unified Regression-Diffusion Speech Enhancement with a Single Reverse Step using Brownian Bridge
Trachu, Thanapat, Piansaddhayanon, Chawan, Chuangsuwanich, Ekapol
Diffusion-based speech enhancement has shown promising results, but can suffer from a slower inference time. Initializing the diffusion process with the enhanced audio generated by a regression-based model can be used to reduce the computational steps required. However, these approaches often necessitate a regression model, further increasing the system's complexity. We propose Thunder, a unified regression-diffusion model that utilizes the Brownian bridge process which can allow the model to act in both modes. The regression mode can be accessed by setting the diffusion time step closed to 1. However, the standard score-based diffusion modeling does not perform well in this setup due to gradient instability. To mitigate this problem, we modify the diffusion model to predict the clean speech instead of the score function, achieving competitive performance with a more compact model size and fewer reverse steps.