Goto

Collaborating Authors

 Mathematical & Statistical Methods


Aligning Embeddings and Geometric Random Graphs: Informational Results and Computational Approaches for the Procrustes-Wasserstein Problem

Neural Information Processing Systems

The Procrustes-Wasserstein problem consists in matching two high-dimensional point clouds in an unsupervised setting, and has many applications in natural language processing and computer vision. We consider a planted model with two datasets X,Y that consist of n datapoints in \mathbb{R} d, where Y is a noisy version of X, up to an orthogonal transformation and a relabeling of the data points. This setting is related to the graph alignment problem in geometric models.In this work, we focus on the euclidean transport cost between the point clouds as a measure of performance for the alignment. We first establish information-theoretic results, in the high ( d \gg \log n) and low ( d \ll \log n) dimensional regimes. We then study computational aspects and propose the'Ping-Pong algorithm', alternatively estimating the orthogonal transformation and the relabeling, initialized via a Franke-Wolfe convex relaxation.


On Contrastive Representations of Stochastic Processes

Neural Information Processing Systems

Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series. Typical methods rely on exact reconstruction of observations, but this approach breaks down as observations become high-dimensional or noise distributions become complex. To address this, we propose a unifying framework for learning contrastive representations of stochastic processes (CReSP) that does away with exact reconstruction. We dissect potential use cases for stochastic process representations, and propose methods that accommodate each. Empirically, we show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes.


Multi-Stage Predict+Optimize for (Mixed Integer) Linear Programs

Neural Information Processing Systems

The recently-proposed framework of Predict Optimize tackles optimization problems with parameters that are unknown at solving time, in a supervised learning setting. Prior frameworks consider only the scenario where all unknown parameters are (eventually) revealed simultaneously. In this work, we propose Multi-Stage Predict Optimize, a novel extension catering to applications where unknown parameters are revealed in sequential stages, with optimization decisions made in between. We further develop three training algorithms for neural networks (NNs) for our framework as proof of concept, both of which handle all mixed integer linear programs. The first baseline algorithm is a natural extension of prior work, training a single NN which makes a single prediction of unknown parameters.


CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

Neural Information Processing Systems

Moreover, CoLA provides memory efficient automatic differentiation, low precision computation, and GPU acceleration in both JAX and PyTorch, while also accommodating new objects, operations, and rules in downstream packages via multiple dispatch. CoLA can accelerate many algebraic operations, while making it easy to prototype matrix structures and algorithms, providing an appealing drop-in tool for virtually any computational effort that requires linear algebra. We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.


MILP-StuDio: MILP Instance Generation via Block Structure Decomposition

Neural Information Processing Systems

Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications. In practice, improving the performance of MILP solvers often requires a large amount of high-quality data, which can be challenging to collect. Researchers thus turn to generation techniques to generate additional MILP instances. However, existing approaches do not take into account specific block structures--which are closely related to the problem formulations--in the constraint coefficient matrices (CCMs) of MILPs. Consequently, they are prone to generate computationally trivial or infeasible instances due to the disruptions of block structures and thus problem formulations.


Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows

Neural Information Processing Systems

Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the continuous-time Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process, such as efficient computation of likelihoods and marginals. Furthermore, our continuous treatment provides a natural framework for irregular time series with an independent arrival process, including straightforward interpolation. We illustrate the desirable properties of the proposed model on popular stochastic processes and demonstrate its superior flexibility to variational RNN and latent ODE baselines in a series of experiments on synthetic and real-world data.


A Combinatorial Algorithm for the Semi-Discrete Optimal Transport Problem

Neural Information Processing Systems

Optimal Transport (OT, also known as the Wasserstein distance) is a popular metric for comparing probability distributions and has been successfully used in many machine-learning applications.In the semi-discrete 2 -Wasserstein problem, we wish to compute the cheapest way to transport all the mass from a continuous distribution \mu to a discrete distribution u in \mathbb{R} d for d\ge 1, where the cost of transporting unit mass between points a and b is d(a,b) a-b 2 . When both distributions are discrete, a simple combinatorial framework has been used to find the exact solution (see e.g. In this paper, we propose a combinatorial framework for the semi-discrete OT, which can be viewed as an extension of the combinatorial framework for the discrete OT but requires several new ideas. We present a new algorithm that given \mu and u in \mathbb{R} 2 and a parameter \varepsilon 0, computes an \varepsilon -additive approximate semi-discrete transport plan in O(n {4}\log n\log \frac{1}{\varepsilon}) time (in the worst case), where n is the support-size of the discrete distribution u and we assume that the mass of \mu inside a triangle can be computed in O(1) time. Our algorithm is significantly faster than the known algorithms, and unlike many numerical algorithms, it does not make any assumptions on the smoothness of \mu .As an application of our algorithm, we describe a data structure to store a large discrete distribution \mu (with support size N) using O(N) space so that, given a query discrete distribution u (with support size k), an \varepsilon -additive approximate transport plan can be computed in O(k {3}\sqrt{N}\log \frac{1}{\varepsilon}) time in 2 dimensions.


Globally Q-linear Gauss-Newton Method for Overparameterized Non-convex Matrix Sensing

Neural Information Processing Systems

This paper focuses on the optimization of overparameterized, non-convex low-rank matrix sensing (LRMS)--an essential component in contemporary statistics and machine learning. Recent years have witnessed significant breakthroughs in first-order methods, such as gradient descent, for tackling this non-convex optimization problem. However, the presence of numerous saddle points often prolongs the time required for gradient descent to overcome these obstacles. In this paper, we introduce an approximated Gauss-Newton (AGN) method for tackling the non-convex LRMS problem. Notably, AGN incurs a computational cost comparable to gradient descent per iteration but converges much faster without being slowed down by saddle points. We prove that, despite the non-convexity of the objective function, AGN achieves Q-linear convergence from random initialization to the global optimal solution.


New Tight Bounds for SGD without Variance Assumption: A Computer-Aided Lyapunov Analysis

arXiv.org Machine Learning

The analysis of Stochastic Gradient Descent (SGD) often relies on making some assumption on the variance of the stochastic gradients, which is usually not satisfied or difficult to verify in practice. This paper contributes to a recent line of works which attempt to provide guarantees without making any variance assumption, leveraging only the (strong) convexity and smoothness of the loss functions. In this context, we prove new theoretical bounds derived from the monotonicity of a simple Lyapunov energy, improving the current state-of-the-art and extending their validity to larger step-sizes. Our theoretical analysis is backed by a Performance Estimation Problem analysis, which allows us to claim that, empirically, the bias term in our bounds is tight within our framework.


CT-OT Flow: Estimating Continuous-Time Dynamics from Discrete Temporal Snapshots

arXiv.org Machine Learning

In many real-world scenarios, such as single-cell RNA sequencing, data are observed only as discrete-time snapshots spanning finite time intervals and subject to noisy timestamps, with no continuous trajectories available. Recovering the underlying continuous-time dynamics from these snapshots with coarse and noisy observation times is a critical and challenging task. We propose Continuous-Time Optimal Transport Flow (CT-OT Flow), which first infers high-resolution time labels via partial optimal transport and then reconstructs a continuous-time data distribution through a temporal kernel smoothing. This reconstruction enables accurate training of dynamics models such as ODEs and SDEs. CT-OT Flow consistently outperforms state-of-the-art methods on synthetic benchmarks and achieves lower reconstruction errors on real scRNA-seq and typhoon-track datasets. Our results highlight the benefits of explicitly modeling temporal discretization and timestamp uncertainty, offering an accurate and general framework for bridging discrete snapshots and continuous-time processes.