Goto

Collaborating Authors

 Mathematical & Statistical Methods


On Contrastive Representations of Stochastic Processes

Neural Information Processing Systems

Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series. Typical methods rely on exact reconstruction of observations, but this approach breaks down as observations become high-dimensional or noise distributions become complex. To address this, we propose a unifying framework for learning contrastive representations of stochastic processes (CReSP) that does away with exact reconstruction. We dissect potential use cases for stochastic process representations, and propose methods that accommodate each. Empirically, we show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes.


Multi-Stage Predict+Optimize for (Mixed Integer) Linear Programs

Neural Information Processing Systems

The recently-proposed framework of Predict Optimize tackles optimization problems with parameters that are unknown at solving time, in a supervised learning setting. Prior frameworks consider only the scenario where all unknown parameters are (eventually) revealed simultaneously. In this work, we propose Multi-Stage Predict Optimize, a novel extension catering to applications where unknown parameters are revealed in sequential stages, with optimization decisions made in between. We further develop three training algorithms for neural networks (NNs) for our framework as proof of concept, both of which handle all mixed integer linear programs. The first baseline algorithm is a natural extension of prior work, training a single NN which makes a single prediction of unknown parameters.


MILP-StuDio: MILP Instance Generation via Block Structure Decomposition

Neural Information Processing Systems

Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications. In practice, improving the performance of MILP solvers often requires a large amount of high-quality data, which can be challenging to collect. Researchers thus turn to generation techniques to generate additional MILP instances. However, existing approaches do not take into account specific block structures--which are closely related to the problem formulations--in the constraint coefficient matrices (CCMs) of MILPs. Consequently, they are prone to generate computationally trivial or infeasible instances due to the disruptions of block structures and thus problem formulations.


A Combinatorial Algorithm for the Semi-Discrete Optimal Transport Problem

Neural Information Processing Systems

Optimal Transport (OT, also known as the Wasserstein distance) is a popular metric for comparing probability distributions and has been successfully used in many machine-learning applications.In the semi-discrete 2 -Wasserstein problem, we wish to compute the cheapest way to transport all the mass from a continuous distribution \mu to a discrete distribution u in \mathbb{R} d for d\ge 1, where the cost of transporting unit mass between points a and b is d(a,b) a-b 2 . When both distributions are discrete, a simple combinatorial framework has been used to find the exact solution (see e.g. In this paper, we propose a combinatorial framework for the semi-discrete OT, which can be viewed as an extension of the combinatorial framework for the discrete OT but requires several new ideas. We present a new algorithm that given \mu and u in \mathbb{R} 2 and a parameter \varepsilon 0, computes an \varepsilon -additive approximate semi-discrete transport plan in O(n {4}\log n\log \frac{1}{\varepsilon}) time (in the worst case), where n is the support-size of the discrete distribution u and we assume that the mass of \mu inside a triangle can be computed in O(1) time. Our algorithm is significantly faster than the known algorithms, and unlike many numerical algorithms, it does not make any assumptions on the smoothness of \mu .As an application of our algorithm, we describe a data structure to store a large discrete distribution \mu (with support size N) using O(N) space so that, given a query discrete distribution u (with support size k), an \varepsilon -additive approximate transport plan can be computed in O(k {3}\sqrt{N}\log \frac{1}{\varepsilon}) time in 2 dimensions.


Globally Q-linear Gauss-Newton Method for Overparameterized Non-convex Matrix Sensing

Neural Information Processing Systems

This paper focuses on the optimization of overparameterized, non-convex low-rank matrix sensing (LRMS)--an essential component in contemporary statistics and machine learning. Recent years have witnessed significant breakthroughs in first-order methods, such as gradient descent, for tackling this non-convex optimization problem. However, the presence of numerous saddle points often prolongs the time required for gradient descent to overcome these obstacles. In this paper, we introduce an approximated Gauss-Newton (AGN) method for tackling the non-convex LRMS problem. Notably, AGN incurs a computational cost comparable to gradient descent per iteration but converges much faster without being slowed down by saddle points. We prove that, despite the non-convexity of the objective function, AGN achieves Q-linear convergence from random initialization to the global optimal solution.


Polytope Volume Monitoring Problem: Formulation and Solution via Parametric Linear Program Based Control Barrier Function

arXiv.org Artificial Intelligence

Motivated by the latest research on feasible space monitoring of multiple control barrier functions (CBFs) as well as polytopic collision avoidance, this paper studies the Polytope Volume Monitoring (PVM) problem, whose goal is to design a control law for inputs of nonlinear systems to prevent the volume of some state-dependent polytope from decreasing to zero. Recent studies have explored the idea of applying Chebyshev ball method in optimization theory to solve the case study of PVM; however, the underlying difficulties caused by nonsmoothness have not been addressed. This paper continues the study on this topic, where our main contribution is to establish the relationship between nonsmooth CBF and parametric optimization theory through directional derivatives for the first time, so as to solve PVM problems more conveniently. In detail, inspired by Chebyshev ball approach, a parametric linear program (PLP) based nonsmooth barrier function candidate is established for PVM, and then, sufficient conditions for it to be a nonsmooth CBF are proposed, based on which a quadratic program (QP) based safety filter with guaranteed feasibility is proposed to address PVM problems. Finally, a numerical simulation example is given to show the efficiency of the proposed safety filter.


Dissecting the Impact of Model Misspecification in Data-driven Optimization

arXiv.org Artificial Intelligence

Data-driven optimization aims to translate a machine learning model into decision-making by optimizing decisions on estimated costs. Such a pipeline can be conducted by fitting a distributional model which is then plugged into the target optimization problem. While this fitting can utilize traditional methods such as maximum likelihood, a more recent approach uses estimation-optimization integration that minimizes decision error instead of estimation error. Although intuitive, the statistical benefit of the latter approach is not well understood yet is important to guide the prescriptive usage of machine learning. In this paper, we dissect the performance comparisons between these approaches in terms of the amount of model misspecification. In particular, we show how the integrated approach offers a ``universal double benefit'' on the top two dominating terms of regret when the underlying model is misspecified, while the traditional approach can be advantageous when the model is nearly well-specified. Our comparison is powered by finite-sample tail regret bounds that are derived via new higher-order expansions of regrets and the leveraging of a recent Berry-Esseen theorem.


Neural-Augmented Incremental Nonlinear Dynamic Inversion for Quadrotors with Payload Adaptation

arXiv.org Artificial Intelligence

The increasing complexity of multirotor applications has led to the need of more accurate flight controllers that can reliably predict all forces acting on the robot. Traditional flight controllers model a large part of the forces but do not take so called residual forces into account. A reason for this is that accurately computing the residual forces can be computationally expensive. Incremental Nonlinear Dynamic Inversion (INDI) is a method that computes the difference between different sensor measurements in order to estimate these residual forces. The main issue with INDI is it's reliance on special sensor measurements which can be very noisy. Recent work has also shown that residual forces can be predicted using learning-based methods. In this work, we demonstrate that a learning algorithm can predict a smoother version of INDI outputs without requiring additional sensor measurements. In addition, we introduce a new method that combines learning based predictions with INDI. We also adapt the two approaches to work on quadrotors carrying a slung-type payload. The results show that using a neural network to predict residual forces can outperform INDI while using the combination of neural network and INDI can yield even better results than each method individually.


Flexible and Efficient Probabilistic PDE Solvers through Gaussian Markov Random Fields

arXiv.org Artificial Intelligence

Mechanistic knowledge about the physical world is virtually always expressed via partial differential equations (PDEs). Recently, there has been a surge of interest in probabilistic PDE solvers -- Bayesian statistical models mostly based on Gaussian process (GP) priors which seamlessly combine empirical measurements and mechanistic knowledge. As such, they quantify uncertainties arising from e.g. noisy or missing data, unknown PDE parameters or discretization error by design. Prior work has established connections to classical PDE solvers and provided solid theoretical guarantees. However, scaling such methods to large-scale problems remains a fundamental challenge primarily due to dense covariance matrices. Our approach addresses the scalability issues by leveraging the Markov property of many commonly used GP priors. It has been shown that such priors are solutions to stochastic PDEs (SPDEs) which when discretized allow for highly efficient GP regression through sparse linear algebra. In this work, we show how to leverage this prior class to make probabilistic PDE solvers practical, even for large-scale nonlinear PDEs, through greatly accelerated inference mechanisms. Additionally, our approach also allows for flexible and physically meaningful priors beyond what can be modeled with covariance functions. Experiments confirm substantial speedups and accelerated convergence of our physics-informed priors in nonlinear settings.


Optimizing Ride-Pooling Operations with Extended Pickup and Drop-Off Flexibility

arXiv.org Artificial Intelligence

The Ride-Pool Matching Problem (RMP) is central to on-demand ride-pooling services, where vehicles must be matched with multiple requests while adhering to service constraints such as pickup delays, detour limits, and vehicle capacity. Most existing RMP solutions assume passengers are picked up and dropped off at their original locations, neglecting the potential for passengers to walk to nearby spots to meet vehicles. This assumption restricts the optimization potential in ride-pooling operations. In this paper, we propose a novel matching method that incorporates extended pickup and drop-off areas for passengers. We first design a tree-based approach to efficiently generate feasible matches between passengers and vehicles. Next, we optimize vehicle routes to cover all designated pickup and drop-off locations while minimizing total travel distance. Finally, we employ dynamic assignment strategies to achieve optimal matching outcomes. Experiments on city-scale taxi datasets demonstrate that our method improves the number of served requests by up to 13\% and average travel distance by up to 21\% compared to leading existing solutions, underscoring the potential of leveraging passenger mobility to significantly enhance ride-pooling service efficiency.