Mathematical & Statistical Methods
Sharp Analysis of Stochastic Optimization under Global Kurdyka-Łojasiewicz Inequality
We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies the global Kurdyka-Łojasiewicz (KŁ) inequality and the queries from stochastic gradient oracles satisfy a mild expected smoothness assumption. We first introduce a general framework to analyze Stochastic Gradient Descent (SGD) and its associated nonlinear dynamics in this setting.
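As a toy illustration of the dynamics this abstract analyzes, the sketch below runs SGD on a one-dimensional nonconvex objective f(x) = x^2 + 3 sin^2(x), which satisfies a Polyak-Łojasiewicz inequality (a special case of the KŁ condition) yet is nonconvex. The noise model and step-size schedule are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # nonconvex but PL-type objective; unique global minimum at x = 0
    return x**2 + 3 * np.sin(x)**2

def grad(x):
    # exact gradient: d/dx [x^2 + 3 sin^2 x] = 2x + 3 sin(2x)
    return 2 * x + 3 * np.sin(2 * x)

def noisy_grad(x, sigma=0.1):
    # stochastic oracle: true gradient plus bounded-variance noise,
    # consistent in spirit with an expected-smoothness assumption
    return grad(x) + sigma * rng.standard_normal()

x = 3.0
for t in range(1, 2001):
    eta = 1.0 / (10 + t)          # diminishing step size
    x -= eta * noisy_grad(x)

print(f(x))                       # close to the global minimum value 0
```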
Distributed Least Squares in Small Space via Sketching and Bias Reduction
Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least squares regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its error. In particular, we give a sparse sketching method running in optimal space and current matrix multiplication time, which recovers a nearly-unbiased least squares estimator using two passes over the data. This leads to new communication-efficient distributed averaging algorithms for least squares and related tasks, which directly improve on several prior approaches. Our key novelty is a new bias analysis for sketched least squares, giving a sharp characterization of its dependence on the sketch sparsity. The techniques include new higher-moment restricted Bai-Silverstein inequalities, which are of independent interest to the non-asymptotic analysis of deterministic equivalents for random matrices that arise from sketching.
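The distributed-averaging idea can be sketched as follows: each machine solves an independent sketched least-squares problem, and averaging the solutions reduces variance, so the final accuracy is governed by the estimator's bias. A dense Gaussian sketch stands in here for the paper's sparse sketch (for Gaussian sketches the sketched solution is unbiased for the full least-squares solution); all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 2000, 10, 200                       # rows, features, sketch size
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.5 * rng.standard_normal(n)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # full least-squares solution

def sketched_solve():
    # Gaussian sketch as a stand-in for the paper's sparse sketch
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    return np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]

# each "machine" solves one sketched problem; the server averages
avg = np.mean([sketched_solve() for _ in range(50)], axis=0)
print(np.linalg.norm(avg - x_ls))             # much closer than a single sketch
```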
Accelerating ERM for data-driven algorithm design using output-sensitive techniques Christopher Seiler
Data-driven algorithm design is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of computationally efficient data-driven algorithms for combinatorial algorithm families with multiple parameters. As one fixes the problem instance and varies the parameters, the "dual" loss function typically has a piecewise-decomposable structure, i.e., it is well-behaved except at certain sharp transition boundaries. Motivated by prior empirical work, we initiate the study of techniques to develop efficient ERM learning algorithms for data-driven algorithm design by enumerating the pieces of the summed dual loss functions for a collection of problem instances. The running time of our approach scales with the actual number of pieces that appear, as opposed to worst-case upper bounds on the number of pieces. Our approach involves two novel ingredients: an output-sensitive algorithm for enumerating polytopes induced by a set of hyperplanes using tools from computational geometry, and an execution graph which compactly represents all the states the algorithm could attain for all possible parameter values. We illustrate our techniques by giving algorithms for pricing problems, linkage-based clustering, and dynamic-programming-based sequence alignment.
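The output-sensitive idea can be illustrated in the single-parameter case: each instance's dual loss is piecewise constant, so the summed ERM objective only changes at the union of per-instance breakpoints, and one evaluation per piece that actually appears suffices. The toy losses below are hypothetical; the paper handles multi-parameter polytope arrangements.

```python
def loss_instance(breaks, values):
    # piecewise-constant dual loss: values[i] on the i-th interval
    def loss(p):
        for b, v in zip(breaks, values):
            if p < b:
                return v
        return values[-1]
    return loss

# (loss function, its breakpoints) for two toy problem instances
instances = [
    (loss_instance([1.0, 3.0], [5, 2, 4]), [1.0, 3.0]),
    (loss_instance([2.0], [3, 1]), [2.0]),
]

# union of breakpoints delimits the pieces of the summed loss
cuts = sorted({b for _, bs in instances for b in bs})
probes = [cuts[0] - 1] + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])] + [cuts[-1] + 1]

# ERM: one evaluation per piece that actually appears
best = min(probes, key=lambda p: sum(f(p) for f, _ in instances))
print(best)  # -> 2.5, the piece where the summed loss is minimized
```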
Rates of Estimation of Optimal Transport Maps using Plug-in Estimators via Barycentric Projections
Nabarun Deb (Columbia University), Promit Ghosal (MIT), Bodhisattva Sen (Columbia University)
In practice, these maps need to be estimated from data sampled according to µ and ν. Plug-in estimators are perhaps the most popular approach to estimating transport maps in the field of computational optimal transport. In this paper, we provide a comprehensive analysis of the rates of convergence for general plug-in estimators defined via barycentric projections. Our main contribution is a new stability estimate for barycentric projections which proceeds under minimal smoothness assumptions and can be used to analyze general plug-in estimators. We illustrate the usefulness of this stability estimate by first providing rates of convergence for the natural discrete-discrete and semi-discrete estimators of optimal transport maps.
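A minimal discrete-discrete plug-in estimator can be sketched as follows: with equal-size samples and uniform weights, the optimal coupling for squared-distance cost is a permutation, and the barycentric projection simply maps each source point to its matched target point. The Gaussian example (true map T(x) = 2x) and the use of `linear_sum_assignment` to solve the assignment problem are illustrative choices, not the paper's construction.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n = 300
X = rng.standard_normal((n, 1))        # samples from mu = N(0, 1)
Y = 2 * rng.standard_normal((n, 1))    # samples from nu = N(0, 4)

# squared-distance cost; the optimal plan here is a permutation sigma
cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
row, col = linear_sum_assignment(cost)

# barycentric projection of the discrete plan: x_i -> y_{sigma(i)}
T_hat = np.empty_like(X)
T_hat[row] = Y[col]

# compare against the population transport map T(x) = 2x
err = np.mean((T_hat - 2 * X) ** 2)
print(err)                             # small, shrinking as n grows
```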
Log-concave Sampling from a Convex Body with a Barrier: a Robust and Unified Dikin Walk
We consider the problem of sampling from a d-dimensional log-concave distribution π(θ) ∝ exp(−f(θ)) for L-Lipschitz f, constrained to a convex body with an efficiently computable self-concordant barrier function, contained in a ball of radius R, with a w-warm start. We propose a robust sampling framework that computes spectral approximations to the Hessian of the barrier function in each iteration.
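For intuition, here is a minimal Dikin walk for the uniform case (f ≡ 0) on a box with the standard log-barrier: proposals are drawn from the Dikin ellipsoid defined by the barrier Hessian and corrected by a Metropolis filter. The exact Hessian is used here; the paper's contribution is a robust framework using spectral approximations and general self-concordant barriers.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.vstack([np.eye(2), -np.eye(2)])   # the box [-1, 1]^2 as {A theta <= b}
b = np.ones(4)

def hessian(theta):
    s = b - A @ theta                    # slacks, positive in the interior
    return (A / s[:, None] ** 2).T @ A   # Hessian of -sum_i log(s_i)

def dikin_step(theta, r=0.4):
    H = hessian(theta)
    # propose from the Dikin ellipsoid: covariance r^2 * H^{-1}
    g = rng.standard_normal(2)
    prop = theta + r * np.linalg.solve(np.linalg.cholesky(H).T, g)
    if np.any(A @ prop >= b):
        return theta                     # proposal left the body: reject
    def logq(x, y):                      # log proposal density q(x -> y)
        Hx, diff = hessian(x), y - x
        return 0.5 * np.linalg.slogdet(Hx)[1] - diff @ Hx @ diff / (2 * r**2)
    # Metropolis filter for the position-dependent proposal
    if np.log(rng.uniform()) < logq(prop, theta) - logq(theta, prop):
        return prop
    return theta

theta = np.zeros(2)
samples = []
for _ in range(2000):
    theta = dikin_step(theta)
    samples.append(theta.copy())
print(np.mean(samples, axis=0))          # roughly the centroid (0, 0)
```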
On Contrastive Representations of Stochastic Processes Emile Mathieu, Adam Foster
Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series. Typical methods rely on exact reconstruction of observations, but this approach breaks down as observations become high-dimensional or noise distributions become complex.
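A reconstruction-free alternative can be sketched with an InfoNCE-style contrastive objective (a standard choice; the paper's exact objective may differ): two observation subsets of the same process form a positive pair, and other processes in the batch serve as negatives. The mean-pooling encoder and all sizes below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)

def encode(obs):
    # stand-in encoder: mean-pool the observations, then L2-normalize
    z = obs.mean(axis=0)
    return z / np.linalg.norm(z)

# batch of 8 "processes": each contributes 20 noisy 5-d observations
batch = [rng.standard_normal((20, 5)) + 3 * rng.standard_normal(5)
         for _ in range(8)]
z1 = np.stack([encode(o[:10]) for o in batch])   # view 1: first half
z2 = np.stack([encode(o[10:]) for o in batch])   # view 2: second half

logits = z1 @ z2.T / 0.1                         # cosine sims / temperature
log_probs = logits - np.log(np.exp(logits).sum(1, keepdims=True))
loss = -np.mean(np.diag(log_probs))              # match views of same process
print(loss)                                      # well below the chance level log(8)
```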