Mathematical & Statistical Methods


Bounded-Regret MPC via Perturbation Analysis: Prediction Error, Constraints, and Nonlinearity

Neural Information Processing Systems

We study Model Predictive Control (MPC) and propose a general analysis pipeline to bound its dynamic regret. The pipeline first requires deriving a perturbation bound for a finite-time optimal control problem. Then, the perturbation bound is used to bound the per-step error of MPC, which leads to a bound on the dynamic regret. Thus, our pipeline reduces the study of MPC to the well-studied problem of perturbation analysis, enabling the derivation of regret bounds of MPC under a variety of settings. To demonstrate the power of our pipeline, we use it to generalize existing regret bounds on MPC in linear time-varying (LTV) systems to incorporate prediction errors on costs, dynamics, and disturbances. Further, our pipeline leads to regret bounds on MPC in systems with nonlinear dynamics and constraints.
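
As a concrete instance of the object the pipeline analyzes, here is a minimal receding-horizon sketch in Python, assuming an LTV system with quadratic costs so that each finite-horizon problem reduces to a backward Riccati recursion; the helper names finite_horizon_lqr and mpc_step are illustrative, not the paper's implementation.

```python
import numpy as np

def finite_horizon_lqr(A_seq, B_seq, Q, R):
    """Backward Riccati recursion for a k-step LTV problem with quadratic
    stage costs x'Qx + u'Ru; returns the feedback gain for the first step."""
    P = Q
    K = None
    for A, B in reversed(list(zip(A_seq, B_seq))):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def mpc_step(x, A_preds, B_preds, Q, R):
    """One MPC step: solve the k-step problem on the *predicted* dynamics
    (A_preds, B_preds), then commit only the first control input."""
    K0 = finite_horizon_lqr(A_preds, B_preds, Q, R)
    return -K0 @ x
```

The per-step error the pipeline bounds is the gap between this first committed input and the one an oracle with exact predictions would choose.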


Residual2Vec: Debiasing graph embedding with random graphs

Neural Information Processing Systems

Many graph embedding methods hinge on a sampling of context nodes based on random walks. However, random walks can be a biased sampler due to the structural properties of graphs. Most notably, random walks are biased by the degree of each node, where a node is sampled proportionally to its degree. The implication of such biases has not been clear, particularly in the context of graph representation learning. Here, we investigate the impact of the random walks' bias on graph embedding and propose residual2vec, a general graph embedding method that can debias various structural biases in graphs by using random graphs. We demonstrate that this debiasing not only improves link prediction and clustering performance but also allows us to explicitly model salient structural properties in graph embedding.
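
To see the degree bias concretely, the hypothetical helper below counts how often each node is visited by simple random walks on an undirected graph; for connected non-bipartite graphs these frequencies converge to degree / (2 * #edges), the structural baseline that debiasing with random graphs removes.

```python
import numpy as np

def random_walk_visit_freq(A, walk_len=80, n_walks=20, rng=None):
    """Empirical visit frequencies of simple random walks on an undirected
    graph with 0/1 adjacency matrix A (every node assumed to have at least
    one neighbor). Compare the output against degree / degree.sum()."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    counts = np.zeros(n)
    for start in range(n):
        for _ in range(n_walks):
            v = start
            for _ in range(walk_len):
                v = rng.choice(np.flatnonzero(A[v]))  # uniform neighbor step
                counts[v] += 1
    return counts / counts.sum()
```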


Faster Linear Algebra for Distance Matrices

Neural Information Processing Systems

The distance matrix of a dataset X of n points with respect to a distance function f represents all pairwise distances between points in X induced by f. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices with the goal of designing fast algorithms, which are specifically tailored for distance matrices, for fundamental linear algebraic primitives.
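
As one example of such a primitive, for the ℓ1 distance on one-dimensional points the product D @ y can be formed in O(n log n) time via sorting and prefix sums, never materializing D. The sketch below illustrates this generic idea under those assumptions and is not code from the paper.

```python
import numpy as np

def l1_distmat_vec(x, y):
    """Compute D @ y for D[i, j] = |x[i] - x[j]| over 1-d points x, without
    building the n-by-n matrix. Sort the points, then split each row sum into
    a 'left' part (x[j] <= x[i]) and a 'right' part (x[j] > x[i]), both of
    which are computable from prefix sums."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    cy = np.cumsum(ys)          # prefix sums of y
    cxy = np.cumsum(xs * ys)    # prefix sums of x * y
    ty, txy = cy[-1], cxy[-1]   # full sums
    out_sorted = xs * cy - cxy + (txy - cxy) - xs * (ty - cy)
    out = np.empty_like(out_sorted)
    out[order] = out_sorted
    return out

# Sanity check against the naive O(n^2) product:
x = np.random.default_rng(0).random(500)
y = np.random.default_rng(1).random(500)
D = np.abs(x[:, None] - x[None, :])
assert np.allclose(D @ y, l1_distmat_vec(x, y))
```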


Efficient and Effective Optimal Transport-Based Biclustering: Supplementary Material

Neural Information Processing Systems

For w, v, r and c containing no zeros, the resulting optimal coupling matrices Z and W are always an h-almost hard clustering with h ∈ {0, ..., k − 1}, where Z ∈ Γ(n, k) (resp. W) represents a hard clustering. The Kantorovich OT problem is a bounded linear program, since Π(w, v) is a polytope, i.e., a bounded polyhedron. The fundamental theorem of linear programming states that if the feasible set is non-empty, then a solution lies at an extreme point of the feasible region. This means that a solution Z to the OT problem is an extreme point of Π(w, v).
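
The extreme-point property can be checked directly on a toy instance: solving a small Kantorovich OT problem as a linear program returns a vertex of Π(w, v) with at most n + k − 1 nonzero entries, which is what makes the optimal coupling (almost) a hard assignment. The instance below is made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, k = 6, 2
C = rng.random((n, k))            # cost of assigning row i to cluster j
w = np.full(n, 1.0 / n)           # row marginals
v = np.full(k, 1.0 / k)           # cluster marginals

# Equality constraints encoding the transport polytope Pi(w, v),
# with Z flattened row-major so z[i*k + j] = Z[i, j].
A_eq = np.zeros((n + k, n * k))
for i in range(n):
    A_eq[i, i * k:(i + 1) * k] = 1.0   # sum_j Z[i, j] = w[i]
for j in range(k):
    A_eq[n + j, j::k] = 1.0            # sum_i Z[i, j] = v[j]
b_eq = np.concatenate([w, v])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
Z = res.x.reshape(n, k)
# A vertex of Pi(w, v) has at most n + k - 1 nonzeros, so most rows of Z
# put all of their mass on a single cluster:
print((Z > 1e-9).sum(), "nonzeros for", n, "rows")
```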


Rate-Optimal Subspace Estimation on Random Graphs (Zhixin Zhou, Fan Zhou, Ping Li)

Neural Information Processing Systems

Let M̂ be obtained from the last step of the algorithm; then by [16, Theorem 2.1], the desired bound follows. This will be proved as follows. Since M̂ and M each have rank at most r, subadditivity of the rank gives rank(M̂ − M) ≤ rank(M̂) + rank(M) ≤ 2r. The proof is then complete by applying (19). First, we prove (7); the argument is an application of Fano's inequality.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Summary of paper and review: In this paper, the authors consider stochastic-gradient algorithms and, using an importance/weighted sampling scheme, show how it is possible to attain faster convergence rates in certain regimes. In particular, for strongly convex problems, the authors show that, if one knows the Lipschitz constant of every term in a finite-sum objective, it is possible to attain convergence rates that depend not on a squared norm of the Lipschitz constants but on a 1-norm-like quantity, which is always smaller. The downside of this approach is that one must know these Lipschitz constants, and it is difficult (perhaps impossible) to apply the results to objectives that are not of the form f(x) = \sum_{i=1}^{n} f_i(x). I am also not convinced that I should care to use these algorithms; the lack of empirical insights leaves me wondering whether this analysis matters. Detailed comments: The idea here is simple enough, and makes sense.
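
For concreteness, here is a minimal sketch of the sampling scheme the review describes: draw term i with probability proportional to its Lipschitz constant L_i and rescale its gradient to keep the update unbiased. The interface weighted_sgd is made up for illustration and is not the authors' code.

```python
import numpy as np

def weighted_sgd(grads, lipschitz, x0, steps=1000, lr=1e-2, rng=None):
    """SGD on f(x) = sum_i f_i(x) with importance sampling: term i is drawn
    with probability p_i = L_i / sum_j L_j and its gradient is rescaled by
    1 / (n * p_i), keeping each step an unbiased estimate of grad f / n."""
    rng = rng or np.random.default_rng(0)
    L = np.asarray(lipschitz, dtype=float)
    p = L / L.sum()
    n = len(grads)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        x = x - lr * grads[i](x) / (n * p[i])
    return x
```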


Online Covariance Matrix Estimation in Sketched Newton Methods

arXiv.org Machine Learning

Given the ubiquity of streaming data, online algorithms have been widely used for parameter estimation, with second-order methods particularly standing out for their efficiency and robustness. In this paper, we study an online sketched Newton method that leverages a randomized sketching technique to perform an approximate Newton step in each iteration, thereby eliminating the computational bottleneck of second-order methods. While existing studies have established the asymptotic normality of sketched Newton methods, a consistent estimator of the limiting covariance matrix remains an open problem. We propose a fully online covariance matrix estimator that is constructed entirely from the Newton iterates and requires no matrix factorization. Compared to covariance estimators for first-order online methods, our estimator for second-order methods is batch-free. We establish the consistency and convergence rate of our estimator, and coupled with asymptotic normality results, we can then perform online statistical inference for the model parameters based on sketched Newton methods. We also discuss the extension of our estimator to constrained problems, and demonstrate its superior performance on regression problems as well as benchmark problems in the CUTEst set.
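
As a rough illustration of the kind of step involved, the sketch below approximately solves the Newton system H p = −g by sketch-and-project with Gaussian sketches; this is a generic sketched solver, not the paper's algorithm or its covariance estimator.

```python
import numpy as np

def sketched_newton_direction(H, g, sketch_dim=5, iters=50, rng=None):
    """Approximate the Newton direction p solving H p = -g by
    sketch-and-project: each iteration enforces only an m-dimensional
    random sketch S'(H p + g) = 0 of the full system, so no factorization
    of H is ever required."""
    rng = rng or np.random.default_rng(0)
    d = H.shape[0]
    p = np.zeros(d)
    for _ in range(iters):
        S = rng.standard_normal((d, sketch_dim))      # Gaussian sketch
        r = H @ p + g                                  # current residual
        HS = H @ S
        lam = np.linalg.solve(S.T @ H @ HS, S.T @ r)   # small m x m solve
        p = p - HS @ lam
    return p
```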


Federated Sinkhorn

arXiv.org Artificial Intelligence

In this work we investigate the potential of solving the discrete Optimal Transport (OT) problem with entropy regularization in a federated learning setting. Recall that the celebrated Sinkhorn algorithm transforms the classical OT linear program into a strongly convex constrained optimization problem, facilitating first-order methods for otherwise intractably large problems. A common contemporary setting in which the application of Sinkhorn remains an open problem is data spread across clients with distributed inter-communication, either because client privacy is a concern or simply out of necessity due to processing and memory hardware limitations. In this work we investigate various natural procedures, which we refer to as Federated Sinkhorn, that handle distributed environments where data is partitioned across multiple clients. We formulate the problem as minimizing the transport cost with an entropy regularization term, subject to marginal constraints, where block components of the source and target distribution vectors are locally known to the clients corresponding to each block. We consider both synchronous and asynchronous variants, as well as all-to-all and server-client communication topologies. Each procedure allows clients to compute local operations on their data partition while periodically exchanging information with others. We provide theoretical guarantees on convergence for the different variants under different possible conditions. We empirically demonstrate the algorithms' performance on synthetic datasets and a real-world financial risk assessment application. The investigation highlights the subtle tradeoffs between computation and communication time in different settings and how they depend on problem size and sparsity.
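
For reference, the centralized Sinkhorn iteration that the federated procedures decompose is sketched below (a standard textbook version, not the paper's federated variants); in the federated setting the blocks of the scaling vectors u and v live on different clients, which exchange the partial products K @ v and K.T @ u.

```python
import numpy as np

def sinkhorn(C, r, c, eps=0.1, iters=500):
    """Centralized Sinkhorn scaling for entropy-regularized OT with cost
    matrix C and marginals r, c; returns the plan diag(u) K diag(v)."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(r), np.ones_like(c)
    for _ in range(iters):
        u = r / (K @ v)      # row scaling; in the federated variants these
        v = c / (K.T @ u)    # updates are split block-wise across clients
    return u[:, None] * K * v[None, :]
```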


Rough Stochastic Pontryagin Maximum Principle and an Indirect Shooting Method

arXiv.org Artificial Intelligence

Stochastic optimal control problems typically involve a dynamical system described by a stochastic differential equation (SDE)

dx_t = b(t, x_t, u_t) dt + σ(t, x_t) dB_t,  t ∈ [0, T],  (1.1)

in Stratonovich or Itô form, where x_t is the state of the system at time t, u_t is the control input, b is the drift, σ is the diffusion, B is a Brownian motion, and T is the final time, and consist of optimizing an objective E[∫_0^T f(t, x_t, u_t) dt + g(x_T)] over a set of control input trajectories, subject to state and control constraints. By now, a rich literature on stochastic optimal control is available, with optimality conditions characterized by the dynamic programming principle as Hamilton-Jacobi-Bellman (HJB) partial differential equations (PDEs) [6-8], and by the Pontryagin Maximum Principle (PMP) as forward-backward stochastic differential equations (FBSDEs) [8-11]. For problems with linear dynamics and linear-quadratic costs, both approaches lead to tractable solutions characterized by stochastic Riccati equations [7,12,13]. However, for general nonlinear problems, solving HJB-PDEs or FBSDEs remains computationally challenging for high-dimensional state spaces, despite recent progress [14-17]. In practice, an effective approach consists of optimizing over a class of solutions u_t^θ parameterized by finitely many parameters θ ∈ R^k [18,19] (see [20,21] for machine learning applications). However, restricting solutions to a finite-dimensional space may obscure the structure of solutions and lead to suboptimality.
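
To make equation (1.1) concrete, the sketch below simulates the Itô form of such an SDE with the standard Euler-Maruyama scheme (the Stratonovich form would need a drift correction); it is a generic integrator for illustration, not part of the paper's method.

```python
import numpy as np

def euler_maruyama(b, sigma, u, x0, T=1.0, n_steps=1000, rng=None):
    """Simulate dx_t = b(t, x_t, u_t) dt + sigma(t, x_t) dB_t (Ito form,
    one noise channel per state dimension) on [0, T], where u(t) is an
    open-loop control trajectory. Returns the sampled state path."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for i in range(n_steps):
        t = i * dt
        dB = np.sqrt(dt) * rng.standard_normal(x.shape)  # Brownian increment
        x = x + b(t, x, u(t)) * dt + sigma(t, x) * dB
        path.append(x.copy())
    return np.array(path)
```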