

Semidefinite Relaxations of the Gromov-Wasserstein Distance

Neural Information Processing Systems

The Gromov-Wasserstein (GW) distance is an extension of the optimal transport problem that allows one to match objects between incomparable spaces. At its core, the GW distance is specified as the solution of a non-convex quadratic program and is not known to be tractable to solve. In particular, existing solvers for the GW distance are only able to find locally optimal solutions. In this work, we propose a semidefinite programming (SDP) relaxation of the GW distance. The relaxation can be viewed as the Lagrangian dual of the GW distance augmented with constraints that relate to the linear and quadratic terms of transportation plans. In particular, our relaxation provides a tractable (polynomial-time) algorithm to compute globally optimal transportation plans (in some instances) together with an accompanying proof of global optimality. Our numerical experiments suggest that the proposed relaxation is strong in that it frequently computes the globally optimal solution. Our Python implementation is available at https://github.com/tbng/gwsdp.
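To make the quadratic program at the core of the GW distance concrete, here is a minimal sketch (not the paper's implementation; the function name and brute-force loop are illustrative) that evaluates the discrete GW objective for a given coupling between two metric-measure spaces:

```python
import numpy as np

def gw_objective(C1, C2, T):
    """Evaluate the quadratic GW objective
        sum_{i,j,k,l} (C1[i,k] - C2[j,l])**2 * T[i,j] * T[k,l]
    for intra-space cost matrices C1 (n x n), C2 (m x m)
    and a coupling matrix T (n x m). Brute force, for clarity only."""
    n, m = T.shape
    total = 0.0
    for i in range(n):
        for j in range(m):
            for k in range(n):
                for l in range(m):
                    total += (C1[i, k] - C2[j, l]) ** 2 * T[i, j] * T[k, l]
    return total
```

Local solvers (e.g. the `ot.gromov.gromov_wasserstein` routine in the POT library) minimize this objective over couplings with fixed marginals; the SDP relaxation described in the abstract instead bounds it from below, certifying global optimality when the bound is tight.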


Training Data Attribution via Approximate Unrolling

Neural Information Processing Systems

Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. By contrast, methods based on unrolling address these issues but face scalability challenges.
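For context on what these methods approximate, the sketch below (all names hypothetical) computes exact leave-one-out attribution scores for ridge regression by refitting the closed-form solution with each training point removed. This brute-force baseline is what influence functions and unrolling-based estimators try to approximate without retraining:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression solution."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loo_attribution(X, y, x_test, y_test):
    """Exact leave-one-out attribution: the change in squared test
    loss when each training point is removed and the model refit."""
    w_full = fit_ridge(X, y)
    base_loss = (x_test @ w_full - y_test) ** 2
    scores = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w_i = fit_ridge(X[mask], y[mask])
        scores.append((x_test @ w_i - y_test) ** 2 - base_loss)
    return np.array(scores)
```

Unrolling-based TDA differentiates through the optimizer's trajectory instead of through this closed form, which is what lets it capture optimizer bias and multi-stage pipelines, at the scalability cost the abstract mentions.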


Flipping-based Policy for Chance-Constrained Markov Decision Processes

Neural Information Processing Systems

Safe reinforcement learning (RL) is a promising approach for many real-world decision-making problems where ensuring safety is a critical necessity. In safe RL research, while expected cumulative safety constraints (ECSCs) are typically the first choices, chance constraints are often more pragmatic for incorporating safety under uncertainties. This paper proposes a flipping-based policy for Chance-Constrained Markov Decision Processes (CCMDPs). The flipping-based policy selects the next action by tossing a potentially distorted coin between two action candidates. The probability of the flip and the two action candidates vary depending on the state.
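The action-selection mechanism described above can be sketched in a few lines; `action_candidates` and `flip_prob` are hypothetical placeholders for the state-dependent quantities the paper's policy would supply:

```python
import numpy as np

def flipping_policy(state, action_candidates, flip_prob, rng):
    """Select the next action by tossing a (possibly biased) coin
    between two state-dependent action candidates."""
    a0, a1 = action_candidates(state)  # two candidate actions for this state
    p = flip_prob(state)               # probability of choosing a1
    return a1 if rng.random() < p else a0
```

For example, with `flip_prob` returning 0 or 1 the policy degenerates to a deterministic choice; intermediate biases let the policy randomize between a cautious and an aggressive action to satisfy a chance constraint on average.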


Conformal Inverse Optimization

Neural Information Processing Systems

Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that such a point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be of low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.
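The robust prescription step can be illustrated on a finite decision set with a finite uncertainty set of cost vectors. This toy (function name illustrative) omits the conformal construction of the uncertainty set, which is the paper's contribution, and only shows why robustifying over it changes the prescribed decision:

```python
import numpy as np

def robust_decision(decisions, cost_set):
    """Pick the decision minimizing worst-case linear cost over an
    uncertainty set of candidate cost vectors.

    decisions: (num_decisions, d) array of feasible decisions.
    cost_set:  (num_costs, d) array of plausible cost vectors.
    """
    decisions = np.asarray(decisions, float)
    cost_set = np.asarray(cost_set, float)
    worst = (decisions @ cost_set.T).max(axis=1)  # worst case per decision
    best = np.argmin(worst)
    return decisions[best], worst[best]
```

With cost vectors `[1, 2]` and `[2, 1]` in the uncertainty set, the hedged decision `[0.6, 0.6]` has worst-case cost 1.8 and beats either extreme point decision (worst-case cost 2), which a point estimate of the costs would have prescribed.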


Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems

Neural Information Processing Systems

Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning. While significant progress has been made in recent years in developing efficient methods for smooth low-rank optimization problems that avoid maintaining high-rank matrices and computing expensive high-rank SVDs, advances for nonsmooth problems have been slow paced. In this paper we consider standard convex relaxations for such problems. Mainly, we prove that under a natural generalized strict complementarity condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, the extragradient method, when initialized with a "warm-start" point, converges to an optimal solution with rate O(1/t) while requiring only two low-rank SVDs per iteration. We give a precise trade-off between the rank of the SVDs required and the radius of the ball in which we need to initialize the method. We support our theoretical results with empirical experiments on several nonsmooth low-rank matrix recovery tasks, demonstrating that using simple initializations, the extragradient method produces exactly the same iterates when full-rank SVDs are replaced with SVDs of rank that matches the rank of the (low-rank) ground-truth matrix to be recovered.
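As a reference point for the method being analyzed, here is a generic extragradient iteration on a bilinear saddle point, without the low-rank SVD machinery the paper adds (names and the toy problem are illustrative). The defining feature is the half step: gradients are re-evaluated at an extrapolated point before the actual update:

```python
import numpy as np

def extragradient(A, x, y, eta=0.1, steps=2000):
    """Extragradient for the bilinear saddle point min_x max_y x^T A y.
    Each iteration takes a gradient half step, then updates using the
    gradient evaluated at the half-step point."""
    for _ in range(steps):
        xh = x - eta * A @ y       # extrapolation (half step)
        yh = y + eta * A.T @ x
        x = x - eta * A @ yh       # update using half-step gradients
        y = y + eta * A.T @ xh
    return x, y
```

On this problem, plain simultaneous gradient descent-ascent spirals outward, while extragradient contracts to the saddle point at the origin; the paper's variant replaces the Euclidean updates with projections computed via two low-rank SVDs per iteration.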


Constrained Binary Decision Making

Neural Information Processing Systems

Binary statistical decision making involves choosing between two states based on statistical evidence. The optimal decision strategy is typically formulated through a constrained optimization problem, where both the objective and constraints are expressed as integrals involving two Lebesgue measurable functions, one of which represents the strategy being optimized. In this work, we present a comprehensive formulation of the binary decision making problem and provide a detailed characterization of the optimal solution. Our framework encompasses a wide range of well-known and recently proposed decision making problems as specific cases. We demonstrate how our generic approach can be used to derive the optimal decision strategies for these diverse instances. Our results offer a robust mathematical tool that simplifies the process of solving both existing and novel formulations of binary decision making problems, which lie at the core of many machine learning algorithms.
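One classic special case of this constrained formulation is the Neyman-Pearson problem, whose optimal strategy thresholds the likelihood ratio. A minimal sketch on a finite observation space (function name illustrative):

```python
import numpy as np

def likelihood_ratio_rule(p0, p1, threshold):
    """Neyman-Pearson-style rule on a finite observation space:
    decide H1 wherever p1(x)/p0(x) exceeds the threshold.

    p0, p1: probability mass functions under H0 and H1.
    Returns the decision region and its error characteristics."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    decide_h1 = p1 > threshold * p0
    false_alarm = p0[decide_h1].sum()   # P(decide H1 | H0 true)
    detection = p1[decide_h1].sum()     # P(decide H1 | H1 true)
    return decide_h1, false_alarm, detection
```

Sweeping the threshold traces out the receiver operating characteristic; the constrained problem fixes an upper bound on `false_alarm` and maximizes `detection`, exactly the integral-constrained structure described in the abstract.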


Parameterized Approximation Schemes for Fair-Range Clustering

Neural Information Processing Systems

Fair-range clustering extends classical clustering formulations by associating each data point with one or more demographic labels. It imposes lower and upper bound constraints on the number of facilities opened for each label, ensuring fair representation of all demographic groups by the selected facilities. In this paper we focus on the fair-range k-median and k-means problems in Euclidean spaces. We give (1 + ε)-approximation algorithms with fixed-parameter tractable running times for both problems, parameterized by the numbers of opened facilities and demographic labels. For Euclidean metrics, these are the first parameterized approximation schemes for the problems, improving upon the previously known O(1)-approximation ratios given by Thejaswi et al. (KDD 2022).
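To make the fair-range constraints concrete, here is a brute-force fair-range k-median on a tiny instance (exponential in general, so purely illustrative; all names are hypothetical, not the paper's algorithm):

```python
import itertools
import numpy as np

def fair_range_kmedian(points, labels, k, lower, upper):
    """Brute-force fair-range k-median: open k facilities among the
    points so that, for each label g, the number of opened facilities
    with that label lies in [lower[g], upper[g]].
    lower and upper are dicts with the same label keys."""
    pts = np.asarray(points, float)
    best_cost, best_set = np.inf, None
    for S in itertools.combinations(range(len(pts)), k):
        counts = {}
        for i in S:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        # Skip facility sets violating the per-label range constraints.
        if any(not (lower[g] <= counts.get(g, 0) <= upper[g])
               for g in lower):
            continue
        # k-median cost: each point travels to its nearest open facility.
        d = np.linalg.norm(pts[:, None, :] - pts[list(S)][None, :, :], axis=2)
        cost = d.min(axis=1).sum()
        if cost < best_cost:
            best_cost, best_set = cost, S
    return best_set, best_cost
```

The paper's contribution is doing (1 + ε)-approximately what this enumeration does exactly, in time that is fixed-parameter tractable in k and the number of labels rather than exponential in the input size.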


7fc63ff01769c4fa7d9279e97e307829-AuthorFeedback.pdf

Neural Information Processing Systems

To all reviewers: thank you very much for your thoughtful comments and suggestions.

R#1: "...importance of similarity among the selected tasks..." In Theorems 1 & 2, similarity in the tasks can be described

R#1: "...domain randomization, when enough samples are used, is a better alternative to meta-learning..." In many

R#2: "...Theorems 1 and 2 are asymptotic...": Only the first sentence of each theorem is asymptotic; the rest (starting

Hence, the theorems are NOT asymptotic. We will remove the asymptotic parts for clarity.

R#2: "Assumption 2... the per-task optimal models are centered around the corresponding optimal solutions.": This assumption can easily be dropped at the cost of including the distance as a term.

R#3: "...trust region alone cannot justify why... TRPO fares better..." Thanks for this insightful comment.



Appendix

Neural Information Processing Systems

The appendix is organized as follows. In Appendix A, we first discuss the relationship of our work to prior art. In Appendix B, we provide some preliminary tools for analyzing our manifold optimization problem. Building on these, the proofs of Theorem 1 and Theorem 2 are provided in Appendix C and Appendix D, respectively. Finally, our experimental setup as well as additional experimental results are provided in Appendix E. Notation. Before we proceed, let us first introduce the notation that will be used throughout the appendix.