Goto

Collaborating Authors

 Optimization


Reviews: A primal-dual method for conic constrained distributed optimization problems

Neural Information Processing Systems

The reviewer believes that the greatest contribution of this paper is the analysis done for the time varying case, where the authors transformed the problem into one that requires a projection step into the consensual set; which then allows for applying a multi-consensus steps, where an inexact update analysis for primal-dual algorithms (Thm. This results leads to a converging distributed PDA algorithm for time varying networks. That said, the reviewer has several concerns on the practicality of the results and the presentation of this paper. To get around with the issue of computing (16)/(17) over time varying networks, the authors have adopted a multi-consensus step method to in (18) such that q_k k {p/2} consensus steps are needed for each iteration k, i.e., the number of steps grows with iteration number. The reviewer wonders if it is a practical setup as this essentially requires the algorithm to be synchronous, i.e., every node starts iteration k at the same time and knowing that he/she has to do exactly q_k times of consensus.


Reviews: Bayesian Optimization with a Finite Budget: An Approximate Dynamic Programming Approach

Neural Information Processing Systems

I enjoyed this paper where an effort has been made to transfer a relevant formalism and an appropriate technique from the world of Operations Research and Dynamic Programming to Bayesian Optimization. While this was partly done in previous work here it it seems to go one step further, and I am not aware of publications where the Rollout was adapted to this precise problem. The results look promising, especially on the GP realizations, but I really felt the absence of comparison to other strategies recently proposed to adress this very issue; GLASSES of [5] seems a natural competitor here (and maybe also the MCTS of [13]). Also I was wondering if further improvements could be reachable at reasonable research investment regarding the (currently rather simple) base policies. As for the empirical comparisons on functions, the way the models are set appears a bit contrived, and the conclusions would have more weight with some experiments in more realistic conditions.


Reviews: The non-convex Burer-Monteiro approach works on smooth semidefinite programs

Neural Information Processing Systems

This paper considers a specific class of semi-definite programs (minimize linear objective subject to linear equally and positive semi-definiteness constraints). This convex optimization problem is replaced by a non convex one where the optimization variable is replaced by the outer product of a matrix and its transpose. The paper shows that under certain conditions (size of the factorization large enough. More specifically, the paper is very closely related to prior work by Burer and Monteiro. I think [1] proposed the change of variables X YY T, proposed also an BFGS augmented Lagrangian algorithm for solving the nonconvex problem, and showed experimentally that it reliably gives a global minimum if number of columns of Y is large enough.


Reviews: Maximizing Influence in an Ising Network: A Mean-Field Optimal Solution

Neural Information Processing Systems

The paper studies the influence maximization using the Ising network model. However, the motivation for using Ising network model for opinion dynamics is not well motivated. Why do we need to use the Ising model? It is less convincing by just saying that this model can capture the steady-state of the opinion dynamics. Some other model can also consider the steady state. For example, how does it compare with prior work such as [1].


Reviews: Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods

Neural Information Processing Systems

This paper analyzes the convergence of a non-convex loss-minimization problem for learning the parameters of a general graph-based ranking model, that is defined by a random walk conducted by weights of nodes and edges, which are in turn defined by random walks defined by nodes' and edge's features. The optimization problem can not be solved by existing optimization methods which require exact values of the objective function. The proposed approach hence operates in two level. At the first level, a linearly convergent method is used to estimate an approximation to the stationary distribution of Markov random walk. This approach is validated among others and the authors show the value of the loss function can be approximated with any given precision.


Reviews: Hardness of Online Sleeping Combinatorial Optimization Problems

Neural Information Processing Systems

The paper solves an open problem presented at COLT 2015 [Koolen et. It defines OSCO problems as online combinatorial optimization problems in sleeping setting in which in each trial a subset of actions become unavailable due to unavailability of a subset of elements. Basically, the paper shows that OSCO problems are as hard as PAC-Learning of DNF expressions -- which is a long standing open problem. In a nutshell, the hardness result is shown as follows: the paper reduces the problem of "Online Agnostic Learnign of Disjuction" into "Hard Instances" of OSCO problems with per-action regret via a proof that has similarities to a proof in [Kanade and Steinke, 2014]. Interestingly enough, because of the online-to-batch conversion argument in [Kanade and Steinke, 2014], the hardness results seem to be also true for a benign form of adversary namely stochastic availabilities and loss (i.e. they are drawn from an unknown but fixed joint distribution). After proving the general hardness results, the paper provides "Hard Instances" of various well-known OSCO problems to establish their hardness in particularv -- including Online Sabotaged Shortest Path.


Reviews: Stochastic Mirror Descent in Variationally Coherent Optimization Problems

Neural Information Processing Systems

Adding experimental results as they promised to R3 would be valuable as well 3) As R2 pointed out, the intuition behind the analysis is not always clear. Given the rather convincing answers in the rebuttal, I think the authors can easily improve this aspect in the revised version.


SLM: A Smoothed First-Order Lagrangian Method for Structured Constrained Nonconvex Optimization

Neural Information Processing Systems

Functional constrained optimization (FCO) has emerged as a powerful tool for solving various machine learning problems. However, with the rapid increase in applications of neural networks in recent years, it has become apparent that both the objective and constraints often involve nonconvex functions, which poses significant challenges in obtaining high-quality solutions. In this work, we focus on a class of nonconvex FCO problems with nonconvex constraints, where the two optimization variables are nonlinearly coupled in the inequality constraint. Leveraging the primal-dual optimization framework, we propose a smoothed first-order Lagrangian method (SLM) for solving this class of problems. We establish the theoretical convergence guarantees of SLM to the Karush-Kuhn-Tucker (KKT) solutions through quantifying dual error bounds.


LogSpecT: Feasible Graph Learning Model from Stationary Signals with Recovery Guarantees

Neural Information Processing Systems

Graph learning from signals is a core task in graph signal processing (GSP). A significant subclass of graph signals called the stationary graph signals that broadens the concept of stationarity of data defined on regular domains to signals on graphs is gaining increasing popularity in the GSP community. The most commonly used model to learn graphs from these stationary signals is SpecT, which forms the foundation for nearly all the subsequent, more advanced models. Despite its strengths, the practical formulation of the model, known as rSpecT, has been identified to be susceptible to the choice of hyperparameters. More critically, it may suffer from infeasibility as an optimization problem.


CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference

Neural Information Processing Systems

Deep neural network (DNN) inference based on secure 2-party computation (2PC) can offer cryptographically-secure privacy protection but suffers from orders of magnitude latency overhead due to enormous communication. Previous works heavily rely on a proxy metric of ReLU counts to approximate the communication overhead and focus on reducing the ReLUs to improve the communication efficiency. However, we observe these works achieve limited communication reduction for state-of-the-art (SOTA) 2PC protocols due to the ignorance of other linear and non-linear operations, which now contribute to the majority of communication. CoPriv features a new 2PC protocol for convolution based on Winograd transformation and develops DNN-aware optimization to significantly reduce the inference communication. CoPriv further develops a 2PC-aware network optimization algorithm that is compatible with the proposed protocol and simultaneously reduces the communication for all the linear and non-linear operations.