Goto

Collaborating Authors

 Search


Fast Extra Gradient Methods for Smooth Structured Nonconvex-Nonconcave Minimax Problems

Neural Information Processing Systems

Modern minimax problems, such as generative adversarial network and adversarial training, are often under a nonconvex-nonconcave setting, and developing an efficient method for such setting is of interest. Recently, two variants of the extragradient (EG) method are studied in that direction. First, a two-time-scale variant of the EG, named EG, was proposed under a smooth structured nonconvex-nonconcave setting, with a slow \mathcal{O}(1/k) rate on the squared gradient norm, where k denotes the number of iterations. Second, another variant of EG with an anchoring technique, named extra anchored gradient (EAG), was studied under a smooth convex-concave setting, yielding a fast \mathcal{O}(1/k 2) rate on the squared gradient norm. Built upon EG and EAG, this paper proposes a two-time-scale EG with anchoring, named fast extragradient (FEG), that has a fast \mathcal{O}(1/k 2) rate on the squared gradient norm for smooth structured nonconvex-nonconcave problems; the corresponding saddle-gradient operator satisfies the negative comonotonicity condition.


Reinforcement Learning Enhanced Explainer for Graph Neural Networks

Neural Information Processing Systems

Graph neural networks (GNNs) have recently emerged as revolutionary technologies for machine learning tasks on graphs. In GNNs, the graph structure is generally incorporated with node representation via the message passing scheme, making the explanation much more challenging. Given a trained GNN model, a GNN explainer aims to identify a most influential subgraph to interpret the prediction of an instance (e.g., a node or a graph), which is essentially a combinatorial optimization problem over graph. The existing works solve this problem by continuous relaxation or search-based heuristics. But they suffer from key issues such as violation of message passing and hand-crafted heuristics, leading to inferior interpretability.


Learning to Compare Nodes in Branch and Bound with Graph Neural Networks

Neural Information Processing Systems

Branch-and-bound approaches in integer programming require ordering portions of the space to explore next, a problem known as node comparison. We propose a new siamese graph neural network model to tackle this problem, where the nodes are represented as bipartite graphs with attributes. Similar to prior work, we train our model to imitate a diving oracle that plunges towards the optimal solution. We evaluate our method by solving the instances in a plain framework where the nodes are explored according to their rank. On three NP-hard benchmarks chosen to be particularly primal-difficult, our approach leads to faster solving and smaller branch- and-bound trees than the default ranking function of the open-source solver SCIP, as well as competing machine learning methods.


Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Neural Information Processing Systems

We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting. We propose a model-based algorithm named UCBVI- \gamma, which is based on the \emph{optimism in the face of uncertainty principle} and the Bernstein-type bonus. We show that UCBVI- \gamma achieves an \tilde{O}\big({\sqrt{SAT}}/{(1-\gamma) {1.5}}\big) regret, where S is the number of states, A is the number of actions, \gamma is the discount factor and T is the number of steps. In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least \tilde{\Omega}\big({\sqrt{SAT}}/{(1-\gamma) {1.5}}\big) . Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI- \gamma is nearly minimax optimal for discounted MDPs.


Unsupervised Learning for Combinatorial Optimization with Principled Objective Relaxation

Neural Information Processing Systems

Using machine learning to solve combinatorial optimization (CO) problems is challenging, especially when the data is unlabeled. This work proposes an unsupervised learning framework for CO problems. Our framework follows the standard relaxation-plus-rounding approach and adopts neural networks to parameterize the relaxed solutions so that simple back-propagation can train them end-to-end. Our key contribution is the observation that if the relaxed objective satisfies entry-wise concavity, a low optimization loss guarantees the quality of the obtained integral solutions. This observation significantly generalizes the applicability of the previous framework inspired by Erdos' probabilistic method (Karalias & Loukas, 2020).


Tight Analysis of Extra-gradient and Optimistic Gradient Methods For Nonconvex Minimax Problems

Neural Information Processing Systems

Despite the established convergence theory of Optimistic Gradient Descent Ascent (OGDA) and Extragradient (EG) methods for the convex-concave minimax problems, little is known about the theoretical guarantees of these methods in nonconvex settings. To bridge this gap, for the first time, this paper establishes the convergence of OGDA and EG methods under the nonconvex-strongly-concave (NC-SC) and nonconvex-concave (NC-C) settings by providing a unified analysis through the lens of single-call extra-gradient methods. We further establish lower bounds on the convergence of GDA/OGDA/EG, shedding light on the tightness of our analysis. We also conduct experiments supporting our theoretical results. We believe our results will advance the theoretical understanding of OGDA and EG methods for solving complicated nonconvex minimax real-world problems, e.g., Generative Adversarial Networks (GANs) or robust neural networks training.


Efficient Non-Parametric Optimizer Search for Diverse Tasks

Neural Information Processing Systems

Efficient and automated design of optimizers plays a crucial role in full-stack AutoML systems. However, prior methods in optimizer search are often limited by their scalability, generability, or sample efficiency. With the goal of democratizing research and application of optimizer search, we present the first efficient, scalable and generalizable framework that can directly search on the tasks of interest. We first observe that optimizer updates are fundamentally mathematical expressions applied to the gradient. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the space of optimizers into a super-tree, where each path encodes an optimizer.


A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

Neural Information Processing Systems

Combinatorial Optimization (CO) has been a long-standing challenging research topic featured by its NP-hard nature. Traditionally such problems are approximately solved with heuristic algorithms which are usually fast but may sacrifice the solution quality. Currently, machine learning for combinatorial optimization (MLCO) has become a trending research topic, but most existing MLCO methods treat CO as a single-level optimization by directly learning the end-to-end solutions, which are hard to scale up and mostly limited by the capacity of ML models given the high complexity of CO. In this paper, we propose a hybrid approach to combine the best of the two worlds, in which a bi-level framework is developed with an upper-level learning method to optimize the graph (e.g. Such a bi-level approach simplifies the learning on the original hard CO and can effectively mitigate the demand for model capacity.


Online Matching in Sparse Random Graphs: Non-Asymptotic Performances of Greedy Algorithm

Neural Information Processing Systems

Motivated by sequential budgeted allocation problems, we investigate online matching problems where connections between vertices are not i.i.d., but they have fixed degree distributions -- the so-called configuration model. We estimate the competitive ratio of the simplest algorithm, GREEDY, by approximating some relevant stochastic discrete processes by their continuous counterparts, that are solutions of an explicit system of partial differential equations. This technique gives precise bounds on the estimation errors, with arbitrarily high probability as the problem size increases. In particular, it allows the formal comparison between different configuration models. We also prove that, quite surprisingly, GREEDY can have better performance guarantees than RANKING, another celebrated algorithm for online matching that usually outperforms the former.


Minimax Optimal Rate for Parameter Estimation in Multivariate Deviated Models

Neural Information Processing Systems

The main challenges in deriving the convergence rate of the MLE mainly come from two issues: (1) The interaction between the function h_{0} and the density function f; (2) The deviated proportion \lambda {\ast} can go to the extreme points of [0,1] as the sample size tends to infinity. To address these challenges, we develop the \emph{distinguishability condition} to capture the linear independent relation between the function h_{0} and the density function f . We then provide comprehensive convergence rates of the MLE via the vanishing rate of \lambda {\ast} to zero as well as the distinguishability of two functions h_{0} and f .