Search
Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as the gradient descent ascent (GDA) are the common practice for solving these nonconvex games and receive lots of empirical success. Yet, it is known that these vanilla GDA algorithms with constant stepsize can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-{\L}ojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure.
USCO-Solver: Solving Undetermined Stochastic Combinatorial Optimization Problems
Real-world decision-making systems are often subject to uncertainties that have to be resolved through observational data. Therefore, we are frequently confronted with combinatorial optimization problems of which the objective function is unknown and thus has to be debunked using empirical evidence. In contrast to the common practice that relies on a learning-and-optimization strategy, we consider the regression between combinatorial spaces, aiming to infer high-quality optimization solutions from samples of input-solution pairs -- without the need to learn the objective function. Our main deliverable is a universal solver that is able to handle abstract undetermined stochastic combinatorial optimization problems. In empirical studies, we demonstrate our design using proof-of-concept experiments, and compare it with other methods that are potentially applicable. Overall, we obtain highly encouraging experimental results for several classic combinatorial problems on both synthetic and real-world datasets.
Efficient Algorithms for Smooth Minimax Optimization
This paper studies first order methods for solving smooth minimax optimization problems \min_x \max_y g(x,y) where g(\cdot,\cdot) is smooth and g(x,\cdot) is concave for each x . In terms of g(\cdot,y), we consider two settings -- strongly convex and nonconvex -- and improve upon the best known rates in both. For strongly-convex g(\cdot, y),\ \forall y, we propose a new direct optimal algorithm combining Mirror-Prox and Nesterov's AGD, and show that it can find global optimum in \widetilde{O}\left(1/k 2 \right) iterations, improving over current state-of-the-art rate of O(1/k) . We use this result along with an inexact proximal point method to provide \widetilde{O}\left(1/k {1/3} \right) rate for finding stationary points in the nonconvex setting where g(\cdot, y) can be nonconvex. This improves over current best-known rate of O(1/k {1/5}) .
Sample Selection for Fair and Robust Training
Fairness and robustness are critical elements of Trustworthy AI that need to be addressed together. Fairness is about learning an unbiased model while robustness is about learning from corrupted data, and it is known that addressing only one of them may have an adverse affect on the other. In this work, we propose a sample selection-based algorithm for fair and robust training. To this end, we formulate a combinatorial optimization problem for the unbiased selection of samples in the presence of data corruption. Observing that solving this optimization problem is strongly NP-hard, we propose a greedy algorithm that is efficient and effective in practice.
Subgoal Search For Complex Reasoning Tasks
Humans excel in solving complex reasoning tasks through a mental process of moving from one idea to a related one. Inspired by this, we propose Subgoal Search (kSubS) method. Its key component is a learned subgoal generator that produces a diversity of subgoals that are both achievable and closer to the solution. Using subgoals reduces the search space and induces a high-level search graph suitable for efficient planning. In this paper, we implement kSubS using a transformer-based subgoal module coupled with the classical best-first search framework.
PyGlove: Symbolic Programming for Automated Machine Learning
Neural networks are sensitive to hyper-parameter and architecture choices. Automated Machine Learning (AutoML) is a promising paradigm for automating these choices. Current ML software libraries, however, are quite limited in handling the dynamic interactions among the components of AutoML. For example, efficient NAS algorithms, such as ENAS and DARTS, typically require an implementation coupling between the search space and search algorithm, the two key components in AutoML. Furthermore, implementing a complex search flow, such as searching architectures within a loop of searching hardware configurations, is difficult.
Progressive Feature Interaction Search for Deep Sparse Network
Deep sparse networks (DSNs), of which the crux is exploring the high-order feature interactions, have become the state-of-the-art on the prediction task with high-sparsity features. However, these models suffer from low computation efficiency, including large model size and slow model inference, which largely limits these models' application value. In this work, we approach this problem with neural architecture search by automatically searching the critical component in DSNs, the feature-interaction layer. We propose a distilled search space to cover the desired architectures with fewer parameters. We then develop a progressive search algorithm for efficient search on the space and well capture the order-priority property in sparse prediction tasks.
Reviews: Learning Chordal Markov Networks via Branch and Bound
The authors present a branch and bound algorithm for learning Chordal Markov networks. The prior state of the art algorithm is a dynamic programming approach based on a recursive characterization of clique tress and storing in memory the scores of already-solved subproblems. The proposed algorithm uses a branch and bound algorithm to search for an optimal chordal Markov network. The algorithm first uses a dynamic programming algorithm to enumerate Bayesian network structures, which are later used as pruning bounds. A symmetry breaking technique is introduced to prune the search space.
Reviews: Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem
The paper studies the classic prediction with experts advice problem. There are a finite number k of experts and a finite number T of rounds. There is a player that makes sequential decisions for T rounds based on the advice of the k experts, and his goal is to minimize the maximum regret he can experience (minimax regret). Naturally, the optimal adversarial strategy is a key quantity to study here. This paper takes up the conjectured minimax optimal adversarial strategy called "Comb strategy" in the Gravin et al. paper and shows that it is indeed minimax optimal in the case of 3 experts.