Goto

Collaborating Authors

 Search


Faster Stochastic Algorithms for Minimax Optimization under Polyak-{\L}ojasiewicz Condition

Neural Information Processing Systems

This paper considers stochastic first-order algorithms for minimax optimization under Polyak-{\L}ojasiewicz (PL) conditions. We prove SPIDER-GDA could find an \epsilon -approximate solution within {\mathcal O}\left((n \sqrt{n}\,\kappa_x\kappa_y 2)\log (1/\epsilon)\right) stochastic first-order oracle (SFO) complexity, which is better than the state-of-the-art method whose SFO upper bound is {\mathcal O}\big((n n {2/3}\kappa_x\kappa_y 2)\log (1/\epsilon)\big), where \kappa_x\triangleq L/\mu_x and \kappa_y\triangleq L/\mu_y .For the ill-conditioned case, we provide an accelerated algorithm to reduce the computational cost further. Our ideas also can be applied to the more general setting that the objective function only satisfies PL condition for one variable. Numerical experiments validate the superiority of proposed methods.


Scalable Distributional Robustness in a Class of Non-Convex Optimization with Guarantees

Neural Information Processing Systems

Distributionally robust optimization (DRO) has shown a lot of promise in providing robustness in learning as well as sample-based optimization problems. We endeavor to provide DRO solutions for a class of sum of fractionals, non-convex optimization which is used for decision making in prominent areas such as facility location and security games. In contrast to previous work, we find it more tractable to optimize the equivalent variance regularized form of DRO rather than the minimax form. We transform the variance regularized form to a mixed-integer second-order cone program (MISOCP), which, while guaranteeing global optimality, does not scale enough to solve problems with real-world datasets. We further propose two abstraction approaches based on clustering and stratified sampling to increase scalability, which we then use for real-world datasets. Importantly, we provide global optimality guarantees for our approach and show experimentally that our solution quality is better than the locally optimal ones achieved by state-of-the-art gradient-based methods.


Online learning with dynamics: A minimax perspective

Neural Information Processing Systems

We consider the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, in a possibly adversarial way. In this setting, we study the problem of minimizing policy regret and provide non-constructive upper bounds on the minimax rate for the problem. Our main results provide sufficient conditions for online learnability for this setup with corresponding rates.


Verification and search algorithms for causal DAGs

Neural Information Processing Systems

We study two problems related to recovering causal graphs from interventional data: (i) \textit{verification}, where the task is to check if a purported causal graph is correct, and (ii) \textit{search}, where the task is to recover the correct causal graph. For both, we wish to minimize the number of interventions performed. For the first problem, we give a characterization of a minimal sized set of atomic interventions that is necessary and sufficient to check the correctness of a claimed causal graph. Our characterization uses the notion of \textit{covered edges}, which enables us to obtain simple proofs and also easily reason about earlier known results. We also generalize our results to the settings of bounded size interventions and node-dependent interventional costs.


Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization

Neural Information Processing Systems

Optimizing expensive-to-evaluate black-box functions of discrete (and potentially continuous) design parameters is a ubiquitous problem in scientific and engineering applications. Bayesian optimization (BO) is a popular, sample-efficient method that leverages a probabilistic surrogate model and an acquisition function (AF) to select promising designs to evaluate. However, maximizing the AF over mixed or high-cardinality discrete search spaces is challenging standard gradient-based methods cannot be used directly or evaluating the AF at every point in the search space would be computationally prohibitive. To address this issue, we propose using probabilistic reparameterization (PR). Instead of directly optimizing the AF over the search space containing discrete parameters, we instead maximize the expectation of the AF over a probability distribution defined by continuous parameters.


Monte Carlo Tree Descent for Black-Box Optimization

Neural Information Processing Systems

The key to Black-Box Optimization is to efficiently search through input regions with potentially widely-varying numerical properties, to achieve low-regret descent and fast progress toward the optima. Monte Carlo Tree Search (MCTS) methods have recently been introduced to improve Bayesian optimization by computing better partitioning of the search space that balances exploration and exploitation. Extending this promising framework, we study how to further integrate sample-based descent for faster optimization. We design novel ways of expanding Monte Carlo search trees, with new descent methods at vertices that incorporate stochastic search and Gaussian Processes. We propose the corresponding rules for balancing progress and uncertainty, branch selection, tree expansion, and backpropagation.


Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective

Neural Information Processing Systems

Adversarial Training (AT) has become arguably the state-of-the-art algorithm for extracting robust features. However, researchers recently notice that AT suffers from severe robust overfitting problems, particularly after learning rate (LR) decay. In this paper, we explain this phenomenon by viewing adversarial training as a dynamic minimax game between the model trainer and the attacker. Specifically, we analyze how LR decay breaks the balance between the minimax game by empowering the trainer with a stronger memorization ability, and show such imbalance induces robust overfitting as a result of memorizing non-robust features. We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both the two game players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players by either regularizing the trainer's capacity or improving the attack strength.


NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks

Neural Information Processing Systems

This makes the performance of NAS approaches in more diverse areas poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite to evaluate methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a diverse array of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern CNN-based search methods while possibly being far-afield from its original development domain. To speed up and reduce the cost of NAS research, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space.


Improved Regret Bounds for Bandit Combinatorial Optimization

Neural Information Processing Systems

In this paper, we aim to reveal the property, which makes the bandit combinatorial optimization hard. Recently, Cohen et al. \citep{cohen2017tight} obtained a lower bound \Omega(\sqrt{d k 3 T / \log T}) of the regret, where k is the maximum \ell_1 -norm of action vectors, and T is the number of rounds. This lower bound was achieved by considering a continuous strongly-correlated distribution of losses. Our main contribution is that we managed to improve this bound by \Omega( \sqrt{d k 3 T}) through applying a factor of \sqrt{\log T}, which can be done by means of strongly-correlated losses with \textit{binary} values. The bound derives better regret bounds for three specific examples of the bandit combinatorial optimization: the multitask bandit, the bandit ranking and the multiple-play bandit.


Minimax Optimal Fixed-Budget Best Arm Identification in Linear Bandits

Neural Information Processing Systems

We study the problem of best arm identification in linear bandits in the fixed-budget setting. We provide a theoretical analysis of the failure probability of OD-LinBAI. Instead of all the optimality gaps, the performance of OD-LinBAI depends only on the gaps of the top d arms, where d is the effective dimension of the linear bandit instance. Complementarily, we present a minimax lower bound for this problem. The upper and lower bounds show that OD-LinBAI is minimax optimal up to constant multiplicative factors in the exponent, which is a significant theoretical improvement over existing methods (e.g., BayesGap, Peace, LinearExploration and GSE), and settles the question of ascertaining the difficulty of learning the best arm in the fixed-budget setting.