Gradient Descent
EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization
Maheshwari, Chinmay; Pimpalkhare, Chinmay; Chatterjee, Debasish
Min-max optimization arises in many domains, such as game theory and adversarial machine learning, and gradient-based methods are a typical computational tool for solving it. Beyond convex-concave min-max optimization, however, the solutions found by gradient-based methods may be arbitrarily far from global optima. In this work, we present an algorithmic apparatus for computing globally optimal solutions in convex-non-concave and non-convex-concave min-max optimization. For the former, we employ a reformulation that transforms the problem into a non-concave-convex max-min optimization problem with suitably defined feasible sets and objective function. The new form can be viewed as a generalization of Sion's minimax theorem. Next, we introduce EXOTIC, an Exact, Optimistic, Tree-based algorithm for solving the reformulated max-min problem. EXOTIC employs an iterative convex optimization solver to (approximately) solve the inner minimization, and a hierarchical tree search for the outer maximization that optimistically selects promising regions to search based on the approximate solutions returned by the convex optimization solver. We establish an upper bound on its optimality gap as a function of the number of calls to the inner solver, the solver's convergence rate, and additional problem-dependent parameters. Both our algorithmic apparatus and its accompanying theoretical analysis also apply to non-convex-concave min-max optimization. In addition, we propose a class of benchmark convex-non-concave min-max problems along with their analytical global solutions, providing a testbed for evaluating min-max optimization algorithms. Empirically, EXOTIC outperforms gradient-based methods on this benchmark as well as on existing numerical benchmark problems from the literature. Finally, we demonstrate the utility of EXOTIC by computing security strategies in multi-player games with three or more players.
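To make the inner/outer split concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' EXOTIC implementation. The sketch uses a one-dimensional outer variable y in [-2, 2] and a hypothetical objective f(x, y) = (x - y)^2 - (y^2 - 1)^2 + 0.3y, which is convex in x and non-concave in y. It uses plain gradient descent as a stand-in for the iterative inner solver, and a DOO-style optimistic rule that assumes a known Lipschitz constant L for g(y) = min_x f(x, y). The names `inner_min` and `optimistic_tree_search` are illustrative only. Because stopping gradient descent early can only overestimate the inner minimum, the optimistic cell scores remain valid upper bounds.

```python
import heapq
import math


def f(x, y):
    # Hypothetical objective: convex (quadratic) in x, non-concave in y.
    return (x - y) ** 2 - (y ** 2 - 1) ** 2 + 0.3 * y


def df_dx(x, y):
    # Partial derivative of f with respect to x.
    return 2.0 * (x - y)


def inner_min(y, steps=30, lr=0.25, x0=0.0):
    """Approximate g(y) = min_x f(x, y) with plain gradient descent.

    Stopping early can only overestimate the true minimum, so the
    returned value is an upper bound on g(y)."""
    x = x0
    for _ in range(steps):
        x -= lr * df_dx(x, y)
    return f(x, y)


def optimistic_tree_search(lo, hi, lipschitz, budget):
    """DOO-style search for max_y g(y) over [lo, hi].

    Each cell [a, b] is scored by g_hat(center) + L * (b - a) / 2, an upper
    bound on g over the cell if g is L-Lipschitz (an assumption). The cell
    with the highest score is split in half; `budget` caps inner-solver calls."""
    best_y, best_val = None, -math.inf
    heap = []  # heapq is a min-heap, so scores are stored negated

    def evaluate_cell(a, b):
        nonlocal best_y, best_val
        c = 0.5 * (a + b)
        val = inner_min(c)  # one call to the inner solver
        if val > best_val:
            best_y, best_val = c, val
        heapq.heappush(heap, (-(val + lipschitz * 0.5 * (b - a)), a, b))

    evaluate_cell(lo, hi)
    calls = 1
    while calls + 2 <= budget:
        _, a, b = heapq.heappop(heap)  # most optimistic cell
        mid = 0.5 * (a + b)
        evaluate_cell(a, mid)
        evaluate_cell(mid, b)
        calls += 2
    return best_y, best_val


# L = 25 bounds |g'(y)| = |-4y^3 + 4y + 0.3| on [-2, 2] (max 24.3 at y = -2).
y_star, g_star = optimistic_tree_search(-2.0, 2.0, lipschitz=25.0, budget=201)
print(f"y* ~ {y_star:.4f}, g(y*) ~ {g_star:.4f}")
```

Here g(y) = -(y^2 - 1)^2 + 0.3y has a local maximum near y = -1 where a local ascent method started from y < 0 could stall. The tree search instead keeps refining whichever cell still has the largest optimistic bound, so it concentrates its inner-solver calls near the global maximum just above y = 1.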
A Experiment Details
Given the differences between the training procedures of the model presented in Section 6.2 and those […] All models in Section 6.3 were trained with stochastic gradient descent on batches of size […] All models presented in this paper use the same 3-layer MLP to parameterize the encoders and decoders. This is then divided into 18 capsules, each of 18 dimensions. The decoder layers then have output sizes (450, 675, 4096). For all topographic models (TVAE and BubbleVAE) in Section 6.3, the global topographic organization afforded by […] These values were chosen to be large enough to achieve notably lower equivariance error than the VAE baseline, and thus to demonstrate the impact of topographic organization without temporal coherence. The results of all models are shown in Section B below.
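The decoder description above can be illustrated with a shape-level forward pass. This is a minimal sketch built on assumptions that the text does not state. The decoder input is taken to be the 18 capsules of 18 dimensions flattened into a 324-dimensional vector. The hidden layers are assumed to use ReLU and the output layer a sigmoid. The initialization scheme and the helper names `init_mlp` and `decode` are hypothetical. Only the capsule counts and the layer output sizes (450, 675, 4096) come from the text.

```python
import numpy as np

# From the text: 18 capsules x 18 dims; decoder output sizes (450, 675, 4096).
N_CAPSULES, CAPSULE_DIM = 18, 18
DECODER_SIZES = (450, 675, 4096)


def init_mlp(in_dim, sizes, rng):
    """Random weights for a fully connected MLP (hypothetical init scheme)."""
    layers = []
    for out_dim in sizes:
        w = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
        b = np.zeros(out_dim)
        layers.append((w, b))
        in_dim = out_dim
    return layers


def decode(layers, z):
    """Map a batch of capsule latents (B, 18, 18) to flat outputs (B, 4096).

    ReLU hidden activations and a sigmoid output are assumptions."""
    h = z.reshape(z.shape[0], -1)  # flatten capsules: (B, 324)
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers
        else:
            h = 1.0 / (1.0 + np.exp(-h))  # sigmoid on the output layer
    return h


rng = np.random.default_rng(0)
decoder = init_mlp(N_CAPSULES * CAPSULE_DIM, DECODER_SIZES, rng)
z = rng.normal(size=(2, N_CAPSULES, CAPSULE_DIM))
x_hat = decode(decoder, z)
print(x_hat.shape)  # (2, 4096)
```

Flattening the capsules before the first dense layer is the simplest reading of the text. A model that treats capsules separately would need a different input layer.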