Optimization
Cutting-edge scale-out technology from Toshiba will take fintech and logistics to a new level
Toshiba Corporation, the industry leader in solutions for large-scale optimization problems, today announced a scale-out technology that minimizes hardware limitations. The technology is an evolution of its optimization computer, the Simulated Bifurcation Machine (SBM), and supports continued increases in computing speed and scale. Toshiba expects the new SBM to be a game changer for real-world problems that require large-scale, high-speed, low-latency computation, such as simultaneous financial transactions involving large numbers of stocks and complex control of multiple robots. The research results were published in Nature Electronics on March 1. Speed and scale are key to success in industrial sectors as different as finance, logistics, and communications, all of which have to deal with large numbers and make complex decisions in the shortest time possible. Aiming to bring higher efficiencies to these and other businesses, Toshiba has addressed combinatorial optimization problems by developing high-speed, high-accuracy algorithms and corresponding practical computer solutions.
Learning to Optimize: A Primer and A Benchmark
Chen, Tianlong, Chen, Xiaohan, Chen, Wuyang, Heaton, Howard, Liu, Jialin, Wang, Zhangyang, Yin, Wotao
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods, aiming at reducing the laborious iterations of hand engineering. It automates the design of an optimization method based on its performance on a set of training problems. This data-driven procedure generates methods that can efficiently solve problems similar to those in the training set. In sharp contrast, the typical and traditional designs of optimization methods are theory-driven, so they obtain performance guarantees over the classes of problems specified by the theory. The difference makes L2O suitable for repeatedly solving a certain type of optimization problem over a specific distribution of data, while it typically fails on out-of-distribution problems. The practicality of L2O depends on the type of target optimization, the chosen architecture of the method to learn, and the training procedure. This new paradigm has motivated a community of researchers to explore L2O and report their findings. This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization. We set up taxonomies, categorize existing works and research directions, present insights, and identify open challenges.
Promoting Fairness through Hyperparameter Optimization
Cruz, André F., Saleiro, Pedro, Belém, Catarina, Soares, Carlos, Bizarro, Pedro
Considerable research effort has been guided towards algorithmic fairness but real-world adoption of bias reduction techniques is still scarce. Existing methods are either metric- or model-specific, require access to sensitive attributes at inference time, or carry high development and deployment costs. This work explores, in the context of a real-world fraud detection application, the unfairness that emerges from traditional ML model development, and how to mitigate it with a simple and easily deployed intervention: fairness-aware hyperparameter optimization (HO). We propose and evaluate fairness-aware variants of three popular HO algorithms: Fair Random Search, Fair TPE, and Fairband. Our method enables practitioners to adapt pre-existing business operations to accommodate fairness objectives in a frictionless way and with controllable fairness-accuracy trade-offs. Additionally, it can be coupled with existing bias reduction techniques to tune their hyperparameters. We validate our approach on a real-world bank account opening fraud use case, as well as on three datasets from the fairness literature. Results show that, without extra training cost, it is feasible to find models with 111% average fairness increase and just 6% decrease in predictive accuracy, when compared to standard fairness-blind HO.
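The idea of fairness-aware hyperparameter search can be illustrated with a minimal sketch. It assumes the selection rule is a weighted sum of a predictive-performance score and a fairness score, controlled by a trade-off weight `alpha`; that scalarization, and the names `fair_random_search`, `sample_config`, and `evaluate`, are illustrative assumptions rather than the authors' implementation.

```python
import random

def fair_random_search(sample_config, evaluate, n_trials=20, alpha=0.5, seed=0):
    """Random search that ranks configurations by a fairness-aware score.

    sample_config(rng) -> a hyperparameter configuration (hypothetical callable)
    evaluate(cfg)      -> (performance, fairness), both "higher is better"
    alpha              -> weight on performance; 1 - alpha goes to fairness
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        perf, fair = evaluate(cfg)
        # Assumed scalarization of the two objectives into one score.
        score = alpha * perf + (1.0 - alpha) * fair
        if best is None or score > best[0]:
            best = (score, cfg, perf, fair)
    return best  # (score, config, performance, fairness)

# Toy usage: a single made-up hyperparameter "t" that trades
# performance against fairness linearly.
toy_sample = lambda rng: {"t": rng.random()}
toy_evaluate = lambda cfg: (cfg["t"], 1.0 - cfg["t"])
score, cfg, perf, fair = fair_random_search(toy_sample, toy_evaluate, alpha=0.5)
```

Setting `alpha=1.0` recovers ordinary fairness-blind random search, which is why an intervention of this kind adds no training cost: only the ranking of already-evaluated models changes.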
Solving and Learning Nonlinear PDEs with Gaussian Processes
Chen, Yifan, Hosseini, Bamdad, Owhadi, Houman, Stuart, Andrew M
We introduce a simple, rigorous, and unified framework for solving nonlinear partial differential equations (PDEs), and for solving inverse problems (IPs) involving the identification of parameters in PDEs, using the framework of Gaussian processes. The proposed approach (1) provides a natural generalization of collocation kernel methods to nonlinear PDEs and IPs, (2) has guaranteed convergence with a path to compute error bounds in the PDE setting, and (3) inherits the state-of-the-art computational complexity of linear solvers for dense kernel matrices. The main idea of our method is to approximate the solution of a given PDE with a MAP estimator of a Gaussian process given the observation of the PDE at a finite number of collocation points. Although this optimization problem is infinite-dimensional, it can be reduced to a finite-dimensional one by introducing additional variables corresponding to the values of the derivatives of the solution at collocation points; this generalizes the representer theorem arising in Gaussian process regression. The reduced optimization problem has a quadratic loss and nonlinear constraints, and it is in turn solved with a variant of the Gauss-Newton method. The resulting algorithm (a) can be interpreted as solving successive linearizations of the nonlinear PDE, and (b) is found in practice to converge in a small number (two to ten) of iterations in experiments conducted on a range of PDEs. For IPs, while the traditional approach has been to iterate between the identifications of parameters in the PDE and the numerical approximation of its solution, our algorithm tackles both simultaneously. Experiments on nonlinear elliptic PDEs, Burgers' equation, a regularized Eikonal equation, and an IP for permeability identification in Darcy flow illustrate the efficacy and scope of our framework.
Stochastic Reweighted Gradient Descent
Hanchi, Ayoub El, Stephens, David A.
Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA), or the periodic full gradient computation they require (SVRG/SARAH) are manageable. A promising approach to achieving variance reduction while avoiding these drawbacks is the use of importance sampling instead of control variates. While many such methods have been proposed in the literature, directly proving that they improve the convergence of the resulting optimization algorithm has remained elusive. In this work, we propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient). We analyze the convergence of SRG in the strongly-convex case and show that, while it does not recover the linear rate of control variates methods, it provably outperforms SGD. We pay particular attention to the time and memory overhead of our proposed method, and design a specialized red-black tree allowing its efficient implementation. Finally, we present empirical results to support our findings.
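The core mechanism behind importance-sampled stochastic gradients can be sketched as follows. This is not the SRG algorithm itself: SRG adapts its sampling distribution during optimization, whereas this sketch uses a fixed distribution (proportional to squared row norms, mixed with uniform) purely as an assumption for illustration. What it does show is the reweighting by `1 / (n * p[i])` that keeps the gradient estimate unbiased under non-uniform sampling.

```python
import numpy as np

def importance_sampled_sgd(A, b, steps=2000, lr=0.01, seed=0):
    """SGD on the least-squares objective (1/n) * sum_i 0.5 * (a_i . x - b_i)^2,
    sampling index i with a non-uniform probability p[i]."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    # Assumed static sampling distribution: half proportional to the squared
    # row norm, half uniform, so every p[i] is at least 0.5 / n.
    row_norms = np.sum(A * A, axis=1)
    p = 0.5 * row_norms / row_norms.sum() + 0.5 / n
    for _ in range(steps):
        i = rng.choice(n, p=p)
        grad_i = (A[i] @ x - b[i]) * A[i]
        # Dividing by n * p[i] makes the expected update equal to the
        # full (average) gradient, whatever p is.
        x -= lr * grad_i / (n * p[i])
    return x

# Toy usage on a consistent linear system.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 5))
x_true = rng.normal(size=5)
b = A @ x_true
x_hat = importance_sampled_sgd(A, b, steps=5000, lr=0.05)
```

No per-sample gradient table and no periodic full-gradient pass is needed here, which is the overhead the abstract contrasts with SAG/SAGA and SVRG/SARAH.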
A sampling criterion for constrained Bayesian optimization with uncertainties
Amri, Reda El, Riche, Rodolphe Le, Helbert, Céline, Blanchet-Scalliet, Christophette, Da Veiga, Sébastien
We consider the problem of chance constrained optimization where it is sought to optimize a function and satisfy constraints, both of which are affected by uncertainties. Real-world instances of this problem are particularly challenging because of their inherent computational cost. To tackle such problems, we propose a new Bayesian optimization method. It applies to the situation where the uncertainty comes from some of the inputs, so that it becomes possible to define an acquisition criterion in the joint controlled-uncontrolled input space. The main contribution of this work is an acquisition criterion that accounts for both the average improvement in objective function and the constraint reliability. The criterion is derived following the Stepwise Uncertainty Reduction logic and its maximization provides both optimal controlled and uncontrolled parameters. Analytical expressions are given to efficiently calculate the criterion. Numerical studies on test functions are presented. It is found through experimental comparisons with alternative sampling criteria that the match between the sampling criterion and the problem contributes to the efficiency of the overall optimization. As a side result, an expression for the variance of the improvement is given.
Goal Seeking Quadratic Unconstrained Binary Optimization
The Quadratic Unconstrained Binary Optimization (QUBO) modeling and solution framework is required for quantum and digital annealers whose goal is the optimization of a well defined metric, the objective function. However, diverse suboptimal solutions may be preferred over harder to implement strict optimal ones. In addition, the decision-maker usually has insights that are not always efficiently translated into the optimization model, such as acceptable target, interval or range values. Multi-criteria decision making is an example of involving the user in the decision process. In this paper, we present two variants of goal-seeking QUBO that minimize the deviation from the goal through a tabu-search based greedy one-flip heuristic. Experimental results illustrate the efficacy of the proposed approach over Constraint Programming for quickly finding a satisficing set of solutions.
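A goal-seeking one-flip search of the kind described above might look like the following sketch. It assumes the deviation to minimize is `|x^T Q x - goal|` and uses a simple tabu tenure with an aspiration rule; the function name, parameters, and these details are illustrative assumptions, not the paper's exact heuristic.

```python
import numpy as np

def goal_seeking_one_flip(Q, goal, iters=200, tenure=3, seed=0):
    """Greedy one-flip tabu search minimizing |x^T Q x - goal| over binary x."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    x = rng.integers(0, 2, size=n)

    def value(v):
        return float(v @ Q @ v)

    best_x, best_dev = x.copy(), abs(value(x) - goal)
    tabu_until = np.zeros(n, dtype=int)  # iteration at which each bit is free again
    for it in range(iters):
        if best_dev == 0:
            break  # the goal value has been hit exactly
        candidates = []
        for j in range(n):
            y = x.copy()
            y[j] ^= 1  # flip a single bit
            dev = abs(value(y) - goal)
            # A tabu flip is allowed only if it beats the best deviation so far.
            if tabu_until[j] <= it or dev < best_dev:
                candidates.append((dev, j))
        if not candidates:
            continue
        dev, j = min(candidates)  # greedy: take the flip closest to the goal
        x[j] ^= 1
        tabu_until[j] = it + 1 + tenure
        if dev < best_dev:
            best_x, best_dev = x.copy(), dev
    return best_x, best_dev

# Toy usage with a made-up 3-variable QUBO and a target value of 2.
Q = np.array([[1, -2, 0], [0, 3, 1], [0, 0, -1]])
x, dev = goal_seeking_one_flip(Q, goal=2)
```

Because the search accepts any assignment whose value lands on or near the goal, it can stop far earlier than a search for the strict optimum.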
Generative Minimization Networks: Training GANs Without Competition
Grnarova, Paulina, Kilcher, Yannic, Levy, Kfir Y., Lucchi, Aurelien, Hofmann, Thomas
Many applications in machine learning can be framed as minimization problems and solved efficiently using gradient-based techniques. However, recent applications of generative models, particularly GANs, have triggered interest in solving min-max games for which standard optimization techniques are often not suitable. Among known problems experienced by practitioners is the lack of convergence guarantees or convergence to a non-optimum cycle. At the heart of these problems is the min-max structure of the GAN objective which creates non-trivial dependencies between the players. We propose to address this problem by optimizing a different objective that circumvents the min-max structure using the notion of duality gap from game theory. We provide novel convergence guarantees on this objective and demonstrate why the obtained limit point solves the problem better than known techniques.
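For reference, the duality gap of a two-player zero-sum game is commonly written as below. The notation is an assumption made here for illustration: $M(u, v)$ is the payoff that player $u$ minimizes and player $v$ maximizes.

```latex
\[
\mathrm{DG}(u, v) \;=\; \max_{v'} M(u, v') \;-\; \min_{u'} M(u', v) \;\ge\; 0 .
\]
```

The gap is zero exactly when $(u, v)$ is a saddle point (a pure Nash equilibrium), so driving it toward zero is a single minimization problem rather than a min-max game.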
Multi-Robot Task Allocation -- Complexity and Approximation
Aziz, Haris, Chan, Hau, Cseh, Ágnes, Li, Bo, Ramezani, Fahimeh, Wang, Chenhao
Multi-robot task allocation is one of the most fundamental classes of problems in robotics and is crucial for various real-world robotic applications such as search, rescue and area exploration. We consider the Single-Task robots and Multi-Robot tasks Instantaneous Assignment (ST-MR-IA) setting where each task requires at least a certain number of robots and each robot can work on at most one task and incurs an operational cost for each task. Our aim is to consider a natural computational problem of allocating robots to complete the maximum number of tasks subject to budget constraints. We consider budget constraints of three different kinds: (1) total budget, (2) task budget, and (3) robot budget. We provide a detailed complexity analysis including results on approximations as well as polynomial-time algorithms for the general setting and important restricted settings.
Sparsity-Inducing Optimal Control via Differential Dynamic Programming
Dinev, Traiko, Merkt, Wolfgang, Ivan, Vladimir, Havoutis, Ioannis, Vijayakumar, Sethu
Optimal control is a popular approach to synthesize highly dynamic motion. Commonly, $L_2$ regularization is used on the control inputs in order to minimize energy used and to ensure smoothness of the control inputs. However, for some systems, such as satellites, the control needs to be applied in sparse bursts due to how the propulsion system operates. In this paper, we study approaches to induce sparsity in optimal control solutions -- namely via smooth $L_1$ and Huber regularization penalties. We apply these loss terms to state-of-the-art DDP-based solvers to create a family of sparsity-inducing optimal control methods. We analyze and compare the effect of the different losses on inducing sparsity, their numerical conditioning, their impact on convergence, and discuss hyperparameter settings. We demonstrate our method in simulation and hardware experiments on canonical dynamics systems, control of satellites, and the NASA Valkyrie humanoid robot. We provide an implementation of our method and all examples for reproducibility on GitHub.
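The difference between the regularizers can be seen in a short sketch. The functional forms below are common textbook choices and are assumptions here: the paper's exact smooth $L_1$ and Huber definitions, and the parameter names `eps` and `delta`, may differ.

```python
import numpy as np

def l2_penalty(u):
    """Quadratic control cost: its gradient vanishes as u -> 0."""
    return 0.5 * u**2

def smooth_l1_penalty(u, eps=1e-3):
    """Smooth approximation of |u| (assumed form): sqrt(u^2 + eps^2) - eps."""
    return np.sqrt(u**2 + eps**2) - eps

def huber_penalty(u, delta=0.1):
    """Huber loss: quadratic for |u| <= delta, linear beyond it."""
    a = np.abs(u)
    return np.where(a <= delta, 0.5 * u**2, delta * (a - 0.5 * delta))

# Compare the three costs on a few sample control values.
u = np.array([0.0, 0.05, 0.1, 1.0])
costs = {
    "l2": l2_penalty(u),
    "smooth_l1": smooth_l1_penalty(u),
    "huber": huber_penalty(u),
}
```

Away from zero both alternatives grow linearly rather than quadratically, so a few large control bursts cost relatively less than under $L_2$, while small nonzero controls remain comparatively expensive under the smooth $L_1$ form. All three functions stay differentiable, which matters for a DDP-style solver that relies on derivatives of the cost.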