Optimization
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
Mishkin, Aaron, Sahiner, Arda, Pilanci, Mert
We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex reformulation of the standard weight-decay penalized training problem as a set of group-$\ell_1$-regularized data-local models, where locality is enforced by polyhedral cone constraints. In the special case of zero-regularization, we show that this problem is exactly equivalent to unconstrained optimization of a convex "gated ReLU" network. For problems with non-zero regularization, we show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem. To optimize the convex reformulations, we develop an accelerated proximal gradient method and a practical augmented Lagrangian solver. We show that these approaches are faster than standard training heuristics for the non-convex problem, such as SGD, and outperform commercial interior-point solvers. Experimentally, we verify our theoretical results, explore the group-$\ell_1$ regularization path, and scale convex optimization for neural networks to image classification on MNIST and CIFAR-10.
ANCER: Anisotropic Certification via Sample-wise Volume Maximization
Eiras, Francisco, Alfarra, Motasem, Kumar, M. Pawan, Torr, Philip H. S., Dokania, Puneet K., Ghanem, Bernard, Bibi, Adel
Randomized smoothing has recently emerged as an effective tool that enables certification of deep neural network classifiers at scale. All prior art on randomized smoothing has focused on isotropic $\ell_p$ certification, which has the advantage of yielding certificates that can be easily compared among isotropic methods via $\ell_p$-norm radius. However, isotropic certification limits the region that can be certified around an input to worst-case adversaries, i.e., it cannot reason about other "close", potentially large, constant prediction safe regions. To alleviate this issue, (i) we theoretically extend the isotropic randomized smoothing $\ell_1$ and $\ell_2$ certificates to their generalized anisotropic counterparts following a simplified analysis. Moreover, (ii) we propose evaluation metrics allowing for the comparison of general certificates - a certificate is superior to another if it certifies a superset region - with the quantification of each certificate through the volume of the certified region. We introduce ANCER, a framework for obtaining anisotropic certificates for a given test set sample via volume maximization. We achieve it by generalizing memory-based certification of data-dependent classifiers. Our empirical results demonstrate that ANCER achieves state-of-the-art $\ell_1$ and $\ell_2$ certified accuracy on CIFAR-10 and ImageNet in the data-dependence setting, while certifying larger regions in terms of volume, highlighting the benefits of moving away from isotropic analysis. Our code is available in https://github.com/MotasemAlfarra/ANCER.
Data-Driven Chance Constrained AC-OPF using Hybrid Sparse Gaussian Processes
Mitrovic, Mile, Lukashevich, Aleksandr, Vorobev, Petr, Terzija, Vladimir, Maximov, Yury, Deka, Deepjyoti
The alternating current (AC) chance-constrained optimal power flow (CC-OPF) problem addresses the economic efficiency of electricity generation and delivery under generation uncertainty. The latter is intrinsic to modern power grids because of the high amount of renewables. Despite its academic success, the AC CC-OPF problem is highly nonlinear and computationally demanding, which limits its practical impact. For improving the AC-OPF problem complexity/accuracy trade-off, the paper proposes a fast data-driven setup that uses the sparse and hybrid Gaussian processes (GP) framework to model the power flow equations with input uncertainty. We advocate the efficiency of the proposed approach by a numerical study over multiple IEEE test cases showing up to two times faster and more accurate solutions compared to the state-of-the-art methods.
Efficient Learning of Interpretable Classification Rules
Ghosh, Bishwamittra, Malioutov, Dmitry, Meel, Kuldeep S.
Machine learning has become omnipresent with applications in various safety-critical domains such as medical, law, and transportation. In these domains, high-stake decisions provided by machine learning necessitate researchers to design interpretable models, where the prediction is understandable to a human. In interpretable machine learning, rule-based classifiers are particularly effective in representing the decision boundary through a set of rules comprising input features. The interpretability of rule-based classifiers is in general related to the size of the rules, where smaller rules are considered more interpretable. To learn such a classifier, the brute-force direct approach is to consider an optimization problem that tries to learn the smallest classification rule that has close to maximum accuracy. This optimization problem is computationally intractable due to its combinatorial nature and thus, the problem is not scalable in large datasets. To this end, in this paper we study the triangular relationship among the accuracy, interpretability, and scalability of learning rule-based classifiers. The contribution of this paper is an interpretable learning framework IMLI, that is based on maximum satisfiability (MaxSAT) for synthesizing classification rules expressible in proposition logic. Despite the progress of MaxSAT solving in the last decade, the straightforward MaxSAT-based solution cannot scale. Therefore, we incorporate an efficient incremental learning technique inside the MaxSAT formulation by integrating mini-batch learning and iterative rule-learning. In our experiments, IMLI achieves the best balance among prediction accuracy, interpretability, and scalability. As an application, we deploy IMLI in learning popular interpretable classifiers such as decision lists and decision sets.
Distributed Constraint-Coupled Optimization over Lossy Networks
Doostmohammadian, Mohammadreza, Khan, Usman A., Aghasi, Alireza, Charalambous, Themistoklis
This paper considers distributed resource allocation and sum-preserving constrained optimization over lossy networks, where the links are unreliable and subject to packet drops. We define the conditions to ensure convergence under packet drops and link removal by focusing on two main properties of our allocation algorithm: (i) The weight-stochastic condition in typical consensus schemes is reduced to balanced weights, with no need for readjusting the weights to satisfy stochasticity. (ii) The algorithm does not require all-time connectivity but instead uniform connectivity over some non-overlapping finite time intervals. First, we prove that our algorithm provides primal-feasible allocation at every iteration step and converges under the conditions (i)-(ii) and some other mild conditions on the nonlinear iterative dynamics. These nonlinearities address possible practical constraints in real applications due to, for example, saturation or quantization among others. Then, using (i)-(ii) and the notion of bond-percolation theory, we relate the packet drop rate and the network percolation threshold to the (finite) number of iterations ensuring uniform connectivity and, thus, convergence towards the optimum value.
Improving Operational Efficiency In EV Ridepooling Fleets By Predictive Exploitation of Idle Times
Provoost, Jesper C., Kamilaris, Andreas, Gidรณfalvi, Gyรถzรถ, Heijenk, Geert J., Wismans, Luc J. J.
In ridepooling systems with electric fleets, charging is a complex decision-making process. Most electric vehicle (EV) taxi services require drivers to make egoistic decisions, leading to decentralized ad-hoc charging strategies. The current state of the mobility system is often lacking or not shared between vehicles, making it impossible to make a system-optimal decision. Most existing approaches do not combine time, location and duration into a comprehensive control algorithm or are unsuitable for real-time operation. We therefore present a real-time predictive charging method for ridepooling services with a single operator, called Idle Time Exploitation (ITX), which predicts the periods where vehicles are idle and exploits these periods to harvest energy. It relies on Graph Convolutional Networks and a linear assignment algorithm to devise an optimal pairing of vehicles and charging stations, in pursuance of maximizing the exploited idle time. We evaluated our approach through extensive simulation studies on real-world datasets from New York City. The results demonstrate that ITX outperforms all baseline methods by at least 5% (equivalent to $70,000 for a 6,000 vehicle operation) per week in terms of a monetary reward function which was modeled to replicate the profitability of a real-world ridepooling system. Moreover, ITX can reduce delays by at least 4.68% in comparison with baseline methods and generally increase passenger comfort by facilitating a better spread of customers across the fleet. Our results also demonstrate that ITX enables vehicles to harvest energy during the day, stabilizing battery levels and increasing resilience to unexpected surges in demand. Lastly, compared to the best-performing baseline strategy, peak loads are reduced by 17.39% which benefits grid operators and paves the way for more sustainable use of the electrical grid.
Online Meta-Learning for Model Update Aggregation in Federated Learning for Click-Through Rate Prediction
Liu, Xianghang, Twardowski, Bartลomiej, Wijaya, Tri Kurniawan
In Federated Learning (FL) of click-through rate (CTR) prediction, users' data is not shared for privacy protection. The learning is performed by training locally on client devices and communicating only model changes to the server. There are two main challenges: (i) the client heterogeneity, making FL algorithms that use the weighted averaging to aggregate model updates from the clients have slow progress and unsatisfactory learning results; and (ii) the difficulty of tuning the server learning rate with trial-and-error methodology due to the big computation time and resources needed for each experiment. To address these challenges, we propose a simple online meta-learning method to learn a strategy of aggregating the model updates, which adaptively weighs the importance of the clients based on their attributes and adjust the step sizes of the update. We perform extensive evaluations on public datasets. Our method significantly outperforms the state-of-the-art in both the speed of convergence and the quality of the final learning results.
Prediction-based One-shot Dynamic Parking Pricing
Hong, Seoyoung, Shin, Heejoo, Choi, Jeongwhan, Park, Noseong
Many U.S. metropolitan cities are notorious for their severe shortage of parking spots. To this end, we present a proactive prediction-driven optimization framework to dynamically adjust parking prices. We use state-of-the-art deep learning technologies such as neural ordinary differential equations (NODEs) to design our future parking occupancy rate prediction model given historical occupancy rates and price information. Owing to the continuous and bijective characteristics of NODEs, in addition, we design a one-shot price optimization method given a pre-trained prediction model, which requires only one iteration to find the optimal solution. In other words, we optimize the price input to the pre-trained prediction model to achieve targeted occupancy rates in the parking blocks. We conduct experiments with the data collected in San Francisco and Seattle for years. Our prediction model shows the best accuracy in comparison with various temporal or spatio-temporal forecasting models. Our one-shot optimization method greatly outperforms other black-box and white-box search methods in terms of the search time and always returns the optimal price solution.
Efficient Learning of Interpretable Classification Rules
Ghosh, Bishwamittra (National University of Singapore) | Malioutov, Dmitry | Meel, Kuldeep S. (National University of Singapore)
Machine learning has become omnipresent with applications in various safety-critical domains such as medical, law, and transportation. In these domains, high-stake decisions provided by machine learning necessitate researchers to design interpretable models, where the prediction is understandable to a human. In interpretable machine learning, rule-based classifiers are particularly effective in representing the decision boundary through a set of rules comprising input features. Examples of such classifiers include decision trees, decision lists, and decision sets. The interpretability of rule-based classifiers is in general related to the size of the rules, where smaller rules are considered more interpretable. To learn such a classifier, the brute-force direct approach is to consider an optimization problem that tries to learn the smallest classification rule that has close to maximum accuracy. This optimization problem is computationally intractable due to its combinatorial nature and thus, the problem is not scalable in large datasets. To this end, in this paper we study the triangular relationship among the accuracy, interpretability, and scalability of learning rule-based classifiers. The contribution of this paper is an interpretable learning framework IMLI, that is based on maximum satisfiability (MaxSAT) for synthesizing classification rules expressible in proposition logic. IMLI considers a joint objective function to optimize the accuracy and the interpretability of classification rules and learns an optimal rule by solving an appropriately designed MaxSAT query. Despite the progress of MaxSAT solving in the last decade, the straightforward MaxSAT-based solution cannot scale to practical classification datasets containing thousands to millions of samples. Therefore, we incorporate an efficient incremental learning technique inside the MaxSAT formulation by integrating mini-batch learning and iterative rule-learning. The resulting framework learns a classifier by iteratively covering the training data, wherein in each iteration, it solves a sequence of smaller MaxSAT queries corresponding to each mini-batch. In our experiments, IMLI achieves the best balance among prediction accuracy, interpretability, and scalability. For instance, IMLI attains a competitive prediction accuracy and interpretability w.r.t. existing interpretable classifiers and demonstrates impressive scalability on large datasets where both interpretable and non-interpretable classifiers fail. As an application, we deploy IMLI in learning popular interpretable classifiers such as decision lists and decision sets. The source code is available at https://github.com/meelgroup/mlic.
Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO
Muller, Paul, Rowland, Mark, Elie, Romuald, Piliouras, Georgios, Perolat, Julien, Lauriere, Mathieu, Marinier, Raphael, Pietquin, Olivier, Tuyls, Karl
Recent advances in multiagent learning have seen the introduction ofa family of algorithms that revolve around the population-based trainingmethod PSRO, showing convergence to Nash, correlated and coarse corre-lated equilibria. Notably, when the number of agents increases, learningbest-responses becomes exponentially more difficult, and as such ham-pers PSRO training methods. The paradigm of mean-field games pro-vides an asymptotic solution to this problem when the considered gamesare anonymous-symmetric. Unfortunately, the mean-field approximationintroduces non-linearities which prevent a straightforward adaptation ofPSRO. Building upon optimization and adversarial regret minimization,this paper sidesteps this issue and introduces mean-field PSRO, an adap-tation of PSRO which learns Nash, coarse correlated and correlated equi-libria in mean-field games. The key is to replace the exact distributioncomputation step by newly-defined mean-field no-adversarial-regret learn-ers, or by black-box optimization. We compare the asymptotic complexityof the approach to standard PSRO, greatly improve empirical bandit con-vergence speed by compressing temporal mixture weights, and ensure itis theoretically robust to payoff noise. Finally, we illustrate the speed andaccuracy of mean-field PSRO on several mean-field games, demonstratingconvergence to strong and weak equilibria.