Optimization
Optimal Probing with Statistical Guarantees for Network Monitoring at Scale
Amjad, Muhammad Jehangir, Diot, Christophe, Konomis, Dimitris, Kveton, Branislav, Soule, Augustin, Yang, Xiaolong
Cloud networks are difficult to monitor because they grow rapidly and the budgets for monitoring them are limited. We propose a framework for estimating network metrics, such as latency and packet loss, with guarantees on estimation errors for a fixed monitoring budget. Our proposed algorithms produce a distribution of probes across network paths, which we then monitor; and are based on A- and E-optimal experimental designs in statistics. Unfortunately, these designs are too computationally costly to use at production scale. We propose their scalable and near-optimal approximations based on the Frank-Wolfe algorithm. We validate our approaches in simulation on real network topologies, and also using a production probing system in a real cloud network. We show major gains in reducing the probing budget compared to both production and academic baselines, while maintaining low estimation errors, even with very low probing budgets.
Sign-MAML: Efficient Model-Agnostic Meta-Learning by SignSGD
Fan, Chen, Ram, Parikshit, Liu, Sijia
We propose a new computationally-efficient first-order algorithm for Model-Agnostic Meta-Learning (MAML). The key enabling technique is to interpret MAML as a bilevel optimization (BLO) problem and leverage the sign-based SGD(signSGD) as a lower-level optimizer of BLO. We show that MAML, through the lens of signSGD-oriented BLO, naturally yields an alternating optimization scheme that just requires first-order gradients of a learned meta-model. We term the resulting MAML algorithm Sign-MAML. Compared to the conventional first-order MAML (FO-MAML) algorithm, Sign-MAML is theoretically-grounded as it does not impose any assumption on the absence of second-order derivatives during meta training. In practice, we show that Sign-MAML outperforms FO-MAML in various few-shot image classification tasks, and compared to MAML, it achieves a much more graceful tradeoff between classification accuracy and computation efficiency.
Non-smooth Bayesian Optimization in Tuning Problems
Luo, Hengrui, Demmel, James W., Cho, Younghyun, Li, Xiaoye S., Liu, Yang
Building surrogate models is one common approach when we attempt to learn unknown black-box functions. Bayesian optimization provides a framework which allows us to build surrogate models based on sequential samples drawn from the function and find the optimum. Tuning algorithmic parameters to optimize the performance of large, complicated "black-box" application codes is a specific important application, which aims at finding the optima of black-box functions. Within the Bayesian optimization framework, the Gaussian process model produces smooth or continuous sample paths. However, the black-box function in the tuning problem is often non-smooth. This difficult tuning problem is worsened by the fact that we usually have limited sequential samples from the black-box function. Motivated by these issues encountered in tuning, we propose a novel additive Gaussian process model called clustered Gaussian process (cGP), where the additive components are induced by clustering. In the examples we studied, the performance can be improved by as much as 90% among repetitive experiments. By using this surrogate model, we want to capture the non-smoothness of the black-box function. In addition to an algorithm for constructing this model, we also apply the model to several artificial and real applications to evaluate it.
Global Big Data Conference
Getting the software right is important when developing machine learning models, such as recommendation or classification systems. But at eBay, optimizing the software to run on a particular piece of hardware using distillation and quantization techniques was absolutely essential to ensure scalability. "[I]n order to build a truly global marketplace that is driven by state of the art and powerful and scalable AI services," Kopru said, "you have to do a lot of optimizations after model training, and specifically for the target hardware." With 1.5 billion active listings from more than 19 million active sellers trying to reach 159 million active buyers, the ecommerce giant has a global reach that is matched by only a handful of firms. Machine learning and other AI techniques, such as natural language processing (NLP), play big roles in scaling eBay's operations to reach its massive audience. For instance, automatically generated descriptions of product listings is crucial for displaying information on the small screens of smart phones, Kopru said.
A Semi-exact Algorithm for Quickly Computing A Maximum Weight Clique in Large Sparse Graphs
Cai, Shaowei, Lin, Jinkun, Wang, Yiyuan, Strash, Darren
This paper explores techniques to quickly solve the maximum weight clique problem (MWCP) in very large scale sparse graphs. Due to their size, and the hardness of MWCP, it is infeasible to solve many of these graphs with exact algorithms. Although recent heuristic algorithms make progress in solving MWCP in large graphs, they still need considerable time to get a high-quality solution. In this work, we focus on solving MWCP for large sparse graphs within a short time limit. We propose a new method for MWCP which interleaves clique finding with data reduction rules. We propose novel ideas to make this process efficient, and develop an algorithm called FastWClq. Experiments on a broad range of large sparse graphs show that FastWClq finds better solutions than state-of-the-art algorithms while the running time of FastWClq is much shorter than the competitors for most instances. Further, FastWClq proves the optimality of its solutions for roughly half of the graphs, all with at least 105 vertices, with an average time of 21 seconds.
Deceptive Decision-Making Under Uncertainty
Savas, Yagiz, Verginis, Christos K., Topcu, Ufuk
We study the design of autonomous agents that are capable of deceiving outside observers about their intentions while carrying out tasks in stochastic, complex environments. By modeling the agent's behavior as a Markov decision process, we consider a setting where the agent aims to reach one of multiple potential goals while deceiving outside observers about its true goal. We propose a novel approach to model observer predictions based on the principle of maximum entropy and to efficiently generate deceptive strategies via linear programming. The proposed approach enables the agent to exhibit a variety of tunable deceptive behaviors while ensuring the satisfaction of probabilistic constraints on the behavior. We evaluate the performance of the proposed approach via comparative user studies and present a case study on the streets of Manhattan, New York, using real travel time distributions.
From Alignment to Assignment: Frustratingly Simple Unsupervised Entity Alignment
Mao, Xin, Wang, Wenting, Wu, Yuanbin, Lan, Man
Cross-lingual entity alignment (EA) aims to find the equivalent entities between crosslingual KGs, which is a crucial step for integrating KGs. Recently, many GNN-based EA methods are proposed and show decent performance improvements on several public datasets. Meanwhile, existing GNN-based EA methods inevitably inherit poor interpretability and low efficiency from neural networks. Motivated by the isomorphic assumption of GNNbased methods, we successfully transform the cross-lingual EA problem into the assignment problem. Based on this finding, we propose a frustratingly Simple but Effective Unsupervised entity alignment method (SEU) without neural networks. Extensive experiments show that our proposed unsupervised method even beats advanced supervised methods across all public datasets and has high efficiency, interpretability, and stability.
Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms
Mebratu, Derssie, Hasabnis, Niranjan, Mercati, Pietro, Sharma, Gaurit, Najnin, Shamima
Modern deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch. These frameworks have complex parameters and tuning them to obtain good training and inference performance is challenging for typical users, such as DL developers and data scientists. Manual tuning requires deep knowledge of the user-controllable parameters of DL frameworks as well as the underlying hardware. It is a slow and tedious process, and it typically delivers sub-optimal solutions. In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box optimization problem. We then investigate applicability and effectiveness of Bayesian optimization (BO), genetic algorithm (GA), and Nelder-Mead simplex (NMS) to tune the parameters of TensorFlow's CPU backend. While prior work has already investigated the use of Nelder-Mead simplex for a similar problem, it does not provide insights into the applicability of other more popular algorithms. Towards that end, we provide a systematic comparative analysis of all three algorithms in tuning TensorFlow's CPU backend on a variety of DL models. Our findings reveal that Bayesian optimization performs the best on the majority of models. There are, however, cases where it does not deliver the best results.
On Tilted Losses in Machine Learning: Theory and Applications
Li, Tian, Beirami, Ahmad, Sanjabi, Maziar, Smith, Virginia
Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to a superquantile method. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
Nonlinear matrix recovery using optimization on the Grassmann manifold
Goyens, Florentin, Cartis, Coralia, Eftekhari, Armin
We investigate the problem of recovering a partially observed high-rank matrix whose columns obey a nonlinear structure such as a union of subspaces, an algebraic variety or grouped in clusters. The recovery problem is formulated as the rank minimization of a nonlinear feature map applied to the original matrix, which is then further approximated by a constrained non-convex optimization problem involving the Grassmann manifold. We propose two sets of algorithms, one arising from Riemannian optimization and the other as an alternating minimization scheme, both of which include first- and second-order variants. Both sets of algorithms have theoretical guarantees. In particular, for the alternating minimization, we establish global convergence and worst-case complexity bounds. Additionally, using the Kurdyka-Lojasiewicz property, we show that the alternating minimization converges to a unique limit point. We provide extensive numerical results for the recovery of union of subspaces and clustering under entry sampling and dense Gaussian sampling. Our methods are competitive with existing approaches and, in particular, high accuracy is achieved in the recovery using Riemannian second-order methods.