Optimization
Multi-Objectivizing Software Configuration Tuning (for a single performance concern)
Automatically tuning software configuration for optimizing a single performance attribute (e.g., minimizing latency) is not trivial, due to the nature of the configuration systems (e.g., complex landscape and expensive measurement). To deal with the problem, existing work has been focusing on developing various effective optimizers. However, a prominent issue that all these optimizers need to take care of is how to avoid the search being trapped in local optima – a hard nut to crack for software configuration tuning due to its rugged and sparse landscape, and neighboring configurations tending to behave very differently. Overcoming such in an expensive measurement setting is even more challenging. In this paper, we take a different perspective to tackle this issue.
Homepage feed multi-task learning using TensorFlow
Editor's Note: Multi-objective optimization (MOO) is used for many products at LinkedIn (such as the homepage feed) to help balance different behaviors in our ecosystem. There are two parts to how we work with multiple objectives: the first is about training high-fidelity models to predict member behavior (e.g., probability a member will click an article). The second is around trading off different objectives for a unified member experience based on utility to the LinkedIn ecosystem (e.g., a comment is much more valuable than a click). This post will focus on the first part of multi-objective optimization, where we utilize a multi-task, deep learning model to create higher fidelity consumption models; for more information on the second part, objective tradeoffs, see this article from KDnuggets about automatically tuning this tradeoff for faster model iteration. LinkedIn's members rely on the homepage feed for a variety of content including updates from their network, industry articles, and new job opportunities.
Auction-based and Distributed Optimization Approaches for Scheduling Observations in Satellite Constellations with Exclusive Orbit Portions
We investigate the use of multi-agent allocation techniques on problems related to Earth observation scenarios with multiple users and satellites. We focus on the problem of coordinating users having reserved exclusive orbit portions and one central planner having several requests that may use some intervals of these exclusives. We define this problem as Earth Observation Satellite Constellation Scheduling Problem (EOSCSP) and map it to a Mixed Integer Linear Program. As to solve EOSCSP, we propose market-based techniques and a distributed problem solving technique based on Distributed Constraint Optimization (DCOP), where agents cooperate to allocate requests without sharing their own schedules. These contributions are experimentally evaluated on randomly generated EOSCSP instances based on real large-scale or highly conflicting observation order books.
Adiabatic Quantum Feature Selection for Sparse Linear Regression
Desu, Surya Sai Teja, Srijith, P. K., Rao, M. V. Panduranga, Sivadasan, Naveen
Linear regression is a popular machine learning approach to learn and predict real valued outputs or dependent variables from independent variables or features. In many real world problems, its beneficial to perform sparse linear regression to identify important features helpful in predicting the dependent variable. It not only helps in getting interpretable results but also avoids overfitting when the number of features is large, and the amount of data is small. The most natural way to achieve this is by using `best subset selection' which penalizes non-zero model parameters by adding $\ell_0$ norm over parameters to the least squares loss. However, this makes the objective function non-convex and intractable even for a small number of features. This paper aims to address the intractability of sparse linear regression with $\ell_0$ norm using adiabatic quantum computing, a quantum computing paradigm that is particularly useful for solving optimization problems faster. We formulate the $\ell_0$ optimization problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solve it using the D-Wave adiabatic quantum computer. We study and compare the quality of QUBO solution on synthetic and real world datasets. The results demonstrate the effectiveness of the proposed adiabatic quantum computing approach in finding the optimal solution. The QUBO solution matches the optimal solution for a wide range of sparsity penalty values across the datasets.
Subgroup Fairness in Two-Sided Markets
Zhou, Quan, Marecek, Jakub, Shorten, Robert N.
It is well known that two-sided markets are unfair in a number of ways. For instance, female workers at Uber earn less than their male colleagues per mile driven. Similar observations have been made for other minority subgroups in other two-sided markets. Here, we suggest a novel market-clearing mechanism for two-sided markets, which promotes equalisation of the pay per hour worked across multiple subgroups, as well as within each subgroup. In the process, we introduce a novel notion of subgroup fairness (which we call Inter-fairness), which can be combined with other notions of fairness within each subgroup (called Intra-fairness), and the utility for the customers (Customer-Care) in the objective of the market-clearing problem. While the novel non-linear terms in the objective complicate market clearing by making the problem non-convex, we show that a certain non-convex augmented Lagrangian relaxation can be approximated to any precision in time polynomial in the number of market participants using semi-definite programming. This makes it possible to implement the market-clearing mechanism efficiently. On the example of driver-ride assignment in an Uber-like system, we demonstrate the efficacy and scalability of the approach, and trade-offs between Inter- and Intra-fairness.
ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
Dangel, Felix, Tatzel, Lukas, Hennig, Philipp
Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication via automatic differentiation or Kronecker-factored block diagonal approximations do not consider noise in the mini-batch. We present ViViT, a curvature model that leverages the GGN's low-rank structure without further approximations. It allows for efficient computation of eigenvalues, eigenvectors, as well as per-sample first- and second-order directional derivatives. The representation is computed in parallel with gradients in one backward pass and offers a fine-grained cost-accuracy trade-off, which allows it to scale. As examples for ViViT's usefulness, we investigate the directional gradients and curvatures during training, and how noise information can be used to improve the stability of second-order methods.
Local Adaptivity in Federated Learning: Convergence and Consistency
Wang, Jianyu, Xu, Zheng, Garrett, Zachary, Charles, Zachary, Liu, Luyang, Joshi, Gauri
The federated learning (FL) framework trains a machine learning model using decentralized data stored at edge client devices by periodically aggregating locally trained models. Popular optimization algorithms of FL use vanilla (stochastic) gradient descent for both local updates at clients and global updates at the aggregating server. Recently, adaptive optimization methods such as AdaGrad have been studied for server updates. However, the effect of using adaptive optimization methods for local updates at clients is not yet understood. We show in both theory and practice that while local adaptive methods can accelerate convergence, they can cause a non-vanishing solution bias, where the final converged solution may be different from the stationary point of the global objective function. We propose correction techniques to overcome this inconsistency and complement the local adaptive methods for FL. Extensive experiments on realistic federated training tasks show that the proposed algorithms can achieve faster convergence and higher test accuracy than the baselines without local adaptivity.
Efficient Online-Bandit Strategies for Minimax Learning Problems
Roux, Christophe, Wirth, Elias, Pokutta, Sebastian, Kerdreux, Thomas
Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or learning with non-standard aggregated losses. More specifically, these problems are convex-linear problems where the minimization is carried out over the model parameters $w\in\mathcal{W}$ and the maximization over the empirical distribution $p\in\mathcal{K}$ of the training set indexes, where $\mathcal{K}$ is the simplex or a subset of it. To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm. We argue that the efficiency of such approaches critically depends on the structure of $\mathcal{K}$ and propose two properties of $\mathcal{K}$ that facilitate designing efficient algorithms. We focus on a specific family of sets $\mathcal{S}_{n,k}$ encompassing various learning applications and provide high-probability convergence guarantees to the minimax values.
A nearly Blackwell-optimal policy gradient method
Dewanto, Vektor, Gallagher, Marcus
For continuing environments, reinforcement learning methods commonly maximize a discounted reward criterion with discount factor close to 1 in order to approximate the steady-state reward (the gain). However, such a criterion only considers the long-run performance, ignoring the transient behaviour. In this work, we develop a policy gradient method that optimizes the gain, then the bias (which indicates the transient performance and is important to capably select from policies with equal gain). We derive expressions that enable sampling for the gradient of the bias, and its preconditioning Fisher matrix. We further propose an algorithm that solves the corresponding bi-level optimization using a logarithmic barrier. Experimental results provide insights into the fundamental mechanisms of our proposal.
A Scalable Second Order Method for Ill-Conditioned Matrix Completion from Few Samples
Kümmerle, Christian, Verdun, Claudio Mayrink
We propose an iterative algorithm for low-rank matrix completion that can be interpreted as an iteratively reweighted least squares (IRLS) algorithm, a saddle-escaping smoothing Newton method or a variable metric proximal gradient method applied to a non-convex rank surrogate. It combines the favorable data-efficiency of previous IRLS approaches with an improved scalability by several orders of magnitude. We establish the first local convergence guarantee from a minimal number of samples for that class of algorithms, showing that the method attains a local quadratic convergence rate. Furthermore, we show that the linear systems to be solved are well-conditioned even for very ill-conditioned ground truth matrices. We provide extensive experiments, indicating that unlike many state-of-the-art approaches, our method is able to complete very ill-conditioned matrices with a condition number of up to $10^{10}$ from few samples, while being competitive in its scalability.