Optimization
Learning Cost Functions for Optimal Transport
Sun, Haodong, Zhou, Haomin, Zha, Hongyuan, Ye, Xiaojing
Learning the cost function for optimal transport from observed transport plan or its samples has been cast as a bi-level optimization problem. In this paper, we derive an unconstrained convex optimization formulation for the problem which can be further augmented by any customizable regularization. This novel framework avoids repeatedly solving a forward optimal transport problem in each iteration which has been a thorny computational bottleneck for the bi-level optimization approach. To validate the effectiveness of this framework, we develop two numerical algorithms, one is a fast matrix scaling method based on the Sinkhorn-Knopp algorithm for the discrete case, and the other is a supervised learning algorithm that realizes the cost function as a deep neural network in the continuous case. Numerical results demonstrate promising efficiency and accuracy advantages of the proposed algorithms over existing state of the art methods.
Effective End-to-End Learning Framework for Economic Dispatch
Lu, Chenbei, Wang, Kui, Wu, Chenye
Conventional wisdom to improve the effectiveness of economic dispatch is to design the load forecasting method as accurately as possible. However, this approach can be problematic due to the temporal and spatial correlations between system cost and load prediction errors. This motivates us to adopt the notion of end-to-end machine learning and to propose a task-specific learning criteria to conduct economic dispatch. Specifically, to maximize the data utilization, we design an efficient optimization kernel for the learning process. We provide both theoretical analysis and empirical insights to highlight the effectiveness and efficiency of the proposed learning framework.
Safe Screening for the Generalized Conditional Gradient Method
The conditional gradient method (CGM) has been widely used for fast sparse approximation, having a low per iteration computational cost for structured sparse regularizers. We explore the sparsity acquiring properties of a generalized CGM (gCGM), where the constraint is replaced by a penalty function based on a gauge penalty; this can be done without significantly increasing the per-iteration computation, and applies to general notions of sparsity. Without assuming bounded iterates, we show $O(1/t)$ convergence of the function values and gap of gCGM. We couple this with a safe screening rule, and show that at a rate $O(1/(t\delta^2))$, the screened support matches the support at the solution, where $\delta \geq 0$ measures how close the problem is to being degenerate. In our experiments, we show that the gCGM for these modified penalties have similar feature selection properties as common penalties, but with potentially more stability over the choice of hyperparameter.
Nonmyopic Gaussian Process Optimization with Macro-Actions
Kharkovskii, Dmitrii, Ling, Chun Kai, Low, Kian Hsiang
This paper presents a multi-staged approach to nonmyopic adaptive Gaussian process optimization (GPO) for Bayesian optimization (BO) of unknown, highly complex objective functions that, in contrast to existing nonmyopic adaptive BO algorithms, exploits the notion of macro-actions for scaling up to a further lookahead to match up to a larger available budget. To achieve this, we generalize GP upper confidence bound to a new acquisition function defined w.r.t. a nonmyopic adaptive macro-action policy, which is intractable to be optimized exactly due to an uncountable set of candidate outputs. The contribution of our work here is thus to derive a nonmyopic adaptive epsilon-Bayes-optimal macro-action GPO (epsilon-Macro-GPO) policy. To perform nonmyopic adaptive BO in real time, we then propose an asymptotically optimal anytime variant of our epsilon-Macro-GPO policy with a performance guarantee. We empirically evaluate the performance of our epsilon-Macro-GPO policy and its anytime variant in BO with synthetic and real-world datasets.
CO-Optimal Transport
Redko, Ievgen, Vayer, Titouan, Flamary, Rémi, Courty, Nicolas
Optimal transport (OT) is a powerful geometric and probabilistic tool for finding correspondences and measuring similarity between two distributions. Yet, its original formulation relies on the existence of a cost function between the samples of the two distributions, which makes it impractical for comparing data distributions supported on different topological spaces. To circumvent this limitation, we propose a novel OT problem, named COOT for CO-Optimal Transport, that aims to simultaneously optimize two transport maps between both samples and features. This is different from other approaches that either discard the individual features by focussing on pairwise distances (e.g. Gromov-Wasserstein) or need to model explicitly the relations between the features. COOT leads to interpretable correspondences between both samples and feature representations and holds metric properties. We provide a thorough theoretical analysis of our framework and establish rich connections with the Gromov-Wasserstein distance. We demonstrate its versatility with two machine learning applications in heterogeneous domain adaptation and co-clustering/data summarization, where COOT leads to performance improvements over the competing state-of-the-art methods.
PhD Studentship on Artificial Intelligence for Railway Operations and Management Project Opportunities PhD
Defining the roadmaps for Artificial Intelligence applications for railway operations and network management Applications are invited for a PhD studentship in innovative approaches in artificial intelligence for railway scheduling and operations, to be based in Institute for Transport Studies at University of Leeds. The position is an opportunity to combine cutting-edge research at the intersection of railway scheduling and artificial intelligence techniques such as machine learning, neural networks. The overall objective of the PhD research project is to investigate the potential of Artificial Intelligence (AI) in the rail sector and contribute to the definition of roadmaps for future research in operational intelligence and network management. In particular, the student will develop and compare different AI approaches, e.g. machine learning, deep and reinforcement learning, for railway traffic planning and management. He or she will have a chance to investigate using AI for solving combinatorial optimization problems, AI for supporting optimization models, with special focus on the optimization models for railway operations and management.
SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm
Hao, Yi, Jain, Ayush, Orlitsky, Alon, Ravindrakumar, Vaishakh
Sample- and computationally-efficient distribution estimation is a fundamental tenet in statistics and machine learning. We present $\mathrm{SURF}$, an algorithm for approximating distributions by piecewise polynomials. $\mathrm{SURF}$ is simple, replacing existing general-purpose optimization techniques by straight-forward approximation of each potential polynomial piece by a simple empirical-probability interpolation, and using plain divide-and-conquer to merge the pieces. It is universal, as well-known low-degree polynomial-approximation results imply that it accurately approximates a large class of common distributions. $\mathrm{SURF}$ is robust to distribution mis-specification as for any degree $d\le 8$, it estimates any distribution to an $\ell_1$ distance $ <3 $ times that of the nearest degree-$d$ piecewise polynomial, improving known factor upper bounds of 3 for single polynomials and 15 for polynomials with arbitrarily many pieces. It is fast, using optimal sample complexity, and running in near sample-linear time. In experiments, $\mathrm{SURF}$ significantly outperforms state-of-the art algorithms.
Knot Selection in Sparse Gaussian Processes
Garton, Nathaniel, Niemi, Jarad, Carriquiry, Alicia
Knot-based, sparse Gaussian processes have enjoyed considerable success as scalable approximations to full Gaussian processes. Problems can occur, however, when knot selection is done by optimizing the marginal likelihood. For example, the marginal likelihood surface is highly multimodal, which can cause suboptimal knot placement where some knots serve practically no function. This is especially a problem when many more knots are used than are necessary, resulting in extra computational cost for little to no gains in accuracy. We propose a one-at-a-time knot selection algorithm to select both the number and placement of knots. Our algorithm uses Bayesian optimization to efficiently propose knots that are likely to be good and largely avoids the pathologies encountered when using the marginal likelihood as the objective function. We provide empirical results showing improved accuracy and speed over the current standard approaches.
Robust Optimization for Fairness with Noisy Protected Groups
Wang, Serena, Guo, Wenshuo, Narasimhan, Harikrishna, Cotter, Andrew, Gupta, Maya, Jordan, Michael I.
Many existing fairness criteria for machine learning involve equalizing or achieving some metric across \textit{protected groups} such as race or gender groups. However, practitioners trying to audit or enforce such group-based criteria can easily face the problem of noisy or biased protected group information. We study this important practical problem in two ways. First, we study the consequences of na{\"i}vely only relying on noisy protected groups: we provide an upper bound on the fairness violations on the true groups $G$ when the fairness criteria are satisfied on noisy groups $\hat{G}$. Second, we introduce two new approaches using robust optimization that, unlike the na{\"i}ve approach of only relying on $\hat{G}$, are guaranteed to satisfy fairness criteria on the true protected groups $G$ while minimizing a training objective. We provide theoretical guarantees that one such approach converges to an optimal feasible solution. Using two case studies, we empirically show that the robust approaches achieve better true group fairness guarantees than the na{\"i}ve approach.