Goto

Collaborating Authors

 Optimization


Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization

arXiv.org Artificial Intelligence

The pipeline optimization problem in machine learning requires simultaneous optimization of pipeline structures and parameter adaptation of their elements. Having an elegant way to express these structures can help lessen the complexity in the management and analysis of their performances together with the different choices of optimization strategies. With these issues in mind, we created the AutoMLPipeline (AMLP) toolkit which facilitates the creation and evaluation of complex machine learning pipeline structures using simple expressions. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed-up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which outperforms other AutoML approaches with a 4-hour time budget in less than 5 minutes of AMLP computation time.


Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GDA), but it is unclear whether the optimization problem contains any saddle points, or whether heuristic methods can find them in practice. In this work, we analyze the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators expose the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games. Using this convex duality interpretation, we further demonstrate the impact of different activation functions of the discriminator. Our observations are verified with numerical results demonstrating the power of the convex interpretation, with applications in progressive training of convex architectures corresponding to linear generators and quadratic-activation discriminators for CelebA image generation.


Recent advances in Bayesian optimization with applications to parameter reconstruction in optical nano-metrology

arXiv.org Machine Learning

Parameter reconstruction is a common problem in optical nano metrology. It generally involves a set of measurements, to which one attempts to fit a numerical model of the measurement process. The model evaluation typically involves to solve Maxwell's equations and is thus time consuming. This makes the reconstruction computationally demanding. Several methods exist for fitting the model to the measurements. On the one hand, Bayesian optimization methods for expensive black-box optimization enable an efficient reconstruction by training a machine learning model of the squared sum of deviations. On the other hand, curve fitting algorithms, such as the Levenberg-Marquardt method, take the deviations between all model outputs and corresponding measurement values into account which enables a fast local convergence. In this paper we present a Bayesian Target Vector Optimization scheme which combines these two approaches. We compare the performance of the presented method against a standard Levenberg-Marquardt-like algorithm, a conventional Bayesian optimization scheme, and the L-BFGS-B and Nelder-Mead simplex algorithms. As a stand-in for problems from nano metrology, we employ a non-linear least-square problem from the NIST Standard Reference Database. We find that the presented method generally uses fewer calls of the model function than any of the competing schemes to achieve similar reconstruction performance.


Least-Squares Linear Dilation-Erosion Regressor Trained using Stochastic Descent Gradient or the Difference of Convex Methods

arXiv.org Artificial Intelligence

This paper presents a hybrid morphological neural network for regression tasks called linear dilation-erosion regression ($\ell$-DER). In few words, an $\ell$-DER model is given by a convex combination of the composition of linear and elementary morphological operators. As a result, they yield continuous piecewise linear functions and, thus, are universal approximators. Apart from introducing the $\ell$-DER models, we present three approaches for training these models: one based on stochastic descent gradient and two based on the difference of convex programming problems. Finally, we evaluate the performance of the $\ell$-DER model using 14 regression tasks. Although the approach based on SDG revealed faster than the other two, the $\ell$-DER trained using a disciplined convex-concave programming problem outperformed the others in terms of the least mean absolute error score.


Locality Relationship Constrained Multi-view Clustering Framework

arXiv.org Artificial Intelligence

In most practical applications, it's common to utilize multiple features from different views to represent one object. Among these works, multi-view subspace-based clustering has gained extensive attention from many researchers, which aims to provide clustering solutions to multi-view data. However, most existing methods fail to take full use of the locality geometric structure and similarity relationship among samples under the multi-view scenario. To solve these issues, we propose a novel multi-view learning method with locality relationship constraint to explore the problem of multi-view clustering, called Locality Relationship Constrained Multi-view Clustering Framework (LRC-MCF). LRC-MCF aims to explore the diversity, geometric, consensus and complementary information among different views, by capturing the locality relationship information and the common similarity relationships among multiple views. Moreover, LRC-MCF takes sufficient consideration to weights of different views in finding the common-view locality structure and straightforwardly produce the final clusters. To effectually reduce the redundancy of the learned representations, the low-rank constraint on the common similarity matrix is considered additionally. To solve the minimization problem of LRC-MCF, an Alternating Direction Minimization (ADM) method is provided to iteratively calculate all variables LRC-MCF. Extensive experimental results on seven benchmark multi-view datasets validate the effectiveness of the LRC-MCF method.


Cluster Regularization via a Hierarchical Feature Regression

arXiv.org Machine Learning

Prediction tasks with high-dimensional nonorthogonal predictor sets pose a challenge for least squares based fitting procedures. A large and productive literature exists, discussing various regularized approaches to improving the out-of-sample robustness of parameter estimates. This paper proposes a novel cluster-based regularization -- the hierarchical feature regression (HFR) --, which mobilizes insights from the domains of machine learning and graph theory to estimate parameters along a supervised hierarchical representation of the predictor set, shrinking parameters towards group targets. The method is innovative in its ability to estimate optimal compositions of predictor groups, as well as the group targets endogenously. The HFR can be viewed as a supervised factor regression, with the strength of shrinkage governed by a penalty on the extent of idiosyncratic variation captured in the fitting process. The method demonstrates good predictive accuracy and versatility, outperforming a panel of benchmark regularized estimators across a diverse set of simulated regression tasks, including dense, sparse and grouped data generating processes. An application to the prediction of economic growth is used to illustrate the HFR's effectiveness in an empirical setting, with favorable comparisons to several frequentist and Bayesian alternatives.


US Air Force pilots get an artificial intelligence assist with scheduling aircrews

#artificialintelligence

Take it from U.S. Air Force Captain Kyle McAlpin when he says that scheduling C-17 aircraft crews is a headache. An artificial intelligence research flight commander for the Department of Air Force–MIT AI Accelerator Program, McAlpin is also an experienced C-17 pilot. "You could have a mission change and spend the next 12 hours of your life rebuilding a schedule that works," he says. It's a pain point for crew of 52 squadrons who operate C-17s, the military cargo aircraft that transport troops and supplies globally. This year, the Air Force marked 4 million flight hours for its C-17 fleet, which comprises 275 U.S. and allied aircraft.


The Bayesian Learning Rule

arXiv.org Machine Learning

We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.


Reinforced Hybrid Genetic Algorithm for the Traveling Salesman Problem

arXiv.org Artificial Intelligence

Given a set of cities with certain locations, the Traveling Salesman Problem (TSP) is to find the shortest Hamiltonian route, along which a salesman travels from a city to visit all the cities exactly once and finally returns to the starting city. The TSP is one of the most famous and well-studied NP-hard combinatorial optimization problems, which is very easy to understand but very difficult to solve optimally or near-optimally. Over the years, TSP has become a touchstone for the algorithm design. Typical methods for solving the TSP are mainly exact algorithms, approximation algorithms and heuristics. The exact algorithms may be prohibitive for large instances and the approximation algorithms may suffer from weak optimal guarantees or empirical performance (Khalil et al. 2017). Heuristics are known to be the most efficient and effective approaches for solving the TSP.


Training Over-parameterized Models with Non-decomposable Objectives

arXiv.org Artificial Intelligence

Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem into a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the training loss with example-specific costs. We point out that the standard approach of re-weighting the loss to incorporate label costs can produce unsatisfactory results when used to train over-parameterized models. As a remedy, we propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices. Our losses are calibrated, and can be further improved with distilled labels from a teacher model. Through experiments on benchmark image datasets, we showcase the effectiveness of our approach in training ResNet models with common robust and constrained optimization objectives.