Optimization
A new approach to forecast service parts demand by integrating user preferences into multi-objective optimization
Service supply chain management is to prepare spare parts for failed products under warranty. Their goal is to reach agreed service level at the minimum cost. We convert this business problem into a preference based multi-objective optimization problem, where two quality criteria must be simultaneously optimized. One criterion is accuracy of demand forecast and the other is service level. Here we propose a general framework supporting solving preference-based multi-objective optimization problems (MOPs) by multi-gradient descent algorithm (MGDA), which is well suited for training deep neural network. The proposed framework treats agreed service level as a constrained criterion that must be met and generate a Pareto-optimal solution with highest forecasting accuracy. The neural networks used here are two Encoder-Decoder LSTM modes: one is used for pre-training phase to learn distributed representation of former generations' service parts consumption data, and the other is used for supervised learning phase to generate forecast quantities of current generations' service parts. Evaluated under the service parts consumption data in Lenovo Group Ltd, the proposed method clearly outperform baseline methods.
Locally Accelerated Conditional Gradients
Carderera, Alejandro, Diakonikolas, Jelena, Pokutta, Sebastian
Conditional gradient methods form a class of projection-free first-order algorithms for solving smooth convex optimization problems. Apart from eschewing projections, these methods are attractive because of their simplicity, numerical performance, and the sparsity of the solutions outputted. However, they do not achieve optimal convergence rates. We present the Locally Accelerated Conditional Gradients algorithm that relaxes the projection-freeness requirement to only require projection onto (typically low-dimensional) simplices and mixes accelerated steps with conditional gradient steps to achieve local acceleration. We derive asymptotically optimal convergence rates for this algorithm. Our experimental results demonstrate the practicality of our approach; in particular, the speedup is achieved both in wall-clock time and per-iteration progress compared to standard conditional gradient methods and a Catalyst-accelerated Away-Step Frank-Wolfe algorithm.
Hill Climbing on Value Estimates for Search-control in Dyna
Pan, Yangchen, Yao, Hengshuai, Farahmand, Amir-massoud, White, Martha
Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search-control, the mechanism to generate the state and action from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions that the agent is likely to visit next. We derive a noisy stochastic projected gradient ascent algorithm for hill climbing, and highlight a connection to Langevin dynamics. We provide an empirical demonstration on four classical domains that our algorithm, HC-Dyna, can obtain significant sample efficiency improvements. We study the properties of different sampling distributions for search-control, and find that there appears to be a benefit specifically from using the samples generated by climbing on current value estimates from low-value to high-value region.
Evaluating Ising Processing Units with Integer Programming
Coffrin, Carleton, Nagarajan, Harsha, Bent, Russell
The recent emergence of novel computational devices, such as adiabatic quantum computers, CMOS annealers, and optical parametric oscillators, present new opportunities for hybrid-optimization algorithms that are hardware accelerated by these devices. In this work, we propose the idea of an Ising processing unit as a computational abstraction for reasoning about these emerging devices. The challenges involved in using and benchmarking these devices are presented and commercial mixed integer programming solvers are proposed as a valuable tool for the validation of these disparate hardware platforms. The proposed validation methodology is demonstrated on a D-Wave 2X adiabatic quantum computer, one example of an Ising processing unit. The computational results demonstrate that the D-Wave hardware consistently produces high-quality solutions and suggests that as IPU technology matures it could become a valuable co-processor in hybrid-optimization algorithms.
Asymptotic Risk of Bezier Simplex Fitting
Tanaka, Akinori, Sannai, Akiyoshi, Kobayashi, Ken, Hamada, Naoki
The Bezier simplex fitting is a novel data modeling technique which exploits geometric structures of data to approximate the Pareto front of multi-objective optimization problems. There are two fitting methods based on different sampling strategies. The inductive skeleton fitting employs a stratified subsampling from each skeleton of a simplex, whereas the all-at-once fitting uses a non-stratified sampling which treats a simplex as a whole. In this paper, we analyze the asymptotic risks of those B\'ezier simplex fitting methods and derive the optimal subsample ratio for the inductive skeleton fitting. It is shown that the inductive skeleton fitting with the optimal ratio has a smaller risk when the degree of a Bezier simplex is less than three. Those results are verified numerically under small to moderate sample sizes. In addition, we provide two complementary applications of our theory: a generalized location problem and a multi-objective hyper-parameter tuning of the group lasso. The former can be represented by a Bezier simplex of degree two where the inductive skeleton fitting outperforms. The latter can be represented by a Bezier simplex of degree three where the all-at-once fitting gets an advantage.
Inspirational Adversarial Image Generation
Riviere, Morgane, Teytaud, Olivier, Rapin, Jérémy, LeCun, Yann, Couprie, Camille
The task of image generation started to receive some attention from artists and designers to inspire them in new creations. However, exploiting the results of deep generative models such as Generative Adversarial Networks can be long and tedious given the lack of existing tools. In this work, we propose a simple strategy to inspire creators with new generations learned from a dataset of their choice, while providing some control on them. We design a simple optimization method to find the optimal latent parameters corresponding to the closest generation to any input inspirational image. Specifically, we allow the generation given an inspirational image of the user choice by performing several optimization steps to recover optimal parameters from the model's latent space. We tested several exploration methods starting with classic gradient descents to gradient-free optimizers. Many gradient-free optimizers just need comparisons (better/worse than another image), so that they can even be used without numerical criterion, without inspirational image, but with only with human preference. Thus, by iterating on one's preferences we could make robust Facial Composite or Fashion Generation algorithms. High resolution of the produced design generations are obtained using progressive growing of GANs. Our results on four datasets of faces, fashion images, and textures show that satisfactory images are effectively retrieved in most cases.
Bayesian Optimization with Binary Auxiliary Information
Zhang, Yehong, Dai, Zhongxiang, Low, Kian Hsiang
This paper presents novel mixed-type Bayesian optimization (BO) algorithms to accelerate the optimization of a target objective function by exploiting correlated auxiliary information of binary type that can be more cheaply obtained, such as in policy search for reinforcement learning and hyperparameter tuning of machine learning models with early stopping. To achieve this, we first propose a mixed-type multi-output Gaussian process (MOGP) to jointly model the continuous target function and binary auxiliary functions. Then, we propose information-based acquisition functions such as mixed-type entropy search (MT-ES) and mixed-type predictive ES (MT-PES) for mixed-type BO based on the MOGP predictive belief of the target and auxiliary functions. The exact acquisition functions of MT-ES and MT-PES cannot be computed in closed form and need to be approximated. We derive an efficient approximation of MT-PES via a novel mixed-type random features approximation of the MOGP model whose cross-correlation structure between the target and auxiliary functions can be exploited for improving the belief of the global target maximizer using observations from evaluating these functions. We propose new practical constraints to relate the global target maximizer to the binary auxiliary functions. We empirically evaluate the performance of MT-ES and MT-PES with synthetic and real-world experiments.
Improving Black-box Adversarial Attacks with a Transfer-based Prior
Cheng, Shuyu, Dong, Yinpeng, Pang, Tianyu, Su, Hang, Zhu, Jun
We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes the advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires much fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.
What is Quantum Computing and How is it Useful for Artificial Intelligence?
After decades of a heavy slog with no promise of success, quantum computing is suddenly buzzing! Nearly two years ago, IBM made a quantum computer available to the world. The 5-quantum-bit (qubit) resource they now call the IBM Q experience. It was more like a toy for researchers than a way of getting any serious number crunching done. But 70,000 users worldwide have registered for it, and the qubit count in this resource has now quadrupled.
A concise guide to existing and emerging vehicle routing problem variants
Vidal, Thibaut, Laporte, Gilbert, Matl, Piotr
Vehicle routing problems have been the focus of extensive research over the past sixty years, driven by their economic importance and their theoretical interest. The diversity of applications has motivated the study of a myriad of problem variants with different attributes. In this article, we provide a brief survey of existing and emerging problem variants. Models are typically refined along three lines: considering more relevant objectives and performance metrics, integrating vehicle routing evaluations with other tactical decisions, and capturing fine-grained yet essential aspects of modern supply chains. We organize the main problem attributes within this structured framework. We discuss recent research directions and pinpoint current shortcomings, recent successes, and emerging challenges.