Optimization
RKHSMetaMod : An R package to estimate the Hoeffding decomposition of an unknown function by solving RKHS Ridge Group Sparse optimization problem
Kamari, Halaleh, Huet, Sylvie, Taupin, Marie-Luce
In the context of the Gaussian regression model, RKHSMetaMod estimates a meta model by solving the Ridge Group Sparse Optimization Problem based on the Reproducing Kernel Hilbert Spaces (RKHS). The estimated meta model is an additive model that satisfies the properties of the Hoeffding decomposition, and its terms estimate the terms in the Hoeffding decomposition of the unkwown regression function. This package provides an interface from R statistical computing environment to the C++ libraries Eigen and GSL. It uses the efficient C++ functions through RcppEigen and RcppGSL packages to speed up the execution time and the R environment in order to propose an user friendly package.
A General Optimization Framework for Dynamic Time Warping
The goal of dynamic time warping is to transform or warp time in order to approximately align two signals together. We pose the choice of warping function as an optimization problem with several terms in the objective. The first term measures the misalignment of the time-warped signals. Two additional regularization terms penalize the cumulative warping and the instantaneous rate of time warping; constraints on the warping can be imposed by assigning the value +inf to the regularization terms. Different choices of the three objective terms yield different time warping functions that trade off signal fit or alignment and properties of the warping function. The optimization problem we formulate is a classical optimal control problem, with initial and terminal constraints, and a state dimension of one. We describe an effective general method that minimizes the objective by discretizing the values of the original and warped time, and using standard dynamic programming to compute the (globally) optimal warping function with the discretized values. Iterated refinement of this scheme yields a high accuracy warping function in just a few iterations. Our method is implemented as an open source Python package GDTW.
A novel hybrid model based on multi-objective Harris hawks optimization algorithm for daily PM2.5 and PM10 forecasting
Du, Pei, Wang, Jianzhou, Hao, Yan, Niu, Tong, Yang, Wendong
High levels of air pollution may seriously affect people's living environment and even endanger their lives. In order to reduce air pollution concentrations, and warn the public before the occurrence of hazardous air pollutants, it is urgent to design an accurate and reliable air pollutant forecasting model. However, most previous research have many deficiencies, such as ignoring the importance of predictive stability, and poor initial parameters and so on, which have significantly effect on the performance of air pollution prediction. Therefore, to address these issues, a novel hybrid model is proposed in this study. Specifically, a powerful data preprocessing techniques is applied to decompose the original time series into different modes from low- frequency to high- frequency. Next, a new multi-objective algorithm called MOHHO is first developed in this study, which are introduced to tune the parameters of ELM model with high forecasting accuracy and stability for air pollution series prediction, simultaneously. And the optimized ELM model is used to perform the time series prediction. Finally, a scientific and robust evaluation system including several error criteria, benchmark models, and several experiments using six air pollutant concentrations time series from three cities in China is designed to perform a compressive assessment for the presented hybrid forecasting model. Experimental results indicate that the proposed hybrid model can guarantee a more stable and higher predictive performance compared to others, whose superior prediction ability may help to develop effective plans for air pollutant emissions and prevent health problems caused by air pollution.
Deep Bayesian Optimization on Attributed Graphs
Attributed graphs, which contain rich contextual features beyond just network structure, are ubiquitous and have been observed to benefit various network analytics applications. Graph structure optimization, aiming to find the optimal graphs in terms of some specific measures, has become an effective computational tool in complex network analysis. However, traditional model-free methods suffer from the expensive computational cost of evaluating graphs; existing vectorial Bayesian optimization methods cannot be directly applied to attributed graphs and have the scalability issue due to the use of Gaussian processes (GPs). To bridge the gap, in this paper, we propose a novel scalable Deep Graph Bayesian Optimization (DGBO) method on attributed graphs. The proposed DGBO prevents the cubical complexity of the GPs by adopting a deep graph neural network to surrogate black-box functions, and can scale linearly with the number of observations. Intensive experiments are conducted on both artificial and real-world problems, including molecular discovery and urban road network design, and demonstrate the effectiveness of the DGBO compared with the state-of-the-art.
Data-driven Algorithm Selection and Parameter Tuning: Two Case studies in Optimization and Signal Processing
De Loera, Jesus A., Haddock, Jamie, Ma, Anna, Needell, Deanna
Machine learning algorithms typically rely on optimization subroutines and are well-known to provide very effective outcomes for many types of problems. Here, we flip the reliance and ask the reverse question: can machine learning algorithms lead to more effective outcomes for optimization problems? Our goal is to train machine learning methods to automatically improve the performance of optimization and signal processing algorithms. As a proof of concept, we use our approach to improve two popular data processing subroutines in data science: stochastic gradient descent and greedy methods in compressed sensing. We provide experimental results that demonstrate the answer is ``yes'', machine learning algorithms do lead to more effective outcomes for optimization problems, and show the future potential for this research direction.
Generalized Separable Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a linear dimensionality technique for nonnegative data with applications such as image analysis, text mining, audio source separation and hyperspectral unmixing. Given a data matrix $M$ and a factorization rank $r$, NMF looks for a nonnegative matrix $W$ with $r$ columns and a nonnegative matrix $H$ with $r$ rows such that $M \approx WH$. NMF is NP-hard to solve in general. However, it can be computed efficiently under the separability assumption which requires that the basis vectors appear as data points, that is, that there exists an index set $\mathcal{K}$ such that $W = M(:,\mathcal{K})$. In this paper, we generalize the separability assumption: We only require that for each rank-one factor $W(:,k)H(k,:)$ for $k=1,2,\dots,r$, either $W(:,k) = M(:,j)$ for some $j$ or $H(k,:) = M(i,:)$ for some $i$. We refer to the corresponding problem as generalized separable NMF (GS-NMF). We discuss some properties of GS-NMF and propose a convex optimization model which we solve using a fast gradient method. We also propose a heuristic algorithm inspired by the successive projection algorithm. To verify the effectiveness of our methods, we compare them with several state-of-the-art separable NMF algorithms on synthetic, document and image data sets.
Meta-Surrogate Benchmarking for Hyperparameter Optimization
Klein, Aaron, Dai, Zhenwen, Hutter, Frank, Lawrence, Neil, Gonzalez, Javier
Despite the recent progress in hyperparameter optimization (HPO), available benchmarks that resemble real-world scenarios consist of a few and very large problem instances that are expensive to solve. This blocks researchers and practitioners not only from systematically running large-scale comparisons that are needed to draw statistically significant results but also from reproducing experiments that were conducted before. This work proposes a method to alleviate these issues by means of a meta-surrogate model for HPO tasks trained on off-line generated data. The model combines a probabilistic encoder with a multi-task model such that it can generate inexpensive and realistic tasks of the class of problems of interest. We demonstrate that benchmarking HPO methods on samples of the generative model allows us to draw more coherent and statistically significant conclusions that can be reached orders of magnitude faster than using the original tasks. We provide evidence of our findings for various HPO methods on a wide class of problems.
Semi-Implicit Generative Model
Yin, Mingzhang, Zhou, Mingyuan
To combine explicit and implicit generative models, we introduce semi-implicit generator (SIG) as a flexible hierarchical model that can be trained in the maximum likelihood framework. Both theoretically and experimentally, we demonstrate that SIG can generate high quality samples especially when dealing with multi-modality. By introducing SIG as an unbiased regularizer to the generative adversarial network (GAN), we show the interplay between maximum likelihood and adversarial learning can stabilize the adversarial training, resist the notorious mode collapsing problem of GANs, and improve the diversity of generated random samples.
Don't Forget Your Teacher: A Corrective Reinforcement Learning Framework
Nazari, Mohammadreza, Jahani, Majid, Snyder, Lawrence V., Takรกฤ, Martin
Although reinforcement learning (RL) can provide reliable solutions in many settings, practitioners are often wary of the discrepancies between the RL solution and their status quo procedures. Therefore, they may be reluctant to adapt to the novel way of executing tasks proposed by RL. On the other hand, many real-world problems require relatively small adjustments from the status quo policies to achieve improved performance. Therefore, we propose a student-teacher RL mechanism in which the RL (the "student") learns to maximize its reward, subject to a constraint that bounds the difference between the RL policy and the "teacher" policy. The teacher can be another RL policy (e.g., trained under a slightly different setting), the status quo policy, or any other exogenous policy. We formulate this problem using a stochastic optimization model and solve it using a primal-dual policy gradient algorithm. We prove that the policy is asymptotically optimal. However, a naive implementation suffers from high variance and convergence to a stochastic optimal policy. With a few practical adjustments to address these issues, our numerical experiments confirm the effectiveness of our proposed method in multiple GridWorld scenarios.
Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation
Ahn, Byung Hoon, Pilligundla, Prannoy, Esmaeilzadeh, Hadi
Achieving faster execution with shorter compilation time can enable further diversity and innovation in neural networks. However, the current paradigm of executing neural networks either relies on hand-optimized libraries, traditional compilation heuristics, or very recently, simulated annealing and genetic algorithms. Our work takes a unique approach by formulating compiler optimizations for neural networks as a reinforcement learning problem, whose solution takes fewer steps to converge. This solution, dubbed ReLeASE, comes with a sampling algorithm that leverages clustering to focus the costly samples (real hardware measurements) on representative points, subsuming an entire subspace. Our adaptive sampling not only reduces the number of samples, but also improves the quality of samples for better exploration in shorter time. As such, experimentation with real hardware shows that reinforcement learning with adaptive sampling provides 4.45x speed up in optimization time over AutoTVM, while also improving inference time of the modern deep networks by 5.6%. Further experiments also confirm that our adaptive sampling can even improve AutoTVM's simulated annealing by 4.00x.