Optimization
A Bayesian Perspective of Statistical Machine Learning for Big Data
Sambasivan, Rajiv, Das, Sourish, Sahu, Sujit K
Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets which are often very large in size. The very task of feature discovery from data is essentially the meaning of the keyword `learning' in SML. Theoretical justifications for the effectiveness of the SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings particularly justified by statistical inference methods are together termed as statistical learning theory. This paper provides a review of SML from a Bayesian decision theoretic point of view -- where we argue that many SML techniques are closely connected to making inference by using the so called Bayesian paradigm. We discuss many important SML techniques such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes especially in the context of very large data sets where these are often employed. We present a dictionary which maps the key concepts of SML from Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets where we also discuss many practical implementation issues. Thus the review is especially targeted at statisticians and computer scientists who are aspiring to understand and apply SML for moderately large to big data sets.
Global sensitivity analysis for optimization with variable selection
Spagnol, Adrien, Riche, Rodolphe Le, Da Veiga, Sebastien
The optimization of high dimensional functions is a key issue in engineering problems but it frequently comes at a cost that is not acceptable since it usually involves a complex and expensive computer code. Engineers often overcome this limitation by first identifying which parameters drive the most the function variations: non-influential variables are set to a fixed value and the optimization procedure is carried out with the remaining influential variables. Such variable selection is performed through influence measures that are meaningful for regression problems. However it does not account for the specific structure of optimization problems where we would like to identify which variables most lead to constraints satisfaction and low values of the objective function. In this paper, we propose a new sensitivity analysis that accounts for the specific aspects of optimization problems. In particular, we introduce an influence measure based on the Hilbert-Schmidt Independence Criterion to characterize whether a design variable matters to reach low values of the objective function and to satisfy the constraints. This sensitivity measure makes it possible to sort the inputs and reduce the problem dimension. We compare a random and a greedy strategies to set the values of the non-influential variables before conducting a local optimization. Applications to several test-cases show that this variable selection and the greedy strategy significantly reduce the number of function evaluations at a limited cost in terms of solution performance.
Task Embedded Coordinate Update: A Realizable Framework for Multivariate Non-convex Optimization
Wang, Yiyang, Liu, Risheng, Ma, Long, Song, Xiaoliang
We in this paper propose a realizable framework TECU, which embeds task-specific strategies into update schemes of coordinate descent, for optimizing multivariate non-convex problems with coupled objective functions. On one hand, TECU is capable of improving algorithm efficiencies through embedding productive numerical algorithms, for optimizing univariate sub-problems with nice properties. From the other side, it also augments probabilities to receive desired results, by embedding advanced techniques in optimizations of realistic tasks. Integrating both numerical algorithms and advanced techniques together, TECU is proposed in a unified framework for solving a class of non-convex problems. Although the task embedded strategies bring inaccuracies in sub-problem optimizations, we provide a realizable criterion to control the errors, meanwhile, to ensure robust performances with rigid theoretical analyses. By respectively embedding ADMM and a residual-type CNN in our algorithm framework, the experimental results verify both efficiency and effectiveness of embedding task-oriented strategies in coordinate descent for solving practical problems.
Deterministic and stochastic inexact regularization algorithms for nonconvex optimization with optimal complexity
Bellavia, S., Gurioli, G., Morini, B., Toint, Ph. L.
A regularization algorithm using inexact function values and inexact derivatives is proposed and its evaluation complexity analyzed. This algorithm is applicable to unconstrained problems and to problems with inexpensive constraints (that is constraints whose evaluation and enforcement has negligible cost) under the assumption that the derivative of highest degree is $\beta$-H\"{o}lder continuous. It features a very flexible adaptive mechanism for determining the inexactness which is allowed, at each iteration, when computing objective function values and derivatives. The complexity analysis covers arbitrary optimality order and arbitrary degree of available approximate derivatives. It extends results of Cartis, Gould and Toint (2018) on the evaluation complexity to the inexact case: if a $q$th order minimizer is sought using approximations to the first $p$ derivatives, it is proved that a suitable approximate minimizer within $\epsilon$ is computed by the proposed algorithm in at most $O(\epsilon^{-\frac{p+\beta}{p-q+\beta}})$ iterations and at most $O(|\log(\epsilon)|\epsilon^{-\frac{p+\beta}{p-q+\beta}})$ approximate evaluations. While the proposed framework remains so far conceptual for high degrees and orders, it is shown to yield simple and computationally realistic inexact methods when specialized to the unconstrained and bound-constrained first- and second-order cases. The deterministic complexity results are finally extended to the stochastic context, yielding adaptive sample-size rules for subsampling methods typical of machine learning.
Fast Matrix Factorization with Non-Uniform Weights on Missing Data
He, Xiangnan, Tang, Jinhui, Du, Xiaoyu, Hong, Richang, Ren, Tongwei, Chua, Tat-Seng
Abstract--Matrix factorization (MF) has been widely used to discover the low-rank structure and to predict the missing entries of data matrix. In many real-world learning systems, the data matrix can be very high-dimensional but sparse. This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, but they cannot be ignored due to the valuable negative signal. For efficiency concern, existing work typically applies a uniform weight on missing entries to allow a fast learning algorithm. However, this simplification will decrease modeling fidelity, resulting in suboptimal performance for downstream applications. In this work, we weight the missing data non-uniformly, and more generically, we allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method, for which the time complexity is determined by the number of observed entries in the data matrix, rather than the matrix size. The key idea is twofold: 1) we apply truncated SVD on the weight matrix to get a more compact representation of the weights, and 2) we learn MF parameters with element-wise alternating least squares (eALS) and memorize the key intermediate variables to avoid repeating computations that are unnecessary. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method. Atrices are a common data structure to represent the relation between two types of entities in learning systems [1]-[3]. In relational learning, matrix factorization (MF) is a popular approach for dimension reduction by representing the rows (entities of one type) and columns (entities of another type) as two low-rank matrices. The optimization of dimension reduction is usually achieved by minimizing the reconstruction error between the low-rank model and the original data. Xiangnan He and Tat-Seng Chua are with the School of Computing, National University of Singapore, Singapore, 117417. Jinhui Tang is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China, 210094.
Time-interval balancing in multi-processor scheduling of composite modular jobs (preliminary description)
The article describes a special time-interval balancing in multi-processor scheduling of composite modular jobs. This scheduling problem is close to just-in-time planning approach. First, brief literature surveys are presented on just-in-time scheduling and due-data/due-window scheduling problems. Further, the problem and its formulation are proposed for the time-interval balanced scheduling of composite modular jobs. The illustrative real world planning example for modular home-building is described. Here, the main objective function consists in a balance between production of the typical building modules (details) and the assembly processes of the building(s) (by several teams). The assembly plan has to be modified to satisfy the balance requirements. The solving framework is based on the following: (i) clustering of initial set of modular detail types to obtain about ten basic detail types that correspond to main manufacturing conveyors; (ii) designing a preliminary plan of assembly for buildings; (iii) detection of unbalanced time periods, (iv) modification of the planning solution to improve the schedule balance. The framework implements a metaheuristic based on local optimization approach. Two other applications (supply chain management, information transmission systems) are briefly described.
Targeting Solutions in Bayesian Multi-Objective Optimization: Sequential and Parallel Versions
Gaudrie, David, Riche, Rodolphe Le, Picheny, Victor, Enaux, Benoit, Herbert, Vincent
Multi-objective optimization aims at finding trade-off solutions to conflicting objectives. These constitute the Pareto optimal set. In the context of expensive-to-evaluate functions, it is impossible and often non-informative to look for the entire set. As an end-user would typically prefer a certain part of the objective space, we modify the Bayesian multi-objective optimization algorithm which uses Gaussian Processes to maximize the Expected Hypervolume Improvement, to focus the search in the preferred region. The cumulated effects of the Gaussian Processes and the targeting strategy lead to a particularly efficient convergence to the desired part of the Pareto set. To take advantage of parallel computing, a multi-point extension of the targeting criterion is proposed and analyzed.
Reachability-based safe learning for optimal control problem
Fedorov, Stanislav, Candelieri, Antonio
In this work we seek for an approach to integrate safety in the learning process that relies on a partly known state-space model of the system and regards the unknown dynamics as an additive bounded disturbance. We introduce a framework for safely learning a control strategy for a given system with an additive disturbance. On the basis of the known part of the model, a safe set in which the system can learn safely, the algorithm can choose optimal actions for pursuing the target set as long as the safety-preserving condition is satisfied. After some learning episodes, the disturbance can be updated based on real-world data. To this end, Gaussian Process regression is conducted on the collected disturbance samples. Since the unstable nature of the law of the real world, for example, change of friction or conductivity with the temperature, we expect to have the more robust solution of optimal control problem. For evaluation of approach described above we choose an inverted pendulum as a benchmark model. The proposed algorithm manages to learn a policy that does not violate the pre-specified safety constraints. Observed performance is improved when it was incorporated exploration set up to make sure that an optimal policy is learned everywhere in the safe set. Finally, we outline some promising directions for future research beyond the scope of this paper.
Suggesting Cooking Recipes Through Simulation and Bayesian Optimization
Garrido-Merchán, Eduardo C., Albarca-Molina, Alejandro
Cooking typically involves a plethora of decisions about ingredients and tools that need to be chosen in order to write a good cooking recipe. Cooking can be modelled in an optimization framework, as it involves a search space of ingredients, kitchen tools, cooking times or temperatures. If we model as an objective function the quality of the recipe, several problems arise. No analytical expression can model all the recipes, so no gradients are available. The objective function is subjective, in other words, it contains noise. Moreover, evaluations are expensive both in time and human resources. Bayesian Optimization (BO) emerges as an ideal methodology to tackle problems with these characteristics. In this paper, we propose a methodology to suggest recipe recommendations based on a Machine Learning (ML) model that fits real and simulated data and BO. We provide empirical evidence with two experiments that support the adequacy of the methodology.
Using Known Information to Accelerate HyperParameters Optimization Based on SMBO
Daning, Cheng, Hanping, Zhang, Fen, Xia, Shigang, Li, Yunquan, Zhang
Automl is the key technology for machine learning problem. Current state of art hyperparameter optimization methods are based on traditional black-box optimization methods like SMBO (SMAC, TPE). The objective function of black-box optimization is non-smooth, or time-consuming to evaluate, or in some way noisy. Recent years, many researchers offered the work about the properties of hyperparameters. However, traditional hyperparameter optimization methods do not take those information into consideration. In this paper, we use gradient information and machine learning model analysis information to accelerate traditional hyperparameter optimization methods SMBO. In our L2 norm experiments, our method yielded state-of-the-art performance, and in many cases outperformed the previous best configuration approach.