Optimization
Maximally Invariant Data Perturbation as Explanation
Hara, Satoshi, Ikeno, Kouichi, Soma, Tasuku, Maehara, Takanori
While several feature scoring methods are proposed to explain the output of complex machine learning models, most of them lack formal mathematical definitions. In this study, we propose a novel definition of the feature score using the maximally invariant data perturbation, which is inspired from the idea of adversarial example. In adversarial example, one seeks the smallest data perturbation that changes the model's output. In our proposed approach, we consider the opposite: we seek the maximally invariant data perturbation that does not change the model's output. In this way, we can identify important input features as the ones with small allowable data perturbations. To find the maximally invariant data perturbation, we formulate the problem as linear programming. The experiment on the image classification with VGG16 shows that the proposed method could identify relevant parts of the images effectively.
Moving Objects Analytics: Survey on Future Location & Trajectory Prediction Methods
Georgiou, Harris, Karagiorgou, Sophia, Kontoulis, Yannis, Pelekis, Nikos, Petrou, Petros, Scarlatti, David, Theodoridis, Yannis
Nowadays, huge amounts of tracking data in the mobility domain are being generated by Global Positioning System (GPS) enabled devices and collected in data repositories; tracked moving entities could be pedestrians, cars, vessels, planes, animals, robots, etc. These datasets constitute a rich source for inferring mobility patterns and characteristics for a wide spectrum of novel applications and services, from social networking applications [5][46] to aviation traffic monitoring [61][67]. During the recent years, this kind of information has attracted great interest by data scientists, both in industry and in academia, and is being used in order to extract useful knowledge about what, how and for how long the moving entities are conducting individual activities related with specific circumstances. The most challenging task is to make this information actionable, by means of exploiting historical mobility patterns in order to gauge how the moving entities may evolve in short-or long-term, whether the individual forecasted movement is typical or anomalous, whether there exists a high probability for congestion in the near future, etc. As a consequence, predictive analytics over mobility data has become increasingly important and turns out to be a'hot' field in several application domains [4][74][111]. The problem of predictive analytics over mobility data finds two broad categories of application scenarios. The first scenario involves cases where the moving entities are traced in real-time to produce analytics and compute short-term predictions, which are time-critical and need immediate response. The prediction includes either location-or trajectory-related tasks.
An Empirical Approach For Probing the Definiteness of Kernels
Zaefferer, Martin, Bartz-Beielstein, Thomas, Rudolph, Gรผnter
Models like support vector machines or Gaussian process regression often require positive semi-definite kernels. These kernels may be based on distance functions. While definiteness is proven for common distances and kernels, a proof for a new kernel may require too much time and effort for users who simply aim at practical usage. Furthermore, designing definite distances or kernels may be equally intricate. Finally, models can be enabled to use indefinite kernels. This may deteriorate the accuracy or computational cost of the model. Hence, an efficient method to determine definiteness is required. We propose an empirical approach. We show that sampling as well as optimization with an evolutionary algorithm may be employed to determine definiteness. We provide a proof-of-concept with 16 different distance measures for permutations. Our approach allows to disprove definiteness if a respective counter-example is found. It can also provide an estimate of how likely it is to obtain indefinite kernel matrices. This provides a simple, efficient tool to decide whether additional effort should be spent on designing/selecting a more suitable kernel or algorithm.
Dual optimization for convex constrained objectives without the gradient-Lipschitz assumption
Bompaire, Martin, Bacry, Emmanuel, Gaรฏffas, Stรฉphane
The minimization of convex objectives coming from linear supervised learning problems, such as penalized generalized linear models, can be formulated as finite sums of convex functions. For such problems, a large set of stochastic first-order solvers based on the idea of variance reduction are available and combine both computational efficiency and sound theoretical guarantees (linear convergence rates). Such rates are obtained under both gradient-Lipschitz and strong convexity assumptions. Motivated by learning problems that do not meet the gradient-Lipschitz assumption, such as linear Poisson regression, we work under another smoothness assumption, and obtain a linear convergence rate for a shifted version of Stochastic Dual Coordinate Ascent (SDCA) that improves the current state-of-the-art. Our motivation for considering a solver working on the Fenchel-dual problem comes from the fact that such objectives include many linear constraints, that are easier to deal with in the dual. Our approach and theoretical findings are validated on several datasets, for Poisson regression and another objective coming from the negative log-likelihood of the Hawkes process, which is a family of models which proves extremely useful for the modeling of information propagation in social networks and causality inference.
Learning Implicit Generative Models by Teaching Explicit Ones
Du, Chao, Xu, Kun, Li, Chongxuan, Zhu, Jun, Zhang, Bo
Implicit generative models are difficult to train as no explicit probability density functions are defined. The well-known minimax framework proposed by generative adversarial nets (GANs) is equivalent to minimizing the Jensen-Shannon divergence and suffers from mode collapse in practice. In this paper, we propose learning by teaching (LBT) framework to train implicit generative models via incorporating an auxiliary explicit model. In LBT, an explicit model is introduced to learn the distribution defined by the implicit model and the later one's goal is to teach the explicit model to cover the training data. Formally, our method is formulated as a bilevel optimization problem, whose optimum implies that we obatin the MLE of the implicit model. We also adopt the unrolling trick to make the optimization problem differentiable with respect to the implicit model's parameters. Experimental results demonstrate the effectiveness of our proposed method.
Whale swarm algorithm with the mechanism of identifying and escaping from extreme point for multimodal function optimization
Zeng, Bing, Li, Xinyu, Gao, Liang, Zhang, Yuyan, Dong, Haozhen
Noname manuscript No. (will be inserted by the editor) Abstract Most real-world optimization problems often come with multiple global optima or local optima. Therefore, increasing niching metaheuristic algorithms, which devote to finding multiple optima in a single run, are developed to solve these multimodal optimization problems. However, there are two difficulties urgently to be solved for most existing niching metaheuristic algorithms: how to set the optimal values of niching parameters for different optimization problems, and how to jump out of the local optima efficiently. Based on Whale Swarm Algorithm (WSA) we proposed previously, this paper presents a new multimodal optimizer named WSA with Iterative Counter (WSA-IC) to address these two difficulties. In the one hand, WSA-IC improves the iteration rule of the original WSA for multimodal optimization, which removes the need of specifying different values of attenuation coefficient for different problems to form multiple subpopulations, without introducing any niching parameter. In the other hand, WSA-IC enables the identification of extreme point during iterations relying on two new parameters (i.e., stability threshold T Moreover, the convergence of WSA-IC is proved. Finally, the proposed WSA-IC is compared with several niching metaheuristic algorithms on CEC2015 niching benchmark test functions and five additional classical multimodal functions with high dimensions. The experimental results demonstrate that WSA-IC statistically outperforms other niching metaheuristic algorithms on most test functions. Keywords Whale swarm algorithm ยท multimodal optimization ยท metaheuristic algorithm ยท niching ยท extreme point 1 Introduction Most of the real-world optimization problems are multimodal [1-6], i.e., their objective functions often contain multiple global optima or local optima. In such a scenario, using metaheuristic algorithms, no matter evolutionary algorithms (EAs) or swarm based algorithms, to solve these problems has become a hot research topic, as they are easy to implement and can converge to as good as possible solutions.
Computer Assisted Localization of a Heart Arrhythmia
Vogl, Chris, Zheng, Peng, Seslar, Stephen P., Aravkin, Aleksandr Y.
Abstract--We consider the problem of locating a point-source heart arrhythmia using data from a standard diagnostic procedure, where a reference catheter is placed in the heart, and arrival times from a second diagnostic catheter are recorded as the diagnostic catheter moves around within the heart. We model this situation as a nonconvex feasibility problem, where given a set of arrival times, we look for a source location that is consistent with the available data. We develop a new optimization approach and fast algorithm to obtain online proposals for the next location to suggest to the operator as she collects data. We validate the procedure using a Monte Carlo simulation based on patients' electrophysiological data. The proposed procedure robustly and quickly locates the source of arrhythmias without any prior knowledge of heart anatomy.
A Tutorial on Bayesian Optimization
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
BALSON: Bayesian Least Squares Optimization with Nonnegative L1-Norm Constraint
Xie, Jiyang, Ma, Zhanyu, Zhang, Guoqiang, Xue, Jing-Hao, Chien, Jen-Tzung, Lin, Zhiqing, Guo, Jun
A Bayesian approach termed BAyesian Least Squares Optimization with Nonnegative L1-norm constraint (BALSON) is proposed. The error distribution of data fitting is described by Gaussian likelihood. The parameter distribution is assumed to be a Dirichlet distribution. With the Bayes rule, searching for the optimal parameters is equivalent to finding the mode of the posterior distribution. In order to explicitly characterize the nonnegative L1-norm constraint of the parameters, we further approximate the true posterior distribution by a Dirichlet distribution. We estimate the statistics of the approximating Dirichlet posterior distribution by sampling methods. Four sampling methods have been introduced. With the estimated posterior distributions, the original parameters can be effectively reconstructed in polynomial fitting problems, and the BALSON framework is found to perform better than conventional methods.