Optimization
A Particle Filter based Multi-Objective Optimization Algorithm: PFOPS
This letter is concerned with a recently developed paradigm of population-based optimization, termed particle filter optimization (PFO). In contrast with the commonly used meta-heuristics based methods, the PFO paradigm is attractive in terms of coherence in theory and easiness in mathematical analysis and interpretation. However, current PFO algorithms only work for single-objective optimization cases, while many real-life problems involve multiple objectives to be optimized simultaneously. To this end, we make an effort to extend the scope of application of the PFO paradigm to multi-objective optimization (MOO) cases. An idea called path sampling is adopted within the PFO scheme to balance the different objectives to be optimized. The resulting algorithm is thus termed PFO with Path Sampling (PFOPS). Experimental results show that the proposed algorithm works consistently well for three different types of MOO problems, which are characterized by an associated convex, concave and discontinuous Pareto front, respectively.
A Limitation of V-Matrix based Methods
Gauraha, Niharika, Chaturvedi, Akshay
To estimate the conditional probability functions based on the direct problem setting, V-matrix based method was proposed. We construct V-matrix based constrained quadratic programming problems for which the inequality constraints are inconsistent. In particular, we would like to present that the constrained quadratic optimization problem for conditional probability estimation using V-matrix method may not have a consistent solution always. A limitation of V-matrix based (Direct) Method of Solving Conditional Probability Function V-Matrix method of estimation of conditional probability function was defined in [1]. Here we present a limitationof thesame. Since the kernel matrices are known to be positive semi-definite, the constrains matrices for the above Quadratic Programming (QP) problem may not have full rank and there is no guarantee that the constraints will be consistent, hence no solution could be found.
Iterative multi-path tracking for video and volume segmentation with sparse point supervision
Lejeune, Laurent, Grossrieder, Jan, Sznitman, Raphael
Recent machine learning strategies for segmentation tasks have shown great ability when trained on large pixel-wise annotated image datasets. It remains a major challenge however to aggregate such datasets, as the time and monetary cost associated with collecting extensive annotations is extremely high. This is particularly the case for generating precise pixel-wise annotations in video and volumetric image data. To this end, this work presents a novel framework to produce pixel-wise segmentations using minimal supervision. Our method relies on 2D point supervision, whereby a single 2D location within an object of interest is provided on each image of the data. Our method then estimates the object appearance in a semi-supervised fashion by learning object-image-specific features and by using these in a semi-supervised learning framework. Our object model is then used in a graph-based optimization problem that takes into account all provided locations and the image data in order to infer the complete pixel-wise segmentation. In practice, we solve this optimally as a tracking problem using a K-shortest path approach. Both the object model and segmentation are then refined iteratively to further improve the final segmentation. We show that by collecting 2D locations using a gaze tracker, our approach can provide state-of-the-art segmentations on a range of objects and image modalities (video and 3D volumes), and that these can then be used to train supervised machine learning classifiers.
Self-Paced Multi-Task Clustering
Ren, Yazhou, Que, Xiaofan, Yao, Dezhong, Xu, Zenglin
Multi-task clustering (MTC) has attracted a lot of research attentions in machine learning due to its ability in utilizing the relationship among different tasks. Despite the success of traditional MTC models, they are either easy to stuck into local optima, or sensitive to outliers and noisy data. To alleviate these problems, we propose a novel self-paced multi-task clustering (SPMTC) paradigm. In detail, SPMTC progressively selects data examples to train a series of MTC models with increasing complexity, thus highly decreases the risk of trapping into poor local optima. Furthermore, to reduce the negative influence of outliers and noisy data, we design a soft version of SPMTC to further improve the clustering performance. The corresponding SPMTC framework can be easily solved by an alternating optimization method. The proposed model is guaranteed to converge and experiments on real data sets have demonstrated its promising results compared with state-of-the-art multi-task clustering methods.
Adaptive Grey-Box Fuzz-Testing with Thompson Sampling
Karamcheti, Siddharth, Mann, Gideon, Rosenberg, David
Fuzz testing, or "fuzzing," refers to a widely deployed class of techniques for testing programs by generating a set of inputs for the express purpose of finding bugs and identifying security flaws. Grey-box fuzzing, the most popular fuzzing strategy, combines light program instrumentation with a data driven process to generate new program inputs. In this work, we present a machine learning approach that builds on AFL, the preeminent grey-box fuzzer, by adaptively learning a probability distribution over its mutation operators on a program-specific basis. These operators, which are selected uniformly at random in AFL and mutational fuzzers in general, dictate how new inputs are generated, a core part of the fuzzer's efficacy. Our main contributions are two-fold: First, we show that a sampling distribution over mutation operators estimated from training programs can significantly improve performance of AFL. Second, we introduce a Thompson Sampling, bandit-based optimization approach that fine-tunes the mutator distribution adaptively, during the course of fuzzing an individual program. A set of experiments across complex programs demonstrates that tuning the mutational operator distribution generates sets of inputs that yield significantly higher code coverage and finds more crashes faster and more reliably than both baseline versions of AFL as well as other AFL-based learning approaches.
A Jointly Learned Context-Aware Place of Interest Embedding for Trip Recommendations
He, Jiayuan, Qi, Jianzhong, Ramamohanarao, Kotagiri
Trip recommendation is an important location-based service that helps relieve users from the time and efforts for trip planning. It aims to recommend a sequence of places of interest (POIs) for a user to visit that maximizes the user's satisfaction. When adding a POI to a recommended trip, it is essential to understand the context of the recommendation, including the POI popularity, other POIs co-occurring in the trip, and the preferences of the user. These contextual factors are learned separately in existing studies, while in reality, they impact jointly on a user's choice of a POI to visit. In this study, we propose a POI embedding model to jointly learn the impact of these contextual factors. We call the learned POI embedding a context-aware POI embedding. To showcase the effectiveness of this embedding, we apply it to generate trip recommendations given a user and a time budget. We propose two trip recommendation algorithms based on our context-aware POI embedding. The first algorithm finds the exact optimal trip by transforming and solving the trip recommendation problem as an integer linear programming problem. To achieve a high computation efficiency, the second algorithm finds a heuristically optimal trip based on adaptive large neighborhood search. We perform extensive experiments on real datasets. The results show that our proposed algorithms consistently outperform state-of-the-art algorithms in trip recommendation quality, with an advantage of up to 43% in F1-score.
Efficient sparse Hessian based algorithms for the clustered lasso problem
Lin, Meixia, Liu, Yong-Jin, Sun, Defeng, Toh, Kim-Chuan
We focus on solving the clustered lasso problem, which is a least squares problem with the $\ell_1$-type penalties imposed on both the coefficients and their pairwise differences to learn the group structure of the regression parameters. Here we first reformulate the clustered lasso regularizer as a weighted ordered-lasso regularizer, which is essential in reducing the computational cost from $O(n^2)$ to $O(n\log (n))$. We then propose an inexact semismooth Newton augmented Lagrangian (SSNAL) algorithm to solve the clustered lasso problem or its dual via this equivalent formulation, depending on whether the sample size is larger than the dimension of the features. An essential component of the SSNAL algorithm is the computation of the generalized Jacobian of the proximal mapping of the clustered lasso regularizer. Based on the new formulation, we derive an efficient procedure for its computation. Comprehensive results on the global convergence and local linear convergence of the SSNAL algorithm are established. For the purpose of exposition and comparison, we also summarize/design several first-order methods that can be used to solve the problem under consideration, but with the key improvement from the new formulation of the clustered lasso regularizer. As a demonstration of the applicability of our algorithms, numerical experiments on the clustered lasso problem are performed. The experiments show that the SSNAL algorithm substantially outperforms the best alternative algorithm for the clustered lasso problem.
Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
State-of-the-art distributed machine learning suffers from significant delays due to frequent communication and synchronizing between worker nodes. Emerging communication-efficient SGD algorithms that limit synchronization between locally trained models have been shown to be effective in speeding-up distributed SGD. However, a rigorous convergence analysis and comparative study of different communication-reduction strategies remains a largely open problem. This paper presents a new framework called Coooperative SGD that subsumes existing communication-efficient SGD algorithms such as federated-averaging, elastic-averaging and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for existing algorithms. Moreover this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence.
Pathologies in information bottleneck for deterministic supervised learning
Kolchinsky, Artemy, Tracey, Brendan D., Van Kuyk, Steven
Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y . To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X; T) and high mutual information I(Y; T). The IB curve characterizes the set of bottleneck variables that achieve maximal I(Y; T) for a given I(X; T), and is typically explored by optimizing the IB Lagrangian, I(Y; T) βI(X; T). Recently, there has been interest in applying IB to supervised learning, particularly for classification problems that use neural networks. In most classification problems, the output class Y is a deterministic function of the input X, which we refer to as "deterministic supervised learning". We demonstrate three pathologies that arise when IB is used in any scenario where Y is a deterministic function of X: (1) the IB curve cannot be recovered by optimizing the IB Lagrangian for different values of β; (2) there are "uninteresting" solutions at all points of the IB curve; and (3) for classifiers that achieve low error rates, the activity of different hidden layers will not exhibit a strict tradeoff between compression and prediction, contrary to a recent proposal. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We finish by demonstrating these issues on the MNIST dataset. The information bottleneck (IB) method (Tishby et al., 1999) provides a principled way to extract information that is present in one variable that is relevant for predicting another variable. Given two random variables X and Y, IB posits a "bottleneck" variable T that obeys the Markov condition Y X T . By the data processing inequality (DPI) (Cover & Thomas, 2012), this Markov condition implies that I(X; T) I(Y; T), meaning the bottleneck variable cannot contain more information about Y than it does about X.
On a New Improvement-Based Acquisition Function for Bayesian Optimization
Bayesian optimization (BO) is a popular algorithm for solving challenging optimization tasks. It is designed for problems where the objective function is expensive to evaluate, perhaps not available in exact form, without gradient information and possibly returning noisy values. Different versions of the algorithm vary in the choice of the acquisition function, which recommends the point to query the objective at next. Initially, researchers focused on improvement-based acquisitions, while recently the attention has shifted to more computationally expensive information-theoretical measures. In this paper we present two major contributions to the literature. First, we propose a new improvement-based acquisition function that recommends query points where the improvement is expected to be high with high confidence. The proposed algorithm is evaluated on a large set of benchmark functions from the global optimization literature, where it turns out to perform at least as well as current state-of-the-art acquisition functions, and often better. This suggests that it is a powerful default choice for BO. The novel policy is then compared to widely used global optimization solvers in order to confirm that BO methods reduce the computational costs of the optimization by keeping the number of function evaluations small. The second main contribution represents an application to precision medicine, where the interest lies in the estimation of parameters of a partial differential equations model of the human pulmonary blood circulation system. Once inferred, these parameters can help clinicians in diagnosing a patient with pulmonary hypertension without going through the standard invasive procedure of right heart catheterization, which can lead to side effects and complications (e.g. severe pain, internal bleeding, thrombosis).