Optimization
The constrained Dantzig selector with enhanced consistency
Kong, Yinfei, Zheng, Zemin, Lv, Jinchi
The Dantzig selector has received popularity for many applications such as compressed sensing and sparse modeling, thanks to its computational efficiency as a linear programming problem and its nice sampling properties. Existing results show that it can recover sparse signals mimicking the accuracy of the ideal procedure, up to a logarithmic factor of the dimensionality. Such a factor has been shown to hold for many regularization methods. An important question is whether this factor can be reduced to a logarithmic factor of the sample size in ultra-high dimensions under mild regularity conditions. To provide an affirmative answer, in this paper we suggest the constrained Dantzig selector, which has more flexible constraints and parameter space. We prove that the suggested method can achieve convergence rates within a logarithmic factor of the sample size of the oracle rates and improved sparsity, under a fairly weak assumption on the signal strength. Such improvement is significant in ultra-high dimensions. This method can be implemented efficiently through sequential linear programming. Numerical studies confirm that the sample size needed for a certain level of accuracy in these problems can be much reduced.
Minimizing Finite Sums with the Stochastic Average Gradient
Schmidt, Mark, Roux, Nicolas Le, Bach, Francis
We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly-convex the convergence rate is improved from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for p \textless{} 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
Kernel-Based Structural Equation Models for Topology Identification of Directed Networks
Shen, Yanning, Baingana, Brian, Giannakis, Georgios B.
Structural equation models (SEMs) have been widely adopted for inference of causal interactions in complex networks. Recent examples include unveiling topologies of hidden causal networks over which processes such as spreading diseases, or rumors propagate. The appeal of SEMs in these settings stems from their simplicity and tractability, since they typically assume linear dependencies among observable variables. Acknowledging the limitations inherent to adopting linear models, the present paper advocates nonlinear SEMs, which account for (possible) nonlinear dependencies among network nodes. The advocated approach leverages kernels as a powerful encompassing framework for nonlinear modeling, and an efficient estimator with affordable tradeoffs is put forth. Interestingly, pursuit of the novel kernel-based approach yields a convex regularized estimator that promotes edge sparsity, and is amenable to proximal-splitting optimization methods. To this end, solvers with complementary merits are developed by leveraging the alternating direction method of multipliers, and proximal gradient iterations. Experiments conducted on simulated data demonstrate that the novel approach outperforms linear SEMs with respect to edge detection errors. Furthermore, tests on a real gene expression dataset unveil interesting new edges that were not revealed by linear SEMs, which could shed more light on regulatory behavior of human genes.
Bayesian Optimization for Hyperparameter Tuning - Arimo
Bayesian Optimization helped us find a hyperparameter configuration that is better than the one found by Random Search for a neural network on the San Francisco Crimes dataset. People who are familiar with Machine Learning might want to fast forward to Section 3 for details. The code to reproduce the experiments can be found here. Hyperparameter tuning may be one of the most tricky, yet interesting, topics in Machine Learning. For most Machine Learning practitioners, mastering the art of tuning hyperparameters requires not only a solid background in Machine Learning algorithms, but also extensive experience working with real-world datasets.
Oracle Based Active Set Algorithm for Scalable Elastic Net Subspace Clustering
You, Chong, Li, Chun-Guang, Robinson, Daniel P., Vidal, Rene
State-of-the-art subspace clustering methods are based on expressing each data point as a linear combination of other data points while regularizing the matrix of coefficients with $\ell_1$, $\ell_2$ or nuclear norms. $\ell_1$ regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad theoretical conditions, but the clusters may not be connected. $\ell_2$ and nuclear norm regularization often improve connectivity, but give a subspace-preserving affinity only for independent subspaces. Mixed $\ell_1$, $\ell_2$ and nuclear norm regularizations offer a balance between the subspace-preserving and connectedness properties, but this comes at the cost of increased computational complexity. This paper studies the geometry of the elastic net regularizer (a mixture of the $\ell_1$ and $\ell_2$ norms) and uses it to derive a provably correct and scalable active set method for finding the optimal coefficients. Our geometric analysis also provides a theoretical justification and a geometric interpretation for the balance between the connectedness (due to $\ell_2$ regularization) and subspace-preserving (due to $\ell_1$ regularization) properties for elastic net subspace clustering. Our experiments show that the proposed active set method not only achieves state-of-the-art clustering performance, but also efficiently handles large-scale datasets.
A Bayesian approach to constrained single- and multi-objective optimization
Feliot, Paul, Bect, Julien, Vazquez, Emmanuel
This article addresses the problem of derivative-free (single- or multi-objective) optimization subject to multiple inequality constraints. Both the objective and constraint functions are assumed to be smooth, non-linear and expensive to evaluate. As a consequence, the number of evaluations that can be used to carry out the optimization is very limited, as in complex industrial design optimization problems. The method we propose to overcome this difficulty has its roots in both the Bayesian and the multi-objective optimization literatures. More specifically, an extended domination rule is used to handle objectives and constraints in a unified way, and a corresponding expected hyper-volume improvement sampling criterion is proposed. This new criterion is naturally adapted to the search of a feasible point when none is available, and reduces to existing Bayesian sampling criteria---the classical Expected Improvement (EI) criterion and some of its constrained/multi-objective extensions---as soon as at least one feasible point is available. The calculation and optimization of the criterion are performed using Sequential Monte Carlo techniques. In particular, an algorithm similar to the subset simulation method, which is well known in the field of structural reliability, is used to estimate the criterion. The method, which we call BMOO (for Bayesian Multi-Objective Optimization), is compared to state-of-the-art algorithms for single- and multi-objective constrained optimization.
Assessing Supply Chain Robustness Through Stress Testing
Shekh, Slava (Defence Science and Technology Group) | Marsh, Luke (Defence Science and Technology Group)
Military logistics planning is a complex and time consuming process, and ultimately many factors can impact the devised plan. We have developed a way of assessing the robustness of a logistics plan and associated supply chain by considering the occurrence of disruptive negative events. These events may occur in isolation or simultaneously. Each event is associated with a plan variable and the impact of a negative event can then be assessed by measuring whether the modification of its associated variable causes any stockouts in the plan during simulation. By modifying any two variables simultaneously, we can identify the minimum values of those two variables, which in combination cause a stockout. These minimum values are called breakpoints, and a set of breakpoints is referred to as a breakpoint front. We have investigated various approaches of calculating or estimating breakpoint fronts, including brute force search, the Monte Carlo method, 2D binary search and the multi-objective optimization methods, SPEA2 and NSGA-II. Our experiments showed that 2D binary search performed best overall, due to its low computation time and accuracy in calculating a range of breakpoint fronts.
IISCNLP at SemEval-2016 Task 2: Interpretable STS with ILP based Multiple Chunk Aligner
Tekumalla, Lavanya Sita, Sharmistha, null
Interpretable semantic textual similarity (iSTS) task adds a crucial explanatory layer to pairwise sentence similarity. We address various components of this task: chunk level semantic alignment along with assignment of similarity type and score for aligned chunks with a novel system presented in this paper. We propose an algorithm, iMATCH, for the alignment of multiple non-contiguous chunks based on Integer Linear Programming (ILP). Similarity type and score assignment for pairs of chunks is done using a supervised multiclass classification technique based on Random Forrest Classifier. Results show that our algorithm iMATCH has low execution time and outperforms most other participating systems in terms of alignment score. Of the three datasets, we are top ranked for answer- students dataset in terms of overall score and have top alignment score for headlines dataset in the gold chunks track.
10 Enterprise Predictive Analytics Platforms Compared
FICO provides a broad range of technologies and services to support business optimisation and the embedding of intelligence into applications of various kinds. Under the hood there is a lot going on – from linear (and non-linear) programming through to predictive analytics and other extremely powerful methods of supporting business decisions through applying intelligence to a variety of applications. It is clear that the experience and know-how of FICO in the industries it serves is probably unique. Financial services, retail, government and healthcare are its main markets, but the technologies and methods they employ have broad applicability.
A hybrid swarm-based algorithm for single-objective optimization problems involving high-cost analyses
Ampellio, Enrico, Vassio, Luca
In many technical fields, single-objective optimization procedures in continuous domains involve expensive numerical simulations. In this context, an improvement of the Artificial Bee Colony (ABC) algorithm, called the Artificial super-Bee enhanced Colony (AsBeC), is presented. AsBeC is designed to provide fast convergence speed, high solution accuracy and robust performance over a wide range of problems. It implements enhancements of the ABC structure and hybridizations with interpolation strategies. The latter are inspired by the quadratic trust region approach for local investigation and by an efficient global optimizer for separable problems. Each modification and their combined effects are studied with appropriate metrics on a numerical benchmark, which is also used for comparing AsBeC with some effective ABC variants and other derivative-free algorithms. In addition, the presented algorithm is validated on two recent benchmarks adopted for competitions in international conferences. Results show remarkable competitiveness and robustness for AsBeC.