Goto

Collaborating Authors

 Optimization


Robust Ordinal Embedding from Contaminated Relative Comparisons

arXiv.org Machine Learning

Existing ordinal embedding methods usually follow a two-stage routine: outlier detection is first employed to pick out the inconsistent comparisons; then an embedding is learned from the clean data. However, learning in a multi-stage manner is well-known to suffer from sub-optimal solutions. In this paper, we propose a unified framework to jointly identify the contaminated comparisons and derive reliable embeddings. The merits of our method are three-fold: (1) By virtue of the proposed unified framework, the sub-optimality of traditional methods is largely alleviated; (2) The proposed method is aware of global inconsistency by minimizing a corresponding cost, while traditional methods only involve local inconsistency; (3) Instead of considering the nuclear norm heuristics, we adopt an exact solution for rank equality constraint. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides us a promising tool for robust ordinal embedding from the contaminated comparisons.


Less but Better: Generalization Enhancement of Ordinal Embedding via Distributional Margin

arXiv.org Machine Learning

In the absence of prior knowledge, ordinal embedding methods obtain new representation for items in a low-dimensional Euclidean space via a set of quadruple-wise comparisons. These ordinal comparisons often come from human annotators, and sufficient comparisons induce the success of classical approaches. However, collecting a large number of labeled data is known as a hard task, and most of the existing work pay little attention to the generalization ability with insufficient samples. Meanwhile, recent progress in large margin theory discloses that rather than just maximizing the minimum margin, both the margin mean and variance, which characterize the margin distribution, are more crucial to the overall generalization performance. To address the issue of insufficient training samples, we propose a margin distribution learning paradigm for ordinal embedding, entitled Distributional Margin based Ordinal Embedding (\textit{DMOE}). Precisely, we first define the margin for ordinal embedding problem. Secondly, we formulate a concise objective function which avoids maximizing margin mean and minimizing margin variance directly but exhibits the similar effect. Moreover, an Augmented Lagrange Multiplier based algorithm is customized to seek the optimal solution of \textit{DMOE} effectively. Experimental studies on both simulated and real-world datasets are provided to show the effectiveness of the proposed algorithm.


On Uncensored Mean First-Passage-Time Performance Experiments with Multiwalk in $\mathbb{R}^p$: a New Stochastic Optimization Algorithm

arXiv.org Artificial Intelligence

A rigorous empirical comparison of two stochastic solvers is important when one of the solvers is a prototype of a new algorithm such as multiwalk (MWA). When searching for global minima in $\mathbb{R}^p$, the key data structures of MWA include: $p$ rulers with each ruler assigned $m$ marks and a set of $p$ neighborhood matrices of size up to $m(m-2)$, where each entry represents absolute values of pairwise differences between $m$ marks. Before taking the next step, a controller links the tableau of neighborhood matrices and computes new and improved positions for each of the $m$ marks. The number of columns in each neighborhood matrix is denoted as the neighborhood radius $r_n \le m-2$. Any variant of the DEA (differential evolution algorithm) has an effective population neighborhood of radius not larger than 1. Uncensored first-passage-time performance experiments that vary the neighborhood radius of a MW-solver can thus be readily compared to existing variants of DE-solvers. The paper considers seven test cases of increasing complexity and demonstrates, under uncensored first-passage-time performance experiments: (1) significant variability in convergence rate for seven DE-based solver configurations, and (2) consistent, monotonic, and significantly faster rate of convergence for the MW-solver prototype as we increase the neighborhood radius from 4 to its maximum value.


Consistency for 0-1 Programming

arXiv.org Artificial Intelligence

Concepts of consistency have long played a key role in constraint programming but never developed in integer programming (IP). Consistency nonetheless plays a role in IP as well. For example, cutting planes can reduce backtracking by achieving various forms of consistency as well as by tightening the linear programming (LP) relaxation. We introduce a type of consistency that is particularly suited for 0-1 programming and develop the associated theory. We define a 0-1 constraint set as LP-consistent when any partial assignment that is consistent with its linear programming relaxation is consistent with the original 0-1 constraint set. We prove basic properties of LP-consistency, including its relationship with Chvatal-Gomory cuts and the integer hull. We show that a weak form of LP-consistency can reduce or eliminate backtracking in a way analogous to k-consistency but is easier to achieve. In so doing, we identify a class of valid inequalities that can be more effective than traditional cutting planes at cutting off infeasible 0-1 partial assignments.


Deep Inverse Optimization

arXiv.org Machine Learning

Given a set of observations generated by an optimization process, the goal of inverse optimization is to determine likely parameters of that process. We cast inverse optimization as a form of deep learning. Our method, called deep inverse optimization, is to unroll an iterative optimization process and then use backpropagation to learn parameters that generate the observations. We demonstrate that by backpropagating through the interior point algorithm we can learn the coefficients determining the cost vector and the constraints, independently or jointly, for both non-parametric and parametric linear programs, starting from one or multiple observations. With this approach, inverse optimization can leverage concepts and algorithms from deep learning.


On learning with shift-invariant structures

arXiv.org Machine Learning

Abstract--We describe new results and algorithms for two different, but related, problems which deal with circulant matrices: learningshift-invariant components from training data and calculating the shift (or alignment) between two given signals. In the first instance, we deal with the shift-invariant dictionary learning problem while the latter bears the name of (compressive) shift retrieval. We formulate these problems using circulant and convolutional matrices (including unions of such matrices), define optimization problems that describe our goals and propose efficient ways to solve them. Based on these findings, we also show how to learn a wavelet-like dictionary from training data. We connect our work with various previous results from the literature and we show the effectiveness of our proposed algorithms using synthetic, ECG signals and images.


Feature selection with optimal coordinate ascent (OCA)

arXiv.org Machine Learning

In machine learning, Feature Selection (FS) is a major part of efficient algorithm. It fuels the algorithm and is the starting block for our prediction. In this paper, we present a new method, called Optimal Coordinate Ascent (OCA) that allows us selecting features among block and individual features. OCA relies on coordinate ascent to find an optimal solution for gradient boosting methods score (number of correctly classified samples). OCA takes into account the notion of dependencies between variables forming blocks in our optimization. The coordinate ascent optimization solves the issue of the NP hard original problem where the number of combinations rapidly explode making a grid search unfeasible. It reduces considerably the number of iterations changing this NP hard problem into a polynomial search one. OCA brings substantial differences and improvements compared to previous coordinate ascent feature selection method: we group variables into block and individual variables instead of a binary selection. Our initial guess is based on the k-best group variables making our initial point more robust. We also introduced new stopping criteria making our optimization faster. We compare these two methods on our data set. We found that our method outperforms the initial one. We also compare our method to the Recursive Feature Elimination (RFE) method and find that OCA leads to the minimum feature set with the highest score. This is a nice byproduct of our method as it provides empirically the most compact data set with optimal performance.


Automatic hyperparameter selection in Autodock

arXiv.org Machine Learning

Autodock is a widely used molecular modeling tool which predicts how small molecules bind to a receptor of known 3D structure. The current version of AutoDock uses meta-heuristic algorithms in combination with local search methods for doing the conformation search. Appropriate settings of hyperparameters in these algorithms are important, particularly for novice users who often find it hard to identify the best configuration. In this work, we design a surrogate based multi-objective algorithm to help such users by automatically tuning hyperparameter settings. The proposed method iteratively uses a radial basis function model and non-dominated sorting to evaluate the sampled configurations during the search phase. Our experimental results using Autodock show that the introduced component is practical and effective.


Imputation of Clinical Covariates in Time Series

arXiv.org Machine Learning

Missing data is a common problem in real-world settings and particularly relevant in healthcare applications where researchers use Electronic Health Records (EHR) and results of observational studies to apply analytics methods. This issue becomes even more prominent for longitudinal data sets, where multiple instances of the same individual correspond to different observations in time. Standard imputation methods do not take into account patient specific information incorporated in multivariate panel data. We introduce the novel imputation algorithm MedImpute that addresses this problem, extending the flexible framework of OptImpute suggested by Bertsimas et al. (2018). Our algorithm provides imputations for data sets with missing continuous and categorical features, and we present the formulation and implement scalable first-order methods for a $K$-NN model. We test the performance of our algorithm on longitudinal data from the Framingham Heart Study when data are missing completely at random (MCAR). We demonstrate that MedImpute leads to significant improvements in both imputation accuracy and downstream model AUC compared to state-of-the-art methods.


Modulated Policy Hierarchies

arXiv.org Artificial Intelligence

Solving tasks with sparse rewards is a main challenge in reinforcement learning. While hierarchical controllers are an intuitive approach to this problem, current methods often require manual reward shaping, alternating training phases, or manually defined sub tasks. We introduce modulated policy hierarchies (MPH), that can learn end-to-end to solve tasks from sparse rewards. To achieve this, we study different modulation signals and exploration for hierarchical controllers. Specifically, we find that communicating via bit-vectors is more efficient than selecting one out of multiple skills, as it enables mixing between them. To facilitate exploration, MPH uses its different time scales for temporally extended intrinsic motivation at each level of the hierarchy. We evaluate MPH on the robotics tasks of pushing and sparse block stacking, where it outperforms recent baselines.