Goto

Collaborating Authors

 Optimization


Learning to Identify Regular Expressions that Describe Email Campaigns

arXiv.org Machine Learning

This paper addresses the problem of inferring a regular expression from a given set of strings that resembles, as closely as possible, the regular expression that a human expert would have written to identify the language. This is motivated by our goal of automating the task of postmasters of an email service who use regular expressions to describe and blacklist email spam campaigns. Training data contains batches of messages and corresponding regular expressions that an expert postmaster feels confident to blacklist. We model this task as a learning problem with structured output spaces and an appropriate loss function, derive a decoder and the resulting optimization problem, and a report on a case study conducted with an email service.


A Complete Analysis of the l_1,p Group-Lasso

arXiv.org Machine Learning

The Group-Lasso is a well-known tool for joint regularization in machine learning methods. While the l_{1,2} and the l_{1,\infty} version have been studied in detail and efficient algorithms exist, there are still open questions regarding other l_{1,p} variants. We characterize conditions for solutions of the l_{1,p} Group-Lasso for all p-norms with 1 <= p <= \infty, and we present a unified active set algorithm. For all p-norms, a highly efficient projected gradient algorithm is presented. This new algorithm enables us to compare the prediction performance of many variants of the Group-Lasso in a multi-task learning setting, where the aim is to solve many learning problems in parallel which are coupled via the Group-Lasso constraint. We conduct large-scale experiments on synthetic data and on two real-world data sets. In accordance with theoretical characterizations of the different norms we observe that the weak-coupling norms with p between 1.5 and 2 consistently outperform the strong-coupling norms with p >> 2.


Information-theoretic Semi-supervised Metric Learning via Entropy Regularization

arXiv.org Machine Learning

We propose a general information-theoretic approach called Seraph (SEmi-supervised metRic leArning Paradigm with Hyper-sparsity) for metric learning that does not rely upon the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize the entropy of that probability on labeled data and minimize it on unlabeled data following entropy regularization, which allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Furthermore, Seraph is regularized by encouraging a low-rank projection induced from the metric. The optimization of Seraph is solved efficiently and stably by an EM-like scheme with the analytical E-Step and convex M-Step. Experiments demonstrate that Seraph compares favorably with many well-known global and local metric learning methods.


A Hybrid Algorithm for Convex Semidefinite Optimization

arXiv.org Machine Learning

We present a hybrid algorithm for optimizing a convex, smooth function over the cone of positive semidefinite matrices. Our algorithm converges to the global optimal solution and can be used to solve general large-scale semidefinite programs and hence can be readily applied to a variety of machine learning problems. We show experimental results on three machine learning problems (matrix completion, metric learning, and sparse PCA) . Our approach outperforms state-of-the-art algorithms.


A Unified Robust Classification Model

arXiv.org Machine Learning

A wide variety of machine learning algorithms such as support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA), exist for binary classification. The purpose of this paper is to provide a unified classification model that includes the above models through a robust optimization approach. This unified model has several benefits. One is that the extensions and improvements intended for SVM become applicable to MPM and FDA, and vice versa. Another benefit is to provide theoretical results to above learning methods at once by dealing with the unified model. We give a statistical interpretation of the unified classification model and propose a non-convex optimization algorithm that can be applied to non-convex variants of existing learning methods.


Dependence Maximizing Temporal Alignment via Squared-Loss Mutual Information

arXiv.org Machine Learning

Temporal alignment of sequences is an important problem with many practical applications such as speech recognition [1, 2], activity recognition [3, 4], temporal segmentation [5], curve matching [6], chromatographic and micro-array data analysis [7], synthesis of human motion [8], and temporal alignment of human motion [9, 10]. Dynamic time warping (DTW) is a classical temporal alignment method that aligns two sequences by minimizing the pairwise squared Euclidean distance [1, 2]. An advantage of DTW is that the minimization can be efficiently carried out by dynamic programming (DP) [11]. However, due to the Euclidean formulation, DTW may not be able to find a good alignment when the characteristics of the two sequences are substantially different (e.g., sequences have different amplitudes). Moreover, DTW cannot handle sequences with different dimensions (e.g., image to audio alignment), which limits the range of applications significantly.


Convergent Message-Passing Algorithms for Inference over General Graphs with Convex Free Energies

arXiv.org Machine Learning

Inference problems in graphical models can be represented as a constrained optimization of a free energy function. It is known that when the Bethe free energy is used, the fixedpoints of the belief propagation (BP) algorithm correspond to the local minima of the free energy. However BP fails to converge in many cases of interest. Moreover, the Bethe free energy is non-convex for graphical models with cycles thus introducing great difficulty in deriving efficient algorithms for finding local minima of the free energy for general graphs. In this paper we introduce two efficient BP-like algorithms, one sequential and the other parallel, that are guaranteed to converge to the global minimum, for any graph, over the class of energies known as "convex free energies". In addition, we propose an efficient heuristic for setting the parameters of the convex free energy based on the structure of the graph.


Greedy Block Coordinate Descent for Large Scale Gaussian Process Regression

arXiv.org Machine Learning

We propose a variable decomposition algorithm -greedy block coordinate descent (GBCD)- in order to make dense Gaussian process regression practical for large scale problems. GBCD breaks a large scale optimization into a series of small sub-problems. The challenge in variable decomposition algorithms is the identification of a subproblem (the active set of variables) that yields the largest improvement. We analyze the limitations of existing methods and cast the active set selection into a zero-norm constrained optimization problem that we solve using greedy methods. By directly estimating the decrease in the objective function, we obtain not only efficient approximate solutions for GBCD, but we are also able to demonstrate that the method is globally convergent. Empirical comparisons against competing dense methods like Conjugate Gradient or SMO show that GBCD is an order of magnitude faster. Comparisons against sparse GP methods show that GBCD is both accurate and capable of handling datasets of 100,000 samples or more.


On Modeling the Tactical Planning of Oil Pipeline Networks

AAAI Conferences

This paper aims at incorporating tactical aspects of oil pipeline networks to the supply chain planning model. The strategic design of supply chains is covered in literature by well understood and recurring patterns such as multi-commodity networks, dynamic parameters over time, capacity on facilities, transportation capacity or facilities with demand, production and inventory. We consider the following characteristics: capacity for in-transit inventory, transit time and flow reversal. Our objective is a better estimate for resources required by the network and therewith allow a more precise optimization of their use. All aspects are modeled to be efficiently solved by linear programming algorithms.


Schedule-Driven Coordination for Real-Time Traffic Network Control

AAAI Conferences

Real-time optimization of the dynamic flow of vehicle traffic through a network of signalized intersections is an important practical problem. In this paper, we take a decentralized, schedule-driven coordination approach to address the challenge of achieving scalable network-wide optimization. To be locally effective, each intersection is controlled independently by an on-line scheduling agent. At each decision point, an agent constructs a schedule that optimizes movement of the observable traffic through the intersection, and uses this schedule to determine the best control action to take over the current look-ahead horizon. Decentralized coordination mechanisms, limited to interaction among direct neighbors to ensure scalability, are then layered on top of these asynchronously operating scheduling agents to promote overall performance. As a basic protocol, each agent queries for newly planned output flows from its upstream neighbors to obtain an optimistic projection of future demand. This projection may incorporate non-local influence from indirect neighbors depending on horizon length. Two additional mechanisms are then introduced to dampen ``nervousness'' and dynamic instability in the network, by adjusting locally determined schedules to better align with those of neighbors. We present simulation results on two traffic networks of tightly-coupled intersections that demonstrate the ability of our approach to establish traffic flows with lower average vehicle wait times than both a simple isolated control strategy and other contemporary coordinated control strategies that use moving average forecast or traditional offset calculation.