Plotting

 Country


A Unified Robust Classification Model

arXiv.org Machine Learning

A wide variety of machine learning algorithms such as support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA), exist for binary classification. The purpose of this paper is to provide a unified classification model that includes the above models through a robust optimization approach. This unified model has several benefits. One is that the extensions and improvements intended for SVM become applicable to MPM and FDA, and vice versa. Another benefit is to provide theoretical results to above learning methods at once by dealing with the unified model. We give a statistical interpretation of the unified classification model and propose a non-convex optimization algorithm that can be applied to non-convex variants of existing learning methods.


Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching

arXiv.org Machine Learning

In real-world classification problems, the class balance in the training dataset does not necessarily reflect that of the test dataset, which can cause significant estimation bias. If the class ratio of the test dataset is known, instance re-weighting or resampling allows systematical bias correction. However, learning the class ratio of the test dataset is challenging when no labeled data is available from the test domain. In this paper, we propose to estimate the class ratio in the test dataset by matching probability distributions of training and test input data. We demonstrate the utility of the proposed approach through experiments.


Learning to Identify Regular Expressions that Describe Email Campaigns

arXiv.org Machine Learning

This paper addresses the problem of inferring a regular expression from a given set of strings that resembles, as closely as possible, the regular expression that a human expert would have written to identify the language. This is motivated by our goal of automating the task of postmasters of an email service who use regular expressions to describe and blacklist email spam campaigns. Training data contains batches of messages and corresponding regular expressions that an expert postmaster feels confident to blacklist. We model this task as a learning problem with structured output spaces and an appropriate loss function, derive a decoder and the resulting optimization problem, and a report on a case study conducted with an email service.


Convex Multitask Learning with Flexible Task Clusters

arXiv.org Machine Learning

Traditionally, multitask learning (MTL) assumes that all the tasks are related. This can lead to negative transfer when tasks are indeed incoherent. Recently, a number of approaches have been proposed that alleviate this problem by discovering the underlying task clusters or relationships. However, they are limited to modeling these relationships at the task level, which may be restrictive in some applications. In this paper, we propose a novel MTL formulation that captures task relationships at the feature-level. Depending on the interactions among tasks and features, the proposed method construct different task clusters for different features, without even the need of pre-specifying the number of clusters. Computationally, the proposed formulation is strongly convex, and can be efficiently solved by accelerated proximal methods. Experiments are performed on a number of synthetic and real-world data sets. Under various degrees of task relationships, the accuracy of the proposed method is consistently among the best. Moreover, the feature-specific task clusters obtained agree with the known/plausible task structures of the data.


Efficient Active Algorithms for Hierarchical Clustering

arXiv.org Machine Learning

Advances in sensing technologies and the growth of the internet have resulted in an explosion in the size of modern datasets, while storage and processing power continue to lag behind. This motivates the need for algorithms that are efficient, both in terms of the number of measurements needed and running time. To combat the challenges associated with large datasets, we propose a general framework for active hierarchical clustering that repeatedly runs an off-the-shelf clustering algorithm on small subsets of the data and comes with guarantees on performance, measurement complexity and runtime complexity. We instantiate this framework with a simple spectral clustering algorithm and provide concrete results on its performance, showing that, under some assumptions, this algorithm recovers all clusters of size ?(log n) using O(n log^2 n) similarities and runs in O(n log^3 n) time for a dataset of n objects. Through extensive experimentation we also demonstrate that this framework is practically alluring.


Fast Computation of Subpath Kernel for Trees

arXiv.org Machine Learning

The kernel method is a potential approach to analyzing structured data such as sequences, trees, and graphs; however, unordered trees have not been investigated extensively. Kimura et al. (2011) proposed a kernel function for unordered trees on the basis of their subpaths, which are vertical substructures of trees responsible for hierarchical information in them. Their kernel exhibits practically good performance in terms of accuracy and speed; however, linear-time computation is not guaranteed theoretically, unlike the case of the other unordered tree kernel proposed by Vishwanathan and Smola (2003). In this paper, we propose a theoretically guaranteed linear-time kernel computation algorithm that is practically fast, and we present an efficient prediction algorithm whose running time depends only on the size of the input tree. Experimental results show that the proposed algorithms are quite efficient in practice.


A Convex Feature Learning Formulation for Latent Task Structure Discovery

arXiv.org Machine Learning

This paper considers the multi-task learning problem and in the setting where some relevant features could be shared across few related tasks. Most of the existing methods assume the extent to which the given tasks are related or share a common feature space to be known apriori. In real-world applications however, it is desirable to automatically discover the groups of related tasks that share a feature space. In this paper we aim at searching the exponentially large space of all possible groups of tasks that may share a feature space. The main contribution is a convex formulation that employs a graph-based regularizer and simultaneously discovers few groups of related tasks, having close-by task parameters, as well as the feature space shared within each group. The regularizer encodes an important structure among the groups of tasks leading to an efficient algorithm for solving it: if there is no feature space under which a group of tasks has close-by task parameters, then there does not exist such a feature space for any of its supersets. An efficient active set algorithm that exploits this simplification and performs a clever search in the exponentially large space is presented. The algorithm is guaranteed to solve the proposed formulation (within some precision) in a time polynomial in the number of groups of related tasks discovered. Empirical results on benchmark datasets show that the proposed formulation achieves good generalization and outperforms state-of-the-art multi-task learning algorithms in some cases.


Linear Regression with Limited Observation

arXiv.org Machine Learning

We consider the most common variants of linear regression, including Ridge, Lasso and Support-vector regression, in a setting where the learner is allowed to observe only a fixed number of attributes of each example at training time. We present simple and efficient algorithms for these problems: for Lasso and Ridge regression they need the same total number of attributes (up to constants) as do full-information algorithms, for reaching a certain accuracy. For Support-vector regression, we require exponentially less attributes compared to the state of the art. By that, we resolve an open problem recently posed by Cesa-Bianchi et al. (2010). Experiments show the theoretical bounds to be justified by superior performance compared to the state of the art.


Partial-Hessian Strategies for Fast Learning of Nonlinear Embeddings

arXiv.org Machine Learning

Stochastic neighbor embedding (SNE) and related nonlinear manifold learning algorithms achieve high-quality low-dimensional representations of similarity data, but are notoriously slow to train. We propose a generic formulation of embedding algorithms that includes SNE and other existing algorithms, and study their relation with spectral methods and graph Laplacians. This allows us to define several partial-Hessian optimization strategies, characterize their global and local convergence, and evaluate them empirically. We achieve up to two orders of magnitude speedup over existing training methods with a strategy (which we call the spectral direction) that adds nearly no overhead to the gradient and yet is simple, scalable and applicable to several existing and future embedding algorithms.


Dependent Hierarchical Normalized Random Measures for Dynamic Topic Modeling

arXiv.org Machine Learning

We develop dependent hierarchical normalized random measures and apply them to dynamic topic modeling. The dependency arises via superposition, subsampling and point transition on the underlying Poisson processes of these measures. The measures used include normalised generalised Gamma processes that demonstrate power law properties, unlike Dirichlet processes used previously in dynamic topic modeling. Inference for the model includes adapting a recently developed slice sampler to directly manipulate the underlying Poisson process. Experiments performed on news, blogs, academic and Twitter collections demonstrate the technique gives superior perplexity over a number of previous models.