
 Inductive Learning


Top Rank Optimization in Linear Time

arXiv.org Artificial Intelligence

Bipartite ranking aims to learn a real-valued ranking function that orders positive instances before negative instances. Recent efforts in bipartite ranking have focused on optimizing ranking accuracy at the top of the ranked list. Most existing approaches either optimize task-specific metrics or extend the ranking loss by placing more emphasis on the errors associated with the top-ranked instances, which leads to a high computational cost that is super-linear in the number of training instances. We propose a highly efficient approach, titled TopPush, for optimizing accuracy at the top that has computational complexity linear in the number of training instances. We present a novel analysis that bounds the generalization error for the top-ranked instances for the proposed approach. Empirical study shows that the proposed approach is highly competitive with the state-of-the-art approaches and is 10-100 times faster.
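Why a "push at the top" loss can be evaluated in linear time can be illustrated with a small sketch. It assumes a loss that compares each positive score against the single highest-scoring negative, so the maximum over negatives is computed once. The squared hinge surrogate and the function names `top_push_loss` and `pairwise_loss` are illustrative choices, not taken from the paper, and the paper's optimization procedure is not reproduced here.

```python
def squared_hinge(z: float) -> float:
    # Surrogate loss (chosen for illustration): zero once z <= -1.
    return max(0.0, 1.0 + z) ** 2


def top_push_loss(pos_scores, neg_scores, loss=squared_hinge):
    # One pass over the n negatives to find the top-ranked negative: O(n).
    top_neg = max(neg_scores)
    # One pass over the m positives: O(m). Total cost O(m + n).
    return sum(loss(top_neg - s) for s in pos_scores) / len(pos_scores)


def pairwise_loss(pos_scores, neg_scores, loss=squared_hinge):
    # Classic pairwise ranking loss for comparison: O(m * n) pairs.
    total = sum(loss(t - s) for s in pos_scores for t in neg_scores)
    return total / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    pos, neg = [2.0, 0.5], [0.0, 1.0]
    print(top_push_loss(pos, neg), pairwise_loss(pos, neg))
```

Because the surrogate is non-decreasing, the top-push value is never smaller than the pairwise average: it concentrates the penalty on positives that fall below the best-scoring negative.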


Tight Error Bounds for Structured Prediction

arXiv.org Machine Learning

Structured prediction tasks in machine learning involve the simultaneous prediction of multiple labels. This is typically done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise elements, each depending on two specific labels. Intuitively, the more pairwise terms are used, the better the expected accuracy. However, there is currently no theoretical account of this intuition. This paper takes a significant step in this direction. We formulate the problem as classifying the vertices of a known graph $G=(V,E)$, where the vertices and edges of the graph are labelled and correlate semi-randomly with the ground truth. We show that the prospects for achieving low expected Hamming error depend on the structure of the graph $G$ in interesting ways. For example, if $G$ is a very poor expander, like a path, then large expected Hamming error is inevitable. Our main positive result shows that, for a wide class of graphs including 2D grid graphs common in machine vision applications, there is a polynomial-time algorithm with small and information-theoretically near-optimal expected error. Our results provide a first step toward a theoretical justification for the empirical success of the efficient approximate inference algorithms that are used for structured prediction in models where exact inference is intractable.
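For intuition about a score function that decomposes into unary and pairwise terms over a graph, here is a minimal sketch. It assumes labels in {-1, +1}, hypothetical unary observations per vertex, and hypothetical edge weights where a positive weight rewards agreement between neighbours. It uses exponential brute-force decoding on a tiny path graph, so it is not the paper's polynomial-time algorithm and does not model its semi-random noise.

```python
from itertools import product


def score(y, unary, edges):
    # y maps vertex -> +1/-1. Unary terms depend on one label each.
    s = sum(unary[v] * y[v] for v in y)
    # Each pairwise term depends on exactly two labels.
    s += sum(w * y[u] * y[v] for (u, v), w in edges.items())
    return s


def brute_force_map(vertices, unary, edges):
    # Exhaustive search over all 2^|V| labellings (tiny graphs only).
    best, best_score = None, float("-inf")
    for labels in product((-1, 1), repeat=len(vertices)):
        y = dict(zip(vertices, labels))
        s = score(y, unary, edges)
        if s > best_score:
            best, best_score = y, s
    return best


def hamming_error(y_hat, y_true):
    # Number of vertices whose predicted label is wrong.
    return sum(y_hat[v] != y_true[v] for v in y_true)


if __name__ == "__main__":
    # Path 0-1-2 whose true labels are all +1; vertex 1 has a noisy unary term.
    unary = {0: 1.0, 1: -0.2, 2: 1.0}
    edges = {(0, 1): 1.0, (1, 2): 1.0}
    print(brute_force_map([0, 1, 2], unary, edges))
```

Here the pairwise terms override the single misleading unary observation at vertex 1. On a path, each vertex has only two neighbours, which hints at why poorly connected graphs leave less room for this kind of correction.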


Ensembles of Random Sphere Cover Classifiers

arXiv.org Artificial Intelligence

We propose and evaluate alternative ensemble schemes for a new instance-based learning classifier, the Randomised Sphere Cover (RSC) classifier. RSC fuses instances into spheres, then bases classification on distance to spheres rather than distance to instances. The randomised nature of RSC makes it ideal for use in ensembles. We propose two ensemble methods tailored to the RSC classifier: $\alpha \beta$RSE, an ensemble based on instance resampling, and $\alpha$RSSE, a subspace ensemble. We compare $\alpha \beta$RSE and $\alpha$RSSE to tree-based ensembles on a set of UCI datasets and demonstrate that RSC ensembles perform significantly better than some of these ensembles, and not significantly worse than the others. We demonstrate via a case study on six gene expression datasets that $\alpha$RSSE can outperform other subspace ensemble methods on high-dimensional data when used in conjunction with an attribute filter. Finally, we perform a set of bias/variance decomposition experiments to analyse the source of improvement in comparison to a base classifier.
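The idea of fusing instances into spheres and classifying by distance to spheres can be sketched as follows. This is a simplified greedy cover written under stated assumptions: each sphere's radius is the distance from its randomly chosen centre to the nearest instance of another class, and "distance to a sphere" is the distance to its centre minus its radius. The $\alpha$ and $\beta$ parameters of the actual RSC are not modelled, and the ensemble here only varies the random seed. It is neither $\alpha \beta$RSE (instance resampling) nor $\alpha$RSSE (subspaces). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def fit_sphere_cover(X, y, seed=0):
    """Greedy randomised sphere cover (illustrative sketch)."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)  # random choice of sphere centres
    covered = set()
    spheres = []  # (centre, radius, label)
    for i in order:
        if i in covered:
            continue
        # Radius: distance to the nearest instance of a different class.
        enemy = [math.dist(X[i], X[j]) for j in range(len(X)) if y[j] != y[i]]
        radius = min(enemy) if enemy else math.inf
        covered.add(i)
        # Fuse every same-class instance strictly inside the sphere.
        for j in range(len(X)):
            if y[j] == y[i] and math.dist(X[i], X[j]) < radius:
                covered.add(j)
        spheres.append((X[i], radius, y[i]))
    return spheres


def predict_sphere_cover(spheres, x):
    # Distance to a sphere = distance to its centre minus its radius.
    _, _, label = min(spheres, key=lambda s: math.dist(x, s[0]) - s[1])
    return label


def predict_ensemble(X, y, x, n_members=5):
    # Majority vote over covers built with different random seeds.
    votes = Counter(
        predict_sphere_cover(fit_sphere_cover(X, y, seed=s), x)
        for s in range(n_members)
    )
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    y = [0, 0, 1, 1]
    print(predict_ensemble(X, y, (0.5, 0.5)), predict_ensemble(X, y, (5.0, 5.5)))
```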


ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems

arXiv.org Artificial Intelligence

Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The learning machine leverages big data to find examples that maximize the training value of its interaction with the teacher. When the teacher is restricted to labeling examples selected by the machine, this problem is an instance of active learning. When the teacher can provide additional information to the machine (e.g., suggestions on what examples or predictive features should be used) as the learning task progresses, then the problem becomes one of interactive learning. To accommodate the two-way communication channel needed for efficient interactive learning, the teacher and the machine need an environment that supports an interaction language. The machine can access, process, and summarize more examples than the teacher can see in a lifetime. Based on the machine's output, the teacher can revise the definition of the task or make it more precise. Both the teacher and the machine continuously learn and benefit from the interaction. We have built a platform to (1) produce valuable and deployable models and (2) support research on both the machine learning and user interface challenges of the interactive learning problem. The platform relies on a dedicated, low-latency, distributed, in-memory architecture that allows us to construct web-scale learning machines with quick interaction speed. The purpose of this paper is to describe this architecture and demonstrate how it supports our research efforts. Preliminary results are presented as illustrations of the architecture but are not the primary focus of the paper.


Marginal Structured SVM with Hidden Variables

arXiv.org Machine Learning

In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables. MSSVM properly accounts for the uncertainty of hidden variables, and can significantly outperform the previously proposed latent structured SVM (LSSVM; Yu & Joachims (2009)) and other state-of-the-art methods, especially when that uncertainty is large. Our method also results in a smoother objective function, making gradient-based optimization of MSSVMs converge significantly faster than for LSSVMs. We also show that our method consistently outperforms hidden conditional random fields (HCRFs; Quattoni et al. (2007)) on both simulated and real-world datasets. Furthermore, we propose a unified framework that includes both our method and several other existing methods as special cases, and that provides insights into the comparison of different models in practice.
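The difference between committing to the single best hidden configuration and accounting for uncertainty over hidden variables can be illustrated by contrasting a max with a log-sum-exp over per-configuration scores. This sketch assumes that correspondence (max for the LSSVM-style score, a temperature-scaled log-sum-exp for the marginal score); it does not reproduce the full MSSVM objective or the paper's unified framework. The `temperature` parameter is an illustrative assumption.

```python
import math


def latent_score(scores_h):
    # LSSVM-style: keep only the best hidden configuration.
    return max(scores_h)


def marginal_score(scores_h, temperature=1.0):
    # Marginal (soft) score: T * log sum_h exp(s_h / T), computed stably.
    m = max(scores_h)
    return m + temperature * math.log(
        sum(math.exp((s - m) / temperature) for s in scores_h)
    )


if __name__ == "__main__":
    s = [0.0, 2.0, -1.0]
    print(latent_score(s), marginal_score(s))
```

The log-sum-exp is smooth in the scores and lies between $\max_h s_h$ and $\max_h s_h + T \log |H|$, which is one way to see why a marginal objective can be friendlier to gradient-based optimization than a max. As the temperature approaches zero it recovers the max.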


Conditional Probability Tree Estimation Analysis and Algorithms

arXiv.org Machine Learning

We consider the problem of estimating the conditional probability of a label in time $O(\log n)$, where $n$ is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm that provably constructs a logarithmic-depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works successfully on a dataset with roughly $10^6$ labels.
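The reduction to binary regression problems organized in a tree can be sketched with a fixed, complete binary tree over the labels: each internal node holds a regressor estimating the probability of branching right, and a label's probability is the product of branch probabilities along its root-to-leaf path, which takes $O(\log n)$ regressor calls. The fixed tree, the heap-style node numbering, and the `node_regressors` mapping are assumptions for illustration; the paper's online tree-construction algorithm is not shown.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical node regressor: maps features to P(go right | x, node).
Regressor = Callable[[Sequence[float]], float]


def label_probability(x: Sequence[float], label: int, n_labels: int,
                      node_regressors: Dict[int, Regressor]) -> float:
    # Depth of a complete binary tree with at least n_labels leaves.
    # If n_labels is not a power of two, some leaves are unused and the
    # regressors must route no mass to them for probabilities to sum to 1.
    depth = max(1, math.ceil(math.log2(n_labels)))
    node, prob = 1, 1.0  # heap indexing: root is 1, children are 2k and 2k+1
    for level in reversed(range(depth)):
        bit = (label >> level) & 1  # 1 means "go right" at this level
        p_right = node_regressors[node](x)
        prob *= p_right if bit else (1.0 - p_right)
        node = 2 * node + bit
    return prob  # O(depth) = O(log n) regressor evaluations


if __name__ == "__main__":
    regs = {1: lambda x: 0.8, 2: lambda x: 0.3, 3: lambda x: 0.6}
    print([label_probability([0.0], l, 4, regs) for l in range(4)])
```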


Dynamic Feature Scaling for Online Learning of Binary Classifiers

arXiv.org Machine Learning

Scaling feature values is an important step in numerous machine learning tasks. Different features can have different value ranges, and some form of feature scaling is often required in order to learn an accurate classifier. However, feature scaling is typically conducted as a preprocessing task prior to learning. This is problematic in an online setting for two reasons. First, it might not be possible to accurately determine the value range of a feature at the initial stages of learning, when we have observed only a small number of training instances. Second, the distribution of data can change over time, which renders obsolete any feature scaling that we perform in a preprocessing step. We propose a simple but effective method to dynamically scale features at train time, thereby quickly adapting to any changes in the data stream. We compare the proposed dynamic feature scaling method against more complex methods for estimating scaling parameters using several benchmark datasets for binary classification. Our proposed feature scaling method consistently outperforms more complex methods on all of the benchmark datasets and improves the classification accuracy of a state-of-the-art online binary classification algorithm.
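One generic way to scale features at train time is to maintain running per-feature means and variances (Welford's algorithm) and standardize each incoming instance before the model update. The sketch below pairs this with plain online logistic regression. Both the standardization scheme and the learner are assumptions chosen for illustration and are not necessarily the method proposed in the paper; `RunningScaler` and `online_logistic` are hypothetical names.

```python
import math


class RunningScaler:
    """Per-feature running mean/variance via Welford's algorithm."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations

    def update(self, x):
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def transform(self, x):
        out = []
        for i, v in enumerate(x):
            var = self.m2[i] / self.n if self.n > 1 else 0.0
            std = math.sqrt(var)
            # A feature with no observed spread yet is mapped to 0.
            out.append((v - self.mean[i]) / std if std > 0 else 0.0)
        return out


def online_logistic(stream, dim, lr=0.1):
    # stream yields (features, label) with label in {0, 1}.
    scaler = RunningScaler(dim)
    w, b = [0.0] * dim, 0.0
    for x, y in stream:
        scaler.update(x)  # statistics include the current instance
        z = scaler.transform(x)
        logit = sum(wi * zi for wi, zi in zip(w, z)) + b
        p = 1.0 / (1.0 + math.exp(-logit))
        g = p - y  # gradient of the log loss w.r.t. the logit
        w = [wi - lr * g * zi for wi, zi in zip(w, z)]
        b -= lr * g
    return w, b, scaler
```

Because the statistics are updated on every instance, the scaling tracks a drifting stream without a separate preprocessing pass. A windowed or exponentially weighted variant would forget old data faster.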


PolicyBoost: Functional Policy Gradient with Ranking-based Reward Objective

AAAI Conferences

Learning policies in nonlinear representations is an important step toward real-world applications of reinforcement learning in robotics. While functional representations have been widely applied in state-of-the-art supervised learning techniques (known as boosting approaches) to adaptively learn nonlinear functions, boosting-style approaches have been little investigated in reinforcement learning. Only a few works have explored this direction, and they may suffer from the occurring-probability-pursuing problem. In this paper, to alleviate the problem, we propose to employ a ranking-based objective function to guide the policy search in a function space, resulting in the PolicyBoost approach. Experimental results verify the effectiveness as well as the robustness of PolicyBoost.


On Boosting Sparse Parities

AAAI Conferences

While boosting has been extensively studied, considerably less attention has been devoted to the task of designing good weak learning algorithms. In this paper we consider the problem of designing weak learners that are especially well suited to the boosting procedure, and specifically to the AdaBoost algorithm. First we describe conditions desirable for a weak learning algorithm. We then propose using sparse parity functions, which have many of our desired properties, as weak learners in boosting. Our experimental tests show the proposed weak learners to be competitive with the most widely used ones: decision stumps and pruned decision trees.
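A sparse parity weak learner can be plugged into standard AdaBoost as follows. This minimal sketch assumes binary input features and labels in {-1, +1}. Each weak hypothesis is a (possibly negated) parity over at most `k` feature indices, chosen by exhaustive search for the lowest weighted error. The exhaustive search and the value of `k` are illustrative assumptions; the paper's specific weak-learning procedure may differ.

```python
import math
from itertools import combinations


def parity(x, subset):
    # +1 if the selected bits of x sum to an even number, else -1.
    return 1 - 2 * (sum(x[i] for i in subset) % 2)


def best_parity(X, y, w, k):
    # Exhaustive search over parities on at most k features (and their negations).
    best = None
    for size in range(k + 1):
        for subset in combinations(range(len(X[0])), size):
            err = sum(wi for wi, x, yi in zip(w, X, y) if parity(x, subset) != yi)
            # Negating a hypothesis flips its error to 1 - err (weights sum to 1).
            for sign, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, subset, sign)
    return best


def adaboost_parities(X, y, rounds=10, k=2):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, subset, sign = best_parity(X, y, w, k)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect learners
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, subset, sign))
        # Up-weight misclassified examples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * sign * parity(x, subset))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    s = sum(a * sign * parity(x, subset) for a, subset, sign in ensemble)
    return 1 if s >= 0 else -1
```

A parity over two bits is exactly the kind of XOR-like target that a single decision stump cannot represent, which is one motivation for considering parities as weak learners.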


Learning to Recognize Novel Objects in One Shot through Human-Robot Interactions in Natural Language Dialogues

AAAI Conferences

Being able to quickly and naturally teach robots new knowledge is critical for many future open-world human-robot interaction scenarios. In this paper we present a novel approach to using natural language context for one-shot learning of visual objects, where the robot is immediately able to recognize the described object. We describe the architectural components and demonstrate the proposed approach on a robotic platform in a proof-of-concept evaluation.