Goto

Collaborating Authors

 Europe


Robust Distance Metric Learning with Auxiliary Knowledge

AAAI Conferences

Most of the existing metric learning methods are accomplished by exploiting pairwise constraints over the labeled data and frequently suffer from the insufficiency of training examples.  To learn a robust distance metric from few labeled examples, prior knowledge from unlabeled examples as well as the metrics previously derived from auxiliary data sets can be useful.  In this paper, we propose to leverage such auxiliary knowledge to assist distance metric learning, which is formulated following the regularized loss minimization principle.  Two algorithms are derived on the basis of manifold regularization and log-determinant divergence regularization technique, respectively, which can simultaneously exploit label information (i.e., the pairwise constraints over labeled data), unlabeled examples, and the metrics derived from auxiliary data sets.  The proposed methods directly manipulate the auxiliary metrics and require no raw examples from the auxiliary data sets, which make them efficient and flexible.  We conduct extensive evaluations to compare our approaches with a number of competing approaches on face recognition task.  The experimental results show that our approaches can derive reliable distance metrics from limited training examples and thus are superior in terms of accuracy and labeling efforts.


Spatio-Temporal Event Detection Using Dynamic Conditional Random Fields

AAAI Conferences

Event detection is a critical task in sensor networks for a variety of real-world applications. Many real-world events often exhibit complex spatio-temporal patterns whereby they manifest themselves via observations over time and space proximities. These spatio-temporal events cannot be handled well by many of the previous approaches. In this paper, we propose a new Spatio-Temporal Event Detection (STED) algorithm in sensor networks based on a dynamic conditional random field (DCRF) model. Our STED method handles the uncertainty of sensor data explicitly and permits neighborhood interactions in both observations and event labels. Experiments on both real data and synthetic data demonstrate that our STED method can provide accurate event detection in near real time even for large-scale sensor networks.


Multi-Relational Learning with Gaussian Processes

AAAI Conferences

Due to their flexible nonparametric nature, Gaussian process models are very effective at solving hard machine learning problems. While existing Gaussian process models focus on modeling one single relation, we present a generalized GP model, named multi-relational Gaussian process model, that is able to deal with an arbitrary number of relations in a domain of interest. The proposed model is analyzed in the context of bipartite, directed, and undirected univariate relations. Experimental results on real-world datasets show that exploiting the correlations among different entity types and relations can indeed improve prediction performance.


Discriminative Semi-Supervised Feature Selection via Manifold Regularization

AAAI Conferences

Feature selection can be conducted in a supervised or unsupervised manner, in terms of whether the label information We consider the problem of semi-supervised feature is utilized to guide the selection of relevant features. Generally, selection, where we are given a small amount supervised feature selection methods require a large of labeled examples and a large amount of unlabeled amount of labeled training data. It however could fail to identify examples. Since a small number of labeled the relevant features that are discriminative to different samples are usually insufficient for identifying the classes, provided the number of labeled samples is small. On relevant features, the critical problem arising from the other hand, while unsupervised feature selection methods semi-supervised feature selection is how to take could work well with unlabeled training data, they ignore advantage of the information underneath the unlabeled the label information and therefore are often unable to identify data. To address this problem, we propose the discriminative features. Given the high cost in manually a novel discriminative semi-supervised feature labeling data, and at the same time abundant unlabeled selection method based on the idea of manifold data are often easily accessible, it is desirable to develop feature regularization. The proposed method selects selection methods that are capable of exploiting both labeled features through maximizing the classification margin and unlabeled data.


Toward Unsupervised Activity Discovery Using Multi Dimensional Motif Detection in Time Series

AAAI Conferences

This paper addresses the problem of activity and event discovery in multi dimensional time series data by proposing a novel method for locating multi dimensional motifs in time series. While recent work has been done in finding single dimensional and multi dimensional motifs in time series, we address motifs in general case, where the elements of multi dimensional motifs have temporal, length, and frequency variations. The proposed method is validated by synthetic data, and empirical evaluation has been done on several wearable systems that are used by real subjects.


Succinct Approximate Counting of Skewed Data

AAAI Conferences

Practical data analysis relies on the ability to count observations of objectssuccinctly and efficiently. Unfortunately the space usage of an exact estimator grows with the size of the a priori set from which objects are drawn while the time required to maintain such an estimator grows with the size of the data set. We present static and on-line approximation schemes that avoid these limitations when approximate frequency estimates are acceptable. Our Log-Frequency Sketch extends the approximate counting algorithm of Morris [Morris1978] to estimate frequencies with bounded relative error via a single pass over a data set. It uses constant space per object when the frequencies follow a power law and can be maintained in constant time per observation. We give an (epsilon, delta)-approximation scheme which we verify empirically on a large natural language data set where, for instance, 95 percent of frequencies are estimated with relative error less than 0.25 using fewer than 11 bits per object in the static case and 15 bits per object on-line.


Latent Variable Perceptron Algorithm for Structured Classification

AAAI Conferences

We propose a perceptron-style algorithm for fast discriminative training of structured latent variable model. This method extends the perceptron algorithm for the learning with latent dependencies, as an alternative to existing probabilistic latent variable models. It relies on Viterbi decoding over latent variables, combined with simple additive updates. Its training cost is significantly lower than that of probabilistic latent variable models, while it gives comparable or even superior classification accuracy on our tasks. Experiments on natural language processing problems demonstrate that its results are among those good reports on corresponding data sets.


On the Equivalence Between Canonical Correlation Analysis and Orthonormalized Partial Least Squares

AAAI Conferences

Canonical correlation analysis (CCA) and partial least squares (PLS) are well-known techniques for feature extraction from two sets of multi-dimensional variables. The fundamental difference between CCA and PLS is that CCA maximizes the correlation while PLS maximizes the covariance. Although both CCA and PLS have been applied successfully in various applications, the intrinsic relationship between them remains unclear. In this paper, we attempt to address this issue by showing the equivalence relationship between CCA and orthonormalized partial least squares (OPLS), a variant of PLS. We further extend the equivalence relationship to the case when regularization is employed for both sets of variables. In addition, we show that the CCA projection for one set of variables is independent of the regularization on the other set of variables. We have performed experimental studies using both synthetic and real data sets and our results confirm the established equivalence relationship. The presented analysis provides novel insights into the connection between these two existing algorithms as well as the effect of the regularization.


Predictive Projections

AAAI Conferences

These existing algorithms discover projections policies in very high dimensional state spaces. of the training data under which nearby points are likely We propose a linear dimensionality reduction algorithm to have the same class label or similar regression targets. The that discovers predictive projections: projections algorithm described in this paper makes use of the same machinery in which accurate predictions of future states but attempts to find low-dimensional projections under can be made using simple nearest neighbor style which current state vectors accurately predict future states learning. The goal of this work is to extend the in the projected space. The intuition is that projections which reach of existing reinforcement learning algorithms capture the state dynamics in this way are likely to contain to domains where they would otherwise be inapplicable information that will be useful for control.


Streamed Learning: One-Pass SVMs

AAAI Conferences

We present a streaming model for large-scale classification (in the context of ℓ2 -SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The ℓ2 -SVM is known to have an equivalent formulation in terms of the minimum enclosing ball (MEB) problem, and an efficient algorithm based on the idea of core sets exists (CVM) [Tsang et al., 2005]. CVM learns a (1 + ε)-approximate MEB for a set of points and yields an approximate solution to corresponding SVM instance. However CVM works in batch mode requiring multiple passes over the data. This paper presents a single-pass SVM which is based on the minimum enclosing ball of streaming data. We show that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to using online stochastic gradient updates. Our algorithm performs polylogarithmic computation at each example, and requires very small and constant storage. Experimental results show that, even in such restrictive settings, we can learn efficiently in just one pass and get accuracies comparable to other state-of-the-art SVM solvers (batch and online). We also give an analysis of the algorithm, and discuss some open issues and possible extensions.