Goto

Collaborating Authors

 Performance Analysis


Directed Information Graphs

arXiv.org Machine Learning

We propose a graphical model for representing networks of stochastic processes, the minimal generative model graph. It is based on reduced factorizations of the joint distribution over time. We show that under appropriate conditions, it is unique and consistent with another type of graphical model, the directed information graph, which is based on a generalization of Granger causality. We demonstrate how directed information quantifies Granger causality in a particular sequential prediction setting. We also develop efficient methods to estimate the topological structure from data that obviate estimating the joint statistics. One algorithm assumes upper-bounds on the degrees and uses the minimal dimension statistics necessary. In the event that the upper-bounds are not valid, the resulting graph is nonetheless an optimal approximation. Another algorithm uses near-minimal dimension statistics when no bounds are known but the distribution satisfies a certain criterion. Analogous to how structure learning algorithms for undirected graphical models use mutual information estimates, these algorithms use directed information estimates. We characterize the sample-complexity of two plug-in directed information estimators and obtain confidence intervals. For the setting when point estimates are unreliable, we propose an algorithm that uses confidence intervals to identify the best approximation that is robust to estimation error. Lastly, we demonstrate the effectiveness of the proposed algorithms through analysis of both synthetic data and real data from the Twitter network. In the latter case, we identify which news sources influence users in the network by merely analyzing tweet times.


Surpassing Human-Level Face Verification Performance on LFW with GaussianFace

AAAI Conferences

Face verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problemis exacerbated when we rely unrealistically on a singletraining data source, which is often insufficient to coverthe intrinsically complex face variations. This paperproposes a principled multi-task learning approachbased on Discriminative Gaussian Process Latent VariableModel (DGPLVM), named GaussianFace, for faceverification. In contrast to relying unrealistically on asingle training data source, our model exploits additional data from multiple source-domains to improve the generalization performance of face verification inan unknown target-domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can well capture complex face variations inherent in multiple sources. To enhance discriminative power, we introduced a more efficient equivalent form of Kernel Fisher Discriminant Analysis to DGPLVM.To speed up the process of inference and prediction, we exploited the low rank approximation method. Extensive experiments demonstrated the effectiveness of the proposed model in learning from diverse data sources and generalizing to unseen domains. Specifically, the accuracy of our algorithm achieved an impressive accuracyrate of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark. For the first time, the human-level performance in face verification (97.53%) on LFW is surpassed.


Online Detection of Abnormal Events Using Incremental Coding Length

AAAI Conferences

We present an unsupervised approach for abnormal event detection in videos. We propose, given a dictionary of features learned from local spatiotemporal cuboids using the sparse coding objective, the abnormality of an event depends jointly on two factors: the frequency of each feature in reconstructing all events (or, rarity of a feature) and the strength by which it is used in reconstructing the current event (or, the absolute coefficient). The Incremental Coding Length (ICL) of a feature is a measure of its entropy gain. Given a dictionary, the ICL computation does not involve any parameter, is computationally efficient and has been used for saliency detection in images with impressive results. In this paper, the rarity of a dictionary feature is learned online as its average energy, a function of its ICL. The proposed approach is applicable to real world streaming videos. Experiments on three benchmark datasets and evaluations in comparison with a number of mainstream algorithms show that the approach is comparable to the state-of-the-art.


An Adaptive Gradient Method for Online AUC Maximization

AAAI Conferences

Learning for maximizing AUC performance is an important research problem in machine learning. Unlike traditional batch learning methods for maximizing AUC which often suffer from poor scalability, recent years have witnessed some emerging studies that attempt to maximize AUC by single-pass online learning approaches. Despite their encouraging results reported, the existing online AUC maximization algorithms often adopt simple stochastic gradient descent approaches, which fail to exploit the geometry knowledge of the data observed in the online learning process, and thus could suffer from relatively slow convergence. To overcome the limitation of the existing studies, in this paper, we propose a novel algorithm of Adaptive Online AUC Maximization (AdaOAM), by applying an adaptive gradient method for exploiting the knowledge of historical gradients to perform more informative online learning. The new adaptive updating strategy by AdaOAM is less sensitive to parameter settings due to its natural effect of tuning the learning rate. In addition, the time complexity of the new algorithm remains the same as the previous non-adaptive algorithms. To demonstrate the effectiveness of the proposed algorithm, we analyze its theoretical bound, and further evaluate its empirical performance on both public benchmark datasets and anomaly detection datasets. The encouraging empirical results clearly show the effectiveness and efficiency of the proposed algorithm.


Recommending Positive Links in Signed Social Networks by Optimizing a Generalized AUC

AAAI Conferences

With the rapid development of signed social networks in which therelationships between two nodes can be either positive (indicatingrelations such as like) or negative (indicating relations such asdislike), producing a personalized ranking list with positive linkson the top and negative links at the bottom is becoming anincreasingly important task. To accomplish it, we propose ageneralized AUC (GAUC) to quantify the ranking performance ofpotential links (including positive, negative, and unknown statuslinks) in partially observed signed social networks. In addition, wedevelop a novel link recommendation algorithm by directly optimizingthe GAUC loss. We conduct experimental studies based upon Wikipedia,MovieLens, and Slashdot; our results demonstrate the effectivenessand the efficiency of the proposed approach.


Robust Subspace Clustering via Thresholding Ridge Regression

AAAI Conferences

Given a data set from a union of multiple linear subspaces, a robust subspace clustering algorithm fits each group of data points with a low-dimensional subspace and then clusters these data even though they are grossly corrupted or sampled from the union of dependent subspaces. Under the framework of spectral clustering, recent works using sparse representation, low rank representation and their extensions achieve robust clustering results by formulating the errors (e.g., corruptions) into their objective functions so that the errors can be removed from the inputs. However, these approaches have suffered from the limitation that the structure of the errors should be known as the prior knowledge. In this paper, we present a new method of robust subspace clustering by eliminating the effect of the errors from the projection space (representation) rather than from the input space. We firstly prove that ell_1-, ell_2-, and ell_infty-norm-based linear projection spaces share the property of intra-subspace projection dominance, i.e., the coefficients over intra-subspace data points are larger than those over inter-subspace data points. Based on this property, we propose a robust and efficient subspace clustering algorithm, called Thresholding Ridge Regression (TRR). TRR calculates the ell2-norm-based coefficients of a given data set and performs a hard thresholding operator; and then the coefficients are used to build a similarity graph for clustering. Experimental studies show that TRR outperforms the state-of-the-art methods with respect to clustering quality, robustness, and time-saving.


Value of Information Based on Decision Robustness

AAAI Conferences

There are many criteria for measuring the value of information (VOI), each based on a different principle that is usually suitable for specific applications. We propose a new criterion for measuring the value of information, which values information that leads to robust decisions (i.e., ones that are unlikely to change due to new information). We also introduce an algorithm for Naive Bayes networks that selects features with maximal VOI under the new criteria. We discuss the application of the new criteria to classification tasks, showing how it can be used to tradeoff the budget, allotted for acquiring information, with the classification accuracy. In particular, we show empirically that the new criteria can reduce the expended budget significantly while reducing the classification accuracy only slightly. We also show empirically that the new criterion leads to decisions that are much more robust than those based on traditional VOI criteria, such as information gain and classification loss. This make the new criteria particularly suitable for certain decision making applications.


Online Boosting Algorithms for Anytime Transfer and Multitask Learning

AAAI Conferences

The related problems of transfer learning and multitask learning have attracted significant attention, generating a rich literature of models and algorithms. Yet most existing approaches are studied in an offline fashion, implicitly assuming that data from different domains are given as a batch. Such an assumption is not valid in many real-world applications where data samples arrive sequentially, and one wants a good learner even from few examples. The goal of our work is to provide sound extensions to existing transfer and multitask learning algorithms such that they can be used in an anytime setting. More specifically, we propose two novel online boosting algorithms, one for transfer learning and one for multitask learning, both designed to leverage the knowledge of instances in other domains. The experimental results show state-of-the-art empirical performance on standard benchmarks, and we present results of using our methods for effectively detecting new seizures in patients with epilepsy from very few previous samples.


Adaptive Sampling with Optimal Cost for Class-Imbalance Learning

AAAI Conferences

Learning from imbalanced data sets is one of the challenging problems in machine learning, which means the number of negative examples is far more than that of positive examples. The main problems of existing methods are: (1) The degree of re-sampling, a key factor greatly affecting performance, needs to be pre-fixed, which is difficult to make the optimal choice; (2) Many useful negative samples are discarded in under-sampling; (3) The effectiveness of algorithm-level methods are limited because they just use the original training data for single classifier. To address the above issues, a novel approach of adaptive sampling with optimal cost is proposed for class-imbalance learning in this paper. The novelty of the proposed approach mainly lies in: adaptively over-sampling the minority positive examples and under-sampling the majority negative examples, forming different sub-classifiers by different subsets of training data with the best cost ratio adaptively chosen, and combining these sub-classifiers according to their accuracy to create a strong classifier. It aims to make full use of the whole training data and improve the performance of class-imbalance learning classifier. The solid experiments are conducted to compare the performance between the proposed approach and 12 state-of-the-art methods on challenging 16 UCI data sets on 3 evaluation metrics, and the results show the proposed approach can achieve superior performance in class-imbalance learning.


Detecting Change Points in the Large-Scale Structure of Evolving Networks

AAAI Conferences

Interactions among people or objects are often dynamic in nature and can be represented as a sequence of networks, each providing a snapshot of the interactions over a brief period of time. An important task in analyzing such evolving networks is change-point detection, in which we both identify the times at which the large-scale pattern of interactions changes fundamentally and quantify how large and what kind of change occurred. Here, we formalize for the first time the network change-point detection problem within an online probabilistic learning framework and introduce a method that can reliably solve it. This method combines a generalized hierarchical random graph model with a Bayesian hypothesis test to quantitatively determine if, when, and precisely how a change point has occurred. We analyze the detectability of our method using synthetic data with known change points of different types and magnitudes, and show that this method is more accurate than several previously used alternatives. Applied to two high-resolution evolving social networks, this method identifies a sequence of change points that align with known external ``shocks'' to these networks.