Goto

Collaborating Authors

 Performance Analysis


Advances in Nonparametric Hypothesis Testing

AAAI Conferences

My research goal involves simultaneously addressing statistical and computational tradeoffs encountered in modern data analysis and high-dimensional machine learning (eg: hypothesis testing, regression, classification). My future interests include incorporating additional constraints like privacy or communication, and settings involving hidden utilities of multiple cooperative agents or competitive adversaries.


Solving the Partial Label Learning Problem: An Instance-Based Approach

AAAI Conferences

In partial label learning, each training example is associated with a set of candidate labels, among which only one is valid. An intuitive strategy to learn from partial label examples is to treat all candidate labels equally and make prediction by averaging their modeling outputs. Nonetheless, this strategy may suffer from the problem that the modeling output from the valid label is overwhelmed by those from the false positive labels. In this paper, an instance-based approach named IPAL is proposed by directly disambiguating the candidate label set. Briefly, IPAL tries to identify the valid label of each partial label example via an iterative label propagation procedure, and then classifies the unseen instance based on minimum error reconstruction from its nearest neighbors. Extensive experiments show that IPAL compares favorably against the existing instance-based as well as other state-of-the-art partial label learning approaches.


A Geometric Theory of Feature Selection and Distance-Based Measures

AAAI Conferences

Feature selection measures are often explained by the analogy to a rule to measure the โ€œdistanceโ€ of sets of features to the โ€œclosestโ€ ideal sets of features. An ideal feature set is such that it can determine classes uniquely and correctly. This way of explanation was just an analogy before this paper. In this paper, we show a way to map arbitrary feature sets of datasets into a common metric space, which is indexed by a real number p with 1 โ‰ค p โ‰ค โˆž. Since this determines the distance between an arbitrary pair of feature sets, even if they belong to different datasets, the distance of a feature set to the closest ideal feature set can be used as a feature selection measure. Surprisingly, when p = 1, the measure is identical to the Bayesian risk, which is probably the feature selection measure that is used the most widely in the literature. For 1 < p โ‰ค โˆž, the measure is novel and has significantly different properties from the Bayesian risk. We also investigate the correlation between measurements by these measures and classification accuracy through experiments. As a result, we show that our novel measures with p > 1 exhibit stronger correlation than the Bayesian risk.


Nonparametric Independence Testing for Small Sample Sizes

AAAI Conferences

It is also useful for scientific discovery like in neuroscience, like correlation of X, Y only test for (univariate) to see if a stimulus X (say an image) is independent linear independence, natural alternatives like of the brain activity Y (say fMRI) in a relevant part of mutual information of X, Y are hard to estimate the brain. Since detecting nonlinear correlations is much easier due to a serious curse of dimensionality. A recent than estimating a nonparametric regression function (of approach, avoiding both issues, estimates norms of Y onto X), it can be done at smaller sample sizes, with further an operator in Reproducing Kernel Hilbert Spaces samples collected for estimation only if an effect is detected (RKHSs). Our main contribution is strong empirical by the hypothesis test. For such situations, correlation evidence that by employing shrunk operators only tests for univariate linear independence, while other when the sample size is small, one can attain an improvement statistics like mutual information that do characterize multivariate in power at low false positive rates. We independence are hard to estimate from data, suffering analyze the effects of Stein shrinkage on a popular from a serious curse of dimensionality. A recent popular test statistic called HSIC (Hilbert-Schmidt Independence approach for this problem (and a related two-sample testing Criterion). Our observations provide insights problem) involve the use of quantities defined in reproducing into two recently proposed shrinkage estimators, kernel Hilbert spaces (RKHSs) - see [Gretton et al., 2006; SCOSE and FCOSE - we prove that SCOSE Harchaoui et al., 2007; Gretton et al., 2005b; 2005a].


Fast Cross-Validation for Incremental Learning

AAAI Conferences

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing CV estimate is to run a learning algorithm separately for each CV fold, a computationally expensive process. In this paper, we propose a new approach to reduce the computational burden of CV-based performance estimation. As opposed to all previous attempts, which are specific to a particular learning model or problem domain, we propose a general method applicable to a large class of incremental learning algorithms, which are uniquely fitted to big data problems. In particular, our method applies to a wide range of supervised and unsupervised learning tasks with different performance criteria, as long as the base learning algorithm is incremental. We show that the running time of the algorithm scales logarithmically, rather than linearly, in the number of CV folds. Furthermore, the algorithm has favorable properties for parallel and distributed implementation. Experiments with state-of-the-art incremental learning algorithms confirm the practicality of the proposed method.


Online Robust Low Rank Matrix Recovery

AAAI Conferences

Low rank matrix recovery has shown its importance as a theoretic foundation in many areas of information processing. Its solutions are usually obtained in batch mode that requires to load all the data into memory during processing, and thus are hardly applicable on large scale data. Moreover, a fraction of data may be severely contaminated by outliers, which makes accurate recovery significantly more challenging. This paper proposes a novel online robust low rank matrix recovery method to address these difficulties. In particular, we first introduce an online algorithm to solve the problem of low rank matrix completion. Then we move on to low rank matrix recovery from observations with intensive outliers. The outlier support is robustly estimated from a perspective of mixture model. Experiments on both synthetic and real data are conducted to demonstrate the efficacy of our method and show its superior performance over the state-of-the-arts.


Optimal Bayesian Hashing for Efficient Face Recognition

AAAI Conferences

In practical applications, it is often observed that high-dimensional features can yield good performance, while being more costly in both computation and storage. In this paper, we propose a novel method called Bayesian Hashing to learn an optimal Hamming embedding of high-dimensional features, with a focus on the challenging application of face recognition. In particular, a boosted random FERNs classification model is designed to perform efficient face recognition, in which bit correlations are elaborately approximated with a random permutation technique. Without incurring additional storage cost, multiple random permutations are then employed to train a series of classifiers for achieving better discrimination power. In addition, we introduce a sequential forward floating search (SFFS) algorithm to perform model selection, resulting in further performance improvement. Extensive experimental evaluations and comparative studies clearly demonstrate that the proposed Bayesian Hashing approach outperforms other peer methods in both accuracy and speed. We achieve state-of-the-art results on well-known face recognition benchmarks using compact binary codes with significantly reduced computational overload and storage cost.


A Graph Kernel Based on the Jensen-Shannon Representation Alignment

AAAI Conferences

In this paper, we develop a novel graph kernel by aligning the Jensen-Shannon (JS) representations of vertices. We commence by describing how to compute the JS representation of a vertex by measuring the JS divergence (JSD) between the corresponding $-layer depth-based (DB) representations developed. By aligning JS representations of vertices, we identify the correspondence between the vertices of two graphs and this allows us to construct a matching-based graph kernel. Unlike existing R-convolution kernels that roughly record the isomorphism information between any pair of substructures under a type of graph decomposition, the new kernel can be seen as an aligned subgraph kernel that incorporates explicit local correspondences of substructures i.e., the local information graphs) into the process of kernelization through the JS representation alignment. The new kernel thus addresses the drawback of neglecting the relative locations between substructures that arises in the R-convolution kernels. Experiments demonstrate that our kernel can easily outperform state-of-the-art graph kernels in terms of the classification accuracies.


Pseudo-Supervised Training Improves Unsupervised Melody Segmentation

AAAI Conferences

An important aspect of music perception in humans is the ability to segment streams of musical events into structural units such as motifs and phrases.A promising approach to the computational modeling of music segmentation employs the statistical and information-theoretic properties of musical data, based on the hypothesis that these properties can (at least partly) account for music segmentation in humans. Prior work has shown that in particular the information content of music events, as estimated from a generative probabilistic model of those events, is a good indicator for segment boundaries.In this paper we demonstrate that, remarkably, a substantial increase in segmentation accuracy can be obtained by not using information content estimates directly, but rather in a bootstrapping fashion. More specifically, we use information content estimates computed from a generative model of the data as a target for a feed-forward neural network that is trained to estimate the information content directly from the data. We hypothesize that the improved segmentation accuracy of this bootstrapping approach may be evidence that the generative model provides noisy estimates of the information content, which are smoothed by the feed-forward neural network, yielding more accurate information content estimates.


Personalized Sentiment Classification Based on Latent Individuality of Microblog Users

AAAI Conferences

Sentiment expression in microblog posts often reflects user's specific individuality due to different language habit, personal character, opinion bias and so on. Existing sentiment classification algorithms largely ignore such latent personal distinctions among different microblog users. Meanwhile, sentiment data of microblogs are sparse for individual users, making it infeasible to learn effective personalized classifier. In this paper, we propose a novel, extensible personalized sentiment classification method based on a variant of latent factor model to capture personal sentiment variations by mapping users and posts into a low-dimensional factor space. We alleviate the sparsity of personal texts by decomposing the posts into words which are further represented by the weighted sentiment and topic units based on a set of syntactic units of words obtained from dependency parsing results. To strengthen the representation of users, we leverage users following relation to consolidate the individuality of a user fused from other users with similar interests. Results on real-world microblog datasets confirm that our method outperforms state-of-the-art baseline algorithms with large margins.