Accuracy
Solving the Partial Label Learning Problem: An Instance-Based Approach
Zhang, Min-Ling (Southeast University) | Yu, Fei (Southeast University)
In partial label learning, each training example is associated with a set of candidate labels, among which only one is valid. An intuitive strategy to learn from partial label examples is to treat all candidate labels equally and make predictions by averaging their modeling outputs. Nonetheless, this strategy may suffer from the problem that the modeling output from the valid label is overwhelmed by those from the false positive labels. In this paper, an instance-based approach named IPAL is proposed, which directly disambiguates the candidate label set. Briefly, IPAL tries to identify the valid label of each partial label example via an iterative label propagation procedure, and then classifies an unseen instance based on minimum error reconstruction from its nearest neighbors. Extensive experiments show that IPAL compares favorably against the existing instance-based as well as other state-of-the-art partial label learning approaches.
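The disambiguation idea in the abstract above can be sketched as follows. This is a simplified illustration, not the IPAL algorithm itself: the function name `disambiguate`, the mixing weight `alpha`, the iteration count, and the hand-written neighbor weights are all assumptions made for the example. Each instance starts with uniform confidence over its candidate labels, repeatedly absorbs the confidences of its neighbors, and is then restricted back to its own candidate set.

```python
def disambiguate(neighbors, candidates, n_labels, alpha=0.5, n_iter=50):
    """Simplified label propagation over candidate label sets (illustrative only).

    neighbors[i]  : list of (j, weight) pairs, weights summing to 1 (assumed given)
    candidates[i] : set of candidate label indices for instance i
    """
    n = len(candidates)
    # Initial confidence: uniform over each instance's candidate labels.
    prior = [[1.0 / len(c) if k in c else 0.0 for k in range(n_labels)]
             for c in candidates]
    conf = [row[:] for row in prior]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            row = []
            for k in range(n_labels):
                spread = sum(w * conf[j][k] for j, w in neighbors[i])
                row.append(alpha * spread + (1 - alpha) * prior[i][k])
            # Keep mass only on candidate labels, then renormalise.
            masked = [v if k in candidates[i] else 0.0 for k, v in enumerate(row)]
            total = sum(masked)
            updated.append([v / total for v in masked])
        conf = updated
    # Pick the most confident candidate label for every instance.
    return [max(c, key=lambda k: conf[i][k]) for i, c in enumerate(candidates)]


# Toy data: instances 0 and 3 are unambiguous, 1 and 2 have two candidates each.
toy_candidates = [{0}, {0, 1}, {0, 1}, {1}]
toy_neighbors = [[(1, 1.0)], [(0, 1.0)], [(3, 1.0)], [(2, 1.0)]]
print(disambiguate(toy_neighbors, toy_candidates, n_labels=2))
```

In the toy data, each ambiguous instance is pulled toward the label of its unambiguous neighbor, which is the intuition behind disambiguating rather than averaging over candidates.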
A Geometric Theory of Feature Selection and Distance-Based Measures
Shin, Kilho (University of Hyogo) | Angulo, Adrian Pino (University of Hyogo)
Feature selection measures are often explained by analogy to a rule that measures the “distance” of a set of features to the “closest” ideal set of features. An ideal feature set is one that determines classes uniquely and correctly. Before this paper, this explanation was only an analogy. In this paper, we show a way to map arbitrary feature sets of datasets into a common metric space, which is indexed by a real number p with 1 ≤ p ≤ ∞. Since this determines the distance between an arbitrary pair of feature sets, even if they belong to different datasets, the distance of a feature set to the closest ideal feature set can be used as a feature selection measure. Surprisingly, when p = 1, the measure is identical to the Bayesian risk, which is probably the most widely used feature selection measure in the literature. For 1 < p ≤ ∞, the measure is novel and has significantly different properties from the Bayesian risk. We also investigate the correlation between measurements by these measures and classification accuracy through experiments. As a result, we show that our novel measures with p > 1 exhibit stronger correlation than the Bayesian risk.
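For readers unfamiliar with the p = 1 case, the following minimal sketch computes the empirical Bayesian risk of a feature subset on a discrete dataset: the fraction of examples that would be misclassified if every distinct combination of feature values were assigned its majority class. The function name `bayes_risk` and the toy XOR data are assumptions for illustration; the paper's metric-space construction for general p is not reproduced here.

```python
from collections import Counter, defaultdict


def bayes_risk(rows, labels, feature_idx):
    """Empirical Bayesian risk of the feature subset `feature_idx`.

    rows        : sequence of tuples of discrete feature values
    labels      : class label of each row
    feature_idx : indices of the features in the subset being scored
    """
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in feature_idx)
        groups[key][y] += 1
    # Examples that the majority class of each group classifies correctly.
    correct = sum(max(counts.values()) for counts in groups.values())
    return 1.0 - correct / len(labels)


# Toy XOR data: neither feature alone determines the class, both together do.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "b", "b", "a"]
print(bayes_risk(rows, labels, [0]))     # single feature: risk 0.5
print(bayes_risk(rows, labels, [0, 1]))  # both features: risk 0.0 (an ideal feature set)
```

A risk of zero corresponds to an ideal feature set in the sense used above: the selected features determine the class uniquely and correctly.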
Nonparametric Independence Testing for Small Sample Sizes
Ramdas, Aaditya (Carnegie Mellon University) | Wehbe, Leila (Carnegie Mellon University)
[...] like correlation of X, Y only test for (univariate) linear independence, natural alternatives like mutual information of X, Y are hard to estimate due to a serious curse of dimensionality. A recent approach, avoiding both issues, estimates norms of an operator in Reproducing Kernel Hilbert Spaces (RKHSs). Our main contribution is strong empirical evidence that by employing shrunk operators when the sample size is small, one can attain an improvement in power at low false positive rates. We analyze the effects of Stein shrinkage on a popular test statistic called HSIC (Hilbert-Schmidt Independence Criterion). Our observations provide insights into two recently proposed shrinkage estimators, SCOSE and FCOSE - we prove that SCOSE [...]
It is also useful for scientific discovery like in neuroscience, to see if a stimulus X (say an image) is independent of the brain activity Y (say fMRI) in a relevant part of the brain. Since detecting nonlinear correlations is much easier than estimating a nonparametric regression function (of Y onto X), it can be done at smaller sample sizes, with further samples collected for estimation only if an effect is detected by the hypothesis test. For such situations, correlation only tests for univariate linear independence, while other statistics like mutual information that do characterize multivariate independence are hard to estimate from data, suffering from a serious curse of dimensionality. A recent popular approach for this problem (and a related two-sample testing problem) involve the use of quantities defined in reproducing kernel Hilbert spaces (RKHSs) - see [Gretton et al., 2006; Harchaoui et al., 2007; Gretton et al., 2005b; 2005a].
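As background for the HSIC statistic named above, here is a minimal sketch of the standard biased estimator, trace(KHLH) / n², combined with a permutation test. It illustrates the unshrunk baseline only; the shrinkage estimators SCOSE and FCOSE are not implemented. The Gaussian kernel, the fixed bandwidth of 1.0, the univariate inputs, the function names, and the number of permutations are all assumptions chosen for the example.

```python
import math
import random


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [[math.exp(-(a - b) ** 2 / (2 * bandwidth ** 2)) for b in values]
            for a in values]


def center(K):
    """Return HKH for a symmetric K, where H = I - (1/n) * ones."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
            for i in range(n)]


def hsic(K_centered, L, order=None):
    """Biased HSIC, trace(KHLH) / n^2, optionally with the Y sample reordered."""
    n = len(L)
    idx = list(order) if order is not None else list(range(n))
    return sum(K_centered[i][j] * L[idx[i]][idx[j]]
               for i in range(n) for j in range(n)) / n ** 2


def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """Observed HSIC and a permutation p-value for independence of xs and ys."""
    rng = random.Random(seed)
    Kc = center(gaussian_gram(xs))
    L = gaussian_gram(ys)
    observed = hsic(Kc, L)
    idx = list(range(len(ys)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # shuffling ys breaks any dependence on xs
        if hsic(Kc, L, idx) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data with a nonlinear (quadratic) dependence and a small sample size.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(30)]
ys = [x * x + 0.05 * rng.gauss(0, 1) for x in xs]
stat, p = permutation_pvalue(xs, ys)
print(f"HSIC = {stat:.4f}, permutation p-value = {p:.3f}")
```

The quadratic toy relationship is the kind of dependence that a linear correlation test tends to miss, which is the motivation for kernel statistics given in the text.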
Online Robust Low Rank Matrix Recovery
Guo, Xiaojie (Chinese Academy of Sciences)
Low rank matrix recovery has shown its importance as a theoretical foundation in many areas of information processing. Its solutions are usually obtained in batch mode, which requires loading all the data into memory during processing, and are thus hardly applicable to large-scale data. Moreover, a fraction of the data may be severely contaminated by outliers, which makes accurate recovery significantly more challenging. This paper proposes a novel online robust low rank matrix recovery method to address these difficulties. In particular, we first introduce an online algorithm to solve the problem of low rank matrix completion. Then we move on to low rank matrix recovery from observations with intensive outliers. The outlier support is robustly estimated from the perspective of a mixture model. Experiments on both synthetic and real data are conducted to demonstrate the efficacy of our method and show its superior performance over the state of the art.
Optimal Bayesian Hashing for Efficient Face Recognition
Dai, Qi (Fudan University) | Li, Jianguo (Intel Corporation) | Wang, Jun (Alibaba Group) | Chen, Yurong (Intel Corporation) | Jiang, Yu-Gang (Fudan University)
In practical applications, it is often observed that high-dimensional features can yield good performance, while being more costly in both computation and storage. In this paper, we propose a novel method called Bayesian Hashing to learn an optimal Hamming embedding of high-dimensional features, with a focus on the challenging application of face recognition. In particular, a boosted random FERNs classification model is designed to perform efficient face recognition, in which bit correlations are elaborately approximated with a random permutation technique. Without incurring additional storage cost, multiple random permutations are then employed to train a series of classifiers for achieving better discrimination power. In addition, we introduce a sequential forward floating search (SFFS) algorithm to perform model selection, resulting in further performance improvement. Extensive experimental evaluations and comparative studies clearly demonstrate that the proposed Bayesian Hashing approach outperforms other peer methods in both accuracy and speed. We achieve state-of-the-art results on well-known face recognition benchmarks using compact binary codes with significantly reduced computational overhead and storage cost.
Pseudo-Supervised Training Improves Unsupervised Melody Segmentation
Lattner, Stefan (Austrian Research Institute for Artificial Intelligence) | Chacón, Carlos Eduardo Cancino (Austrian Research Institute for Artificial Intelligence) | Grachten, Maarten (Austrian Research Institute for Artificial Intelligence)
An important aspect of music perception in humans is the ability to segment streams of musical events into structural units such as motifs and phrases. A promising approach to the computational modeling of music segmentation employs the statistical and information-theoretic properties of musical data, based on the hypothesis that these properties can (at least partly) account for music segmentation in humans. Prior work has shown that in particular the information content of music events, as estimated from a generative probabilistic model of those events, is a good indicator for segment boundaries. In this paper we demonstrate that, remarkably, a substantial increase in segmentation accuracy can be obtained by not using information content estimates directly, but rather in a bootstrapping fashion. More specifically, we use information content estimates computed from a generative model of the data as a target for a feed-forward neural network that is trained to estimate the information content directly from the data. We hypothesize that the improved segmentation accuracy of this bootstrapping approach may be evidence that the generative model provides noisy estimates of the information content, which are smoothed by the feed-forward neural network, yielding more accurate information content estimates.
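The notion of information content used above can be illustrated with a minimal sketch: the information content of an event is the negative log-probability of that event given its context, and peaks in it are taken as candidate segment boundaries. The smoothed bigram model, the strict local-maximum peak rule, and the function names below are assumptions for illustration; they stand in for the generative model of the prior work and do not reproduce the pseudo-supervised neural network proposed in the paper.

```python
import math
from collections import Counter


def information_content(seq, alpha=1.0):
    """Information content, -log2 p(event | previous event), of seq[1:].

    Probabilities come from a bigram model with add-alpha smoothing,
    estimated on the sequence itself (an assumption for this toy example).
    The returned list is offset by one: entry t describes seq[t + 1].
    """
    vocab_size = len(set(seq))
    pair_counts = Counter(zip(seq, seq[1:]))
    context_counts = Counter(seq[:-1])
    ic = []
    for prev, cur in zip(seq, seq[1:]):
        p = (pair_counts[(prev, cur)] + alpha) / (context_counts[prev] + alpha * vocab_size)
        ic.append(-math.log2(p))
    return ic


def peak_boundaries(ic):
    """Indices into the original sequence where information content peaks."""
    return [t + 1 for t in range(1, len(ic) - 1)
            if ic[t] > ic[t - 1] and ic[t] > ic[t + 1]]


# Toy "melody": a repeated three-note motif interrupted by an unexpected event.
melody = list("abcabcxabcabc")
print(peak_boundaries(information_content(melody)))
```

On this toy sequence the peaks fall at the starts of motif repetitions and at the unexpected event, which is the behavior that makes information content a plausible boundary indicator.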
Personalized Sentiment Classification Based on Latent Individuality of Microblog Users
Song, Kaisong (Northeastern University) | Feng, Shi (Northeastern University) | Gao, Wei (Qatar Computing Research Institute) | Wang, Daling (Northeastern University) | Yu, Ge (Northeastern University) | Wong, Kam-Fai (The Chinese University of Hong Kong)
Sentiment expression in microblog posts often reflects a user's specific individuality, owing to differences in language habits, personal character, opinion bias, and so on. Existing sentiment classification algorithms largely ignore such latent personal distinctions among different microblog users. Meanwhile, sentiment data of microblogs are sparse for individual users, making it infeasible to learn an effective personalized classifier. In this paper, we propose a novel, extensible personalized sentiment classification method based on a variant of the latent factor model to capture personal sentiment variations by mapping users and posts into a low-dimensional factor space. We alleviate the sparsity of personal texts by decomposing the posts into words, which are further represented by weighted sentiment and topic units based on a set of syntactic units of words obtained from dependency parsing results. To strengthen the representation of users, we leverage users' following relations to consolidate the individuality of a user fused from other users with similar interests. Results on real-world microblog datasets confirm that our method outperforms state-of-the-art baseline algorithms by large margins.
Saliency Detection with a Deeper Investigation of Light Field
Zhang, Jun (Hefei University of Technology) | Wang, Meng (Hefei University of Technology) | Gao, Jun (Hefei University of Technology) | Wang, Yi (Hefei University of Technology) | Zhang, Xudong (Hefei University of Technology) | Wu, Xindong (Hefei University of Technology)
Although the light field has recently been recognized as helpful in saliency detection, it has not yet been comprehensively explored. In this work, we propose a new saliency detection model with light field data. The idea behind the proposed model originates from the following observations. (1) People can distinguish regions at different depth levels by adjusting the focus of their eyes. Similarly, a light field image can generate a set of focal slices focusing at different depth levels, which suggests that a background can be weighted by selecting the corresponding slice. We show that background priors encoded by light field focusness have advantages in eliminating background distraction and enhancing the saliency by weighting the light field contrast. (2) Regions at closer depth ranges tend to be salient, while regions far in the distance mostly belong to the background. We show that foreground objects can be easily separated from similar or cluttered backgrounds by exploiting their light field depth. Extensive evaluations on the recently introduced Light Field Saliency Dataset (LFSD) [Li et al., 2014], including studies of different light field cues and comparisons with Li et al.'s method (the only reported light field saliency detection approach to our knowledge) and the 2D/3D state-of-the-art approaches extended with light field depth/focusness information, show that the investigated light field properties are complementary to each other and lead to improvements on 2D/3D models, and that our approach produces superior results in comparison with the state of the art.
Integrated Anchor and Social Link Predictions across Social Networks
Zhang, Jiawei (University of Illinois at Chicago) | Yu, Philip S. (University of Illinois at Chicago and Tsinghua University)
To enjoy more social network services, users nowadays are usually involved in multiple online social media sites at the same time. Across these social networks, users can be connected by both intra-network links (i.e., social links) and inter-network links (i.e., anchor links) simultaneously. In this paper, we want to predict the formation of social links among users in the target network as well as anchor links aligning the target network with other external social networks. The problem is formally defined as the “collective link identification” problem. To solve it, a unified link prediction framework, CLF (Collective Link Fusion), is proposed in this paper, which consists of two phases: (1) collective link prediction of anchor and social links, and (2) propagation of predicted links across the partially aligned “probabilistic networks” with collective random walk. Extensive experiments conducted on two real-world partially aligned networks demonstrate that CLF can perform very well in predicting social and anchor links concurrently.
Inducing Probabilistic Relational Rules from Probabilistic Examples
De Raedt, Luc (KU Leuven) | Dries, Anton (KU Leuven) | Thon, Ingo (KU Leuven) | Van den Broeck, Guy (KU Leuven) | Verbeke, Mathias (KU Leuven)
We study the problem of inducing logic programs in a probabilistic setting, in which both the example descriptions and their classification can be probabilistic. The setting is incorporated in the probabilistic rule learner ProbFOIL+, which combines principles of the rule learner FOIL with ProbLog, a probabilistic Prolog. We illustrate the approach by applying it to the knowledge base of NELL, the Never-Ending Language Learner.