Statistical Learning
Shift-Pessimistic Active Learning Using Robust Bias-Aware Prediction
Liu, Anqi (University of Illinois at Chicago) | Reyzin, Lev (University of Illinois at Chicago) | Ziebart, Brian D. (University of Illinois at Chicago)
Existing approaches to active learning are generally optimistic about their certainty with respect to data shift between labeled and unlabeled data. They assume that unknown datapoint labels follow the inductive biases of the active learner. As a result, the most useful datapoint labelsโones that refute current inductive biasesโare rarely solicited. We propose a shift-pessimistic approach to active learning that assumes the worst-case about the unknown conditional label distribution. This closely aligns model uncertainty with generalization error, enabling more useful label solicitation. We investigate the theoretical benefits of this approach and demonstrate its empirical advantages on probabilistic binary classification tasks.
Sense-Aaware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia
Wu, Zhaohui (The Pennsylvania State University) | Giles, C. Lee (The Pennsylvania State University)
Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on Wikipedia, which could account for homonymy and polysemy. The "sense-specific'' prototypes of a word are produced by clustering Wikipedia pages based on both local and global contexts of the word in Wikipedia. Experimental evaluations on semantic relatedness for both isolated words and words in sentential contexts and word sense induction demonstrate its effectiveness.
Low-Rank Similarity Metric Learning in High Dimensions
Liu, Wei (IBM T. J. Watson Research Center) | Mu, Cun (Columbia University) | Ji, Rongrong (Xiamen University) | Ma, Shiqian (The Chinese University of Hong Kong) | Smith, John R. (IBM T. J. Watson Research Center) | Chang, Shih-Fu (Columbia University)
Metric learning has become a widespreadly used tool in machine learning. To reduce expensive costs brought in by increasing dimensionality, low-rank metric learning arises as it can be more economical in storage and computation. However, existing low-rank metric learning algorithms usually adopt nonconvex objectives, and are hence sensitive to the choice of a heuristic low-rank basis. In this paper, we propose a novel low-rank metric learning algorithm to yield bilinear similarity functions. This algorithm scales linearly with input dimensionality in both space and time, therefore applicable to high-dimensional data domains. A convex objective free of heuristics is formulated by leveraging trace norm regularization to promote low-rankness. Crucially, we prove that all globally optimal metric solutions must retain a certain low-rank structure, which enables our algorithm to decompose the high-dimensional learning task into two steps: an SVD-based projection and a metric learning problem with reduced dimensionality. The latter step can be tackled efficiently through employing a linearized Alternating Direction Method of Multipliers. The efficacy of the proposed algorithm is demonstrated through experiments performed on four benchmark datasets with tens of thousands of dimensions.
Large Margin Metric Learning for Multi-Label Prediction
Liu, Weiwei (University of Technology, Sydney) | Tsang, Ivor W (University of Technology, Sydney)
Canonical correlation analysis (CCA) and maximum margin output coding (MMOC) methods have shown promising results for multi-label prediction, where each instance is associated with multiple labels. However, these methods require an expensive decoding procedure to recover the multiple labels of each testing instance. The testing complexity becomes unacceptable when there are many labels. To avoid decoding completely, we present a novel large margin metric learning paradigm for multi-label prediction. In particular, the proposed method learns a distance metric to discover label dependency such that instances with very different multiple labels will be moved far away. To handle many labels, we present an accelerated proximal gradient procedure to speed up the learning process. Comprehensive experiments demonstrate that our proposed method is significantly faster than CCA and MMOC in terms of both training and testing complexities. Moreover, our method achieves superior prediction performance compared with state-of-the-art methods.
Support Consistency of Direct Sparse-Change Learning in Markov Networks
Liu, Song (Tokyo Institute of Technology, Japan) | Suzuki, Taiji (Tokyo Institute of Technology, Japan) | Sugiyama, Masashi (University of Tokyo, Japan)
We study the problem of learning sparse structure changes between two Markov networks P and Q. Rather than fitting two Markov networks separately to two sets of data and figuring out their differences, a recent work proposed to learn changes directly via estimating the ratio between two Markov network models. ย Such a direct approach was demonstrated to perform excellently in experiments, although its theoretical properties remained unexplored. ย In this paper, we give sufficient conditions for successful change detection with respect to the sample size np, nq, the dimension of data m, and the number of changed edges d.
Fast Gradient Descent for Drifting Least Squares Regression, with Application to Bandits
Korda, Nathaniel (University of Oxford) | L.A., Prashanth (INRIA Lille) | Munos, Remi (INRIA Lille)
Online learning algorithms require to often recompute least squares regression estimates of parameters. We study improving the computational complexity of such algorithms by using stochastic gradient descent (SGD) type schemes in place of classic regression solvers. We show that SGD schemes efficiently track the true solutions of the regression problems, even in the presence of a drift. This finding coupled with an $O(d)$ improvement in complexity, where $d$ is the dimension of the data, make them attractive for implementation in the \textit{big data} settings. In the case when strong convexity in the regression problem is guaranteed, we provide bounds on the error both in expectation and high probability (the latter is often needed to provide theoretical guarantees for higher level algorithms), despite the drifting least squares solution. As an example of this case we prove that the regret performance of an SGD version of the PEGE linear bandit algorithm is worse than that of PEGE itself only by a factor of $O(\log^4 n)$. When strong convexity of the regression problem cannot be guaranteed, we investigate using an adaptive regularisation. We make an empirical study of an adaptively regularised, SGD version of LinUCB in a news article recommendation application, which uses the large scale news recommendation dataset from Yahoo! front page. These experiments show a large gain in computational complexity and a consistently low tracking error.
FACES: Diversity-Aware Entity Summarization Using Incremental Hierarchical Conceptual Clustering
Gunaratna, Kalpa (Kno.e.sis, Wright State University) | Thirunarayan, Krishnaparasad (Kno.e.sis, Wright State University) | Sheth, Amit (Kno.e.sis, Wright State University)
Semantic Web documents that encode facts about entities on the Web have been growing rapidly in size and evolving over time. Creating summaries on lengthy Semantic Web documents for quick identification of the corresponding entity has been of great contemporary interest. In this paper, we explore automatic summarization techniques that characterize and enable identification of an entity and create summaries that are human friendly. Specifically, we highlight the importance of diversified (faceted) summaries by combining three dimensions: diversity, uniqueness, and popularity. Our novel diversity-aware entity summarization approach mimics human conceptual clustering techniques to group facts and picks representative facts from each group to form concise (i.e., short) and comprehensive (i.e., improved coverage through diversity) summaries. We evaluate our approach against the state-of-the-art techniques and show that our work improves both the quality and the efficiency of entity summarization.
Metric Learning Driven Multi-Task Structured Output Optimization for Robust Keypoint Tracking
Zhao, Liming (Zhejiang University) | Li, Xi (Zhejiang University) | Xiao, Jun (Zhejiang University) | Wu, Fei (Zhejiang University) | Zhuang, Yueting (Zhejiang University)
As an important and challenging problem in computer vision and graphics, keypoint-based object tracking is typically formulated in a spatio-temporal statistical learning framework. However, most existing keypoint trackers are incapable of effectively modeling and balancing the following three aspects in a simultaneous manner: temporal model coherence across frames, spatial model consistency within frames, and discriminative feature construction. To address this issue, we propose a robust keypoint tracker based on spatio-temporal multi-task structured output optimization driven by discriminative metric learning. Consequently, temporal model coherence is characterized by multi-task structured keypoint model learning over several adjacent frames, while spatial model consistency is modeled by solving a geometric verification based structured learning problem. Discriminative feature construction is enabled by metric learning to ensure the intra-class compactness and inter-class separability. Finally, the above three modules are simultaneously optimized in a joint learning scheme. Experimental results have demonstrated the effectiveness of our tracker.
Surpassing Human-Level Face Verification Performance on LFW with GaussianFace
Lu, Chaochao (The Chinese University of Hong Kong) | Tang, Xiaoou (The Chinese University of Hong Kong)
Face verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problemis exacerbated when we rely unrealistically on a singletraining data source, which is often insufficient to coverthe intrinsically complex face variations. This paperproposes a principled multi-task learning approachbased on Discriminative Gaussian Process Latent VariableModel (DGPLVM), named GaussianFace, for faceverification. In contrast to relying unrealistically on asingle training data source, our model exploits additional data from multiple source-domains to improve the generalization performance of face verification inan unknown target-domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can well capture complex face variations inherent in multiple sources. To enhance discriminative power, we introduced a more efficient equivalent form of Kernel Fisher Discriminant Analysis to DGPLVM.To speed up the process of inference and prediction, we exploited the low rank approximation method. Extensive experiments demonstrated the effectiveness of the proposed model in learning from diverse data sources and generalizing to unseen domains. Specifically, the accuracy of our algorithm achieved an impressive accuracyrate of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark. For the first time, the human-level performance in face verification (97.53%) on LFW is surpassed.
Exploring Semantic Inter-Class Relationships (SIR) for Zero-Shot Action Recognition
Gan, Chuang (Tsinghua University) | Lin, Ming (Carnegie Mellon University) | Yang, Yi (University of Technology Sydney) | Zhuang, Yueting (Zhejiang University) | G.Hauptmann, Alexander (Carnegie Mellon University)
Automatically recognizing a large number of action categories from videos is of significant importance for video understanding. Most existing works focused on the design of more discriminative feature representation, and have achieved promising results when the positive samples are enough. However, very limited efforts were spent on recognizing a novel action without any positive exemplars, which is often the case in the real settings due to the large amount of action classes and the users' queries dramatic variations. To address this issue, we propose to perform action recognition when no positive exemplars of that class are provided, which is often known as the zero-shot learning. Different from other zero-shot learning approaches, which exploit attributes as the intermediate layer for the knowledge transfer, our main contribution is SIR, which directly leverages the semantic inter-class relationships between the known and unknown actions followed by label transfer learning. The inter-class semantic relationships are automatically measured by continuous word vectors, which learned by the skip-gram model using the large-scale text corpus. Extensive experiments on the UCF101 dataset validate the superiority of our method over fully-supervised approaches using few positive exemplars.