Statistical Learning
Labeling Complicated Objects: Multi-View Multi-Instance Multi-Label Learning
Nguyen, Cam-Tu (Nanjing University) | Wang, Xiaoliang (Nanjing University) | Liu, Jing (Institute of Automation Chinese Academy of Sciences) | Zhou, Zhi-Hua (Nanjing University)
Multi-Instance Multi-Label (MIML) is a learning framework where an example is associated with multiple labels and represented by a set of feature vectors (multiple instances). In the formalization of MIML learning, instances come from a single source (single view). To leverage multiple information sources (multi-view), we develop a multi-view MIML framework based on a hierarchical Bayesian network, and derive an effective learning algorithm based on variational inference. The model can naturally deal with examples in which some views are absent (partial examples). On multi-view datasets, our method is shown to outperform other multi-view and single-view approaches, particularly in the presence of partial examples. On single-view benchmarks, extensive evaluation shows that our method is highly competitive with or better than other MIML approaches at labeling examples and instances. Moreover, our method can effectively handle datasets with a large number of labels.
Lifetime Lexical Variation in Social Media
Liao, Lizi (Beijing Institute of Technology) | Jiang, Jing (Singapore Management University) | Ding, Ying (Singapore Management University) | Huang, Heyan (Beijing Institute of Technology) | Lim, Ee-Peng (Singapore Management University)
As the rapid growth of online social media attracts a large number of Internet users, the large volume of content generated by these users also provides us with an opportunity to study the lexical variation of people of different ages. In this paper, we present a latent variable model that jointly models the lexical content of tweets and Twitter users' ages. Our model inherently assumes that a topic has not only a word distribution but also an age distribution. We propose a Gibbs-EM algorithm to perform inference on our model. Empirical evaluation shows that our model can learn meaningful age-specific topics such as "school" for teenagers and "health" for older people. Our model can also be used for age prediction and performs better than a number of baseline methods.
On Dataless Hierarchical Text Classification
Song, Yangqiu (University of Illinois at Urbana-Champaign) | Roth, Dan (University of Illinois at Urbana-Champaign)
In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought-after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that understanding the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space that allows one to compute meaningful semantic similarity between a document and a potential label. We show that this scheme can be used to support accurate multiclass classification without any supervision. We study several semantic representations and show how to improve the classification using bootstrapping. Our results show that bootstrapped dataless classification is competitive with supervised classification trained on thousands of labeled examples.
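The core idea in the abstract above (embed labels and documents in one space, then pick the most similar label) can be sketched as follows. This is a minimal illustration, not the authors' method: the bag-of-words `embed` function is a toy stand-in for the semantic representations the paper studies, and the function names and label descriptions are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a semantic representation: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def dataless_classify(document, label_descriptions):
    """Return the label whose description is most similar to the document.

    No labeled training documents are used; only the label text itself.
    """
    doc_vec = embed(document)
    scores = {label: cosine(doc_vec, embed(desc))
              for label, desc in label_descriptions.items()}
    return max(scores, key=scores.get)

# Hypothetical label descriptions, assumed for illustration only.
labels = {
    "sports": "sports game team player match score",
    "finance": "finance market stock bank money price",
}
print(dataless_classify("the team won the match with a late score", labels))
```

A richer embedding would replace `embed` without changing the rest of the pipeline, which is what makes the approach independent of labeled data.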
How Do Your Friends on Social Media Disclose Your Emotions?
Yang, Yang (Tsinghua University) | Jia, Jia (Tsinghua University) | Zhang, Shumei (Tsinghua University) | Wu, Boya (Tsinghua University) | Chen, Qicong (Tsinghua University) | Li, Juanzi (Tsinghua University) | Xing, Chunxiao (Tsinghua University) | Tang, Jie (Tsinghua University)
Extracting emotions from images has attracted much interest, in particular with the rapid development of social networks. The emotional impact is very important for understanding the intrinsic meanings of images. Although many studies have been done, most existing methods focus on image content but ignore the emotion of the user who published the image. One interesting question is: How does social effect correlate with the emotion expressed in an image? Specifically, can we leverage friends' interactions (e.g., discussions) related to an image to help extract the emotions? In this paper, we formalize the problem and propose a novel emotion learning method by jointly modeling images posted by social users and comments added by their friends. One advantage of the model is that it can distinguish those comments that are closely related to the emotion expression for an image from the other irrelevant ones. Experiments on an open Flickr dataset show that the proposed model can significantly improve (+37.4% by F1) the accuracy for inferring user emotions. More interestingly, we found that half of the improvements are due to interactions between 1.0% of the closest friends.
Growing Regression Forests by Classification: Applications to Object Pose Estimation
In this work, we propose a novel node splitting method for regression trees and incorporate it into the regression forest framework. Unlike traditional binary splitting, where the splitting rule is selected from a predefined set of binary splitting rules via trial-and-error, the proposed node splitting method first finds clusters of the training data which at least locally minimize the empirical loss without considering the input space. Then splitting rules which preserve the found clusters as much as possible are determined by casting the problem into a classification problem. Consequently, our new node splitting method enjoys more freedom in choosing the splitting rules, resulting in more efficient tree structures. In addition to the Euclidean target space, we present a variant which can naturally deal with a circular target space by the proper use of circular statistics. We apply the regression forest employing our node splitting to head pose estimation (Euclidean target space) and car direction estimation (circular target space) and demonstrate that the proposed method significantly outperforms state-of-the-art methods (38.5% and 22.5% error reduction respectively).
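The abstract above mentions circular statistics for a circular target space such as car direction. As a minimal sketch of why they are needed (the function names are assumptions, and this is not the paper's splitting procedure), the mean of angles can be computed from summed sines and cosines rather than from the raw values:

```python
import math

def circular_mean(angles_deg):
    """Mean direction of angles in degrees, wrapped into the range 0 to 360."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

def circular_distance(a_deg, b_deg):
    """Shortest angular distance between two directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

directions = [350.0, 10.0]                  # two directions close to 0 degrees
naive = sum(directions) / len(directions)   # arithmetic mean is 180.0, pointing the wrong way
mean_dir = circular_mean(directions)        # lies at (or numerically next to) 0 degrees
print(naive, circular_distance(mean_dir, 0.0))
```

The arithmetic mean ignores the wrap-around at 360 degrees, so a loss built on it would treat 350 and 10 degrees as far apart; the circular versions do not.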
HC-Search for Multi-Label Prediction: An Empirical Study
Doppa, Janardhan Rao (Oregon State University) | Yu, Jun (Oregon State University) | Ma, Chao (Oregon State University) | Fern, Alan (Oregon State University) | Tadepalli, Prasad (Oregon State University)
Multi-label learning concerns learning multiple, overlapping, and correlated classes. In this paper, we adapt a recent structured prediction framework called HC-Search for multi-label prediction problems. One of the main advantages of this framework is that its training is sensitive to the loss function, unlike the other multi-label approaches that either assume a specific loss function or require a manual adaptation to each loss function. We empirically evaluate our instantiation of the HC-Search framework along with many existing multi-label learning algorithms on a variety of benchmarks by employing diverse task loss functions. Our results demonstrate that the performance of existing algorithms tends to be very similar in most cases, and that the HC-Search approach is comparable to and often better than all the other algorithms across different loss functions.
Fast Algorithm for Non-Stationary Gaussian Process Prediction
Zhang, Yulai (Tsinghua University) | Luo, Guiming (Tsinghua University)
An algorithm's time complexity is an essential issue for time series prediction in many practical applications. A novel fast exact inference method for Gaussian process models is proposed in this paper to accelerate non-stationary time series prediction. Experiments were conducted on real-world power load data.
Data Clustering by Laplacian Regularized L1-Graph
Yang, Yingzhen (University of Illinois at Urbana-Champaign) | Wang, Zhangyang (University of Illinois at Urbana-Champaign) | Yang, Jianchao (Adobe Research) | Wang, Jiangping (University of Illinois at Urbana-Champaign) | Chang, Shiyu (University of Illinois at Urbana-Champaign) | Huang, Thomas S (University of Illinois at Urbana-Champaign)
L1-Graph has been proven to be effective in data clustering, which partitions the data space by using the sparse representation of the data as the similarity measure. However, the sparse representation is performed for each datum separately without taking into account the geometric structure of the data. Motivated by L1-Graph and manifold learning, we propose Laplacian Regularized L1-Graph (LRℓ1-Graph) for data clustering. The sparse representations of LRℓ1-Graph are regularized by the geometric information of the data so that they vary smoothly along the geodesics of the data manifold by the graph Laplacian according to the manifold assumption. Moreover, we propose an iterative regularization scheme, where the sparse representation obtained from the previous iteration is used to build the graph Laplacian for the current iteration of regularization. The experimental results on real data sets demonstrate the superiority of our algorithm compared to L1-Graph and other competing clustering methods.
A Data Complexity Approach to Kernel Selection for Support Vector Machines
Valerio, Roberto (University of Houston) | Vilalta, Ricardo (University of Houston)
We describe a data complexity approach to kernel selection based on the behavior of polynomial and Gaussian kernels. Our results show how the use of a Gaussian kernel produces a Gram matrix with useful local information that has no equivalent counterpart in polynomial kernels. By exploiting neighborhood information embedded by data complexity measures, we are able to carry out a form of meta-generalization. Our goal is to predict which data sets are more favorable to particular kernels (Gaussian or polynomial). The end result is a framework to improve the model selection process in Support Vector Machines.
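To illustrate the contrast drawn in the abstract above, the following sketch builds Gram matrices for a Gaussian and a polynomial kernel on three toy points. The kernel parameters (`gamma`, `degree`, `coef0`) and the points are assumptions chosen for illustration; the paper's data complexity measures are not reproduced here.

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2): depends only on the distance between x and y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(<x, y> + coef0)^degree: depends on the inner product, not the distance."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree

def gram_matrix(points, kernel):
    """Matrix of pairwise kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

# The second point is a near neighbor of the first; the third is far away.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0)]
G_rbf = gram_matrix(points, gaussian_kernel)
G_poly = gram_matrix(points, polynomial_kernel)

# Row 0 of the Gaussian Gram matrix separates the near and far points,
# while row 0 of the polynomial Gram matrix gives both the same value.
print(G_rbf[0])
print(G_poly[0])
```

In this toy case the Gaussian entries fall off with distance, so each row encodes a neighborhood, whereas the polynomial entries from the origin cannot tell the near point from the far one.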
Locality Preserving Hashing
Zhao, Kang (Shanghai Jiao Tong University) | Lu, Hongtao (Shanghai Jiao Tong University) | Mei, Jincheng (Shanghai Jiao Tong University)
Hashing has recently attracted considerable attention for large scale similarity search. However, learning compact codes with good performance is still a challenge. In many cases, real-world data lies on a low-dimensional manifold embedded in a high-dimensional ambient space. To capture meaningful neighbors, a compact hashing representation should be able to uncover the intrinsic geometric structure of the manifold, e.g., the neighborhood relationships between subregions. Most existing hashing methods only consider this issue when mapping data points into certain projected dimensions. When generating the binary codes, they either directly quantize the projected values with a threshold or use an orthogonal matrix to refine the initial projection matrix. Both strategies treat projection and quantization separately and therefore do not preserve the locality structure well throughout the learning process. In this paper, we propose a novel hashing algorithm called Locality Preserving Hashing to effectively solve the above problems. Specifically, we learn a set of locality preserving projections with a joint optimization framework, which minimizes the average projection distance and quantization loss simultaneously. Experimental comparisons with other state-of-the-art methods on two large scale datasets demonstrate the effectiveness and efficiency of our method.
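The two-stage baseline that the abstract above criticizes (project first, then threshold) can be sketched as below, together with the quantization loss that a joint method would minimize. This is not Locality Preserving Hashing itself: the random Gaussian projections, the zero threshold, and all function names are assumptions for illustration.

```python
import random

def make_projections(dim, n_bits, seed=0):
    """Random Gaussian projection directions (a stand-in for learned projections)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def project(x, projections):
    """Real-valued projection of x onto each direction."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in projections]

def quantize(values, threshold=0.0):
    """Binarize projected values to +1 / -1 with a fixed threshold."""
    return [1 if v > threshold else -1 for v in values]

def quantization_loss(values, code):
    """Squared error between the binary code and the real-valued projections."""
    return sum((b - v) ** 2 for b, v in zip(code, values))

def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

W = make_projections(dim=3, n_bits=8)
x = (1.0, 0.2, -0.5)
projected = project(x, W)
code = quantize(projected)
print(code, quantization_loss(projected, code))
```

Here the projections are fixed before quantization happens, so nothing in the first stage accounts for the error measured by `quantization_loss`; a joint formulation would choose the projections with that term in the objective.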