Statistical Learning
Embedded Unsupervised Feature Selection
Wang, Suhang (Arizona State University) | Tang, Jiliang (Arizona State University) | Liu, Huan (Arizona State University)
Sparse learning has been proven to be a powerful technique in supervised feature selection, as it allows feature selection to be embedded into the classification (or regression) problem. In recent years, increasing attention has been paid to applying sparse learning to unsupervised feature selection. Due to the lack of label information, the vast majority of these algorithms first generate cluster labels via clustering algorithms and then formulate unsupervised feature selection as sparse-learning-based supervised feature selection with these generated cluster labels. In this paper, we propose a novel unsupervised feature selection algorithm, EUFS, which directly embeds feature selection into a clustering algorithm via sparse learning, without this transformation. The Alternating Direction Method of Multipliers (ADMM) is used to solve the optimization problem of EUFS. Experimental results on various benchmark datasets demonstrate the effectiveness of the proposed framework EUFS.
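Row-sparse penalties such as the ℓ2,1 norm are a common way to realize this kind of sparse-learning feature selection, and ADMM solvers for such penalties typically include a step that applies the penalty's proximal operator. The sketch below shows only that generic building block (row-wise group soft-thresholding) plus ranking features by row norm. It is not the EUFS objective or its ADMM updates; `prox_l21` and `rank_features` are hypothetical names.

```python
import math


def prox_l21(V, tau):
    """Proximal operator of tau * ||V||_{2,1} (generic sketch, not EUFS itself).

    Each row v (one feature) is scaled by max(0, 1 - tau / ||v||_2), so rows
    whose norm is at most tau become exactly zero, i.e. the feature is dropped.
    """
    out = []
    for row in V:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out


def rank_features(U):
    """Rank features (rows of U) by their l2 norm, largest first."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in U]
    return sorted(range(len(U)), key=lambda i: norms[i], reverse=True)


if __name__ == "__main__":
    V = [[3.0, 4.0], [0.3, 0.4], [0.0, 1.0]]
    U = prox_l21(V, tau=1.0)
    print(U, rank_features(U))
```

The threshold `tau` plays the role of the sparsity weight: larger values zero out more rows, so fewer features survive.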
A Multivariate Timeseries Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data
Ghassemi, Marzyeh (Massachusetts Institute of Technology) | Pimentel, Marco A.F. (University of Oxford) | Naumann, Tristan (Massachusetts Institute of Technology) | Brennan, Thomas (Massachusetts Institute of Technology) | Clifton, David A. (University of Oxford) | Szolovits, Peter (Massachusetts Institute of Technology) | Feng, Mengling (Massachusetts Institute of Technology)
The ability to determine patient acuity (or severity of illness) has immediate practical use for clinicians. We evaluate the use of multivariate time series modeling with multi-task Gaussian process (GP) models using noisy, incomplete, sparse, heterogeneous and unevenly sampled clinical data, including both physiological signals and clinical notes. The learned multi-task GP (MTGP) hyperparameters are then used to assess and forecast patient acuity. Experiments were conducted with two real clinical data sets acquired from ICU patients: first, estimating cerebrovascular pressure reactivity, an important indicator of secondary damage in traumatic brain injury patients, by learning the interactions between intracranial pressure and mean arterial blood pressure signals; and second, predicting mortality from clinical progress notes. In both cases, MTGPs provided improved results: an MTGP model provided better results than single-task GP models for signal interpolation and forecasting (0.91 vs 0.69 RMSE), and using MTGP hyperparameters as additional classification features improved results (0.812 vs 0.788 AUC).
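One standard way to build a multi-task GP is the intrinsic coregionalization model, where the covariance between observations of tasks a and b at times t and t' is B[a][b] · k(t, t'). The sketch below assumes that model with an RBF time kernel and computes the zero-mean posterior mean; the task-correlation matrix `B`, `length_scale`, and `noise` are hand-set here, whereas an MTGP would normally learn them by maximizing the marginal likelihood. It is an illustration, not necessarily the paper's exact MTGP formulation, and all function names are hypothetical.

```python
import math


def icm_kernel(task_a, t_a, task_b, t_b, B, length_scale):
    """Intrinsic coregionalization covariance: B[task_a][task_b] * RBF(t_a, t_b)."""
    return B[task_a][task_b] * math.exp(-((t_a - t_b) ** 2) / (2 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mtgp_predict(train, y, test, B, length_scale=1.0, noise=1e-2):
    """Posterior mean of a zero-mean multi-task GP.

    train/test are lists of (task_index, time) pairs; y holds the observed values.
    """
    K = [[icm_kernel(ta, xa, tb, xb, B, length_scale) + (noise if i == j else 0.0)
          for j, (tb, xb) in enumerate(train)]
         for i, (ta, xa) in enumerate(train)]
    alpha = solve(K, y)
    return [sum(icm_kernel(ts, xs, tt, xt, B, length_scale) * a
                for (tt, xt), a in zip(train, alpha))
            for ts, xs in test]


if __name__ == "__main__":
    B = [[1.0, 0.8], [0.8, 1.0]]  # hand-set task correlation (normally learned)
    train = [(0, 0.0), (0, 1.0), (0, 2.0), (0, 3.0), (1, 0.0), (1, 1.0)]
    y = [0.0, 0.8, 0.9, 0.1, 0.1, 0.7]
    # Forecast task 1 at times where only task 0 was observed.
    print(mtgp_predict(train, y, [(1, 2.0), (1, 3.0)], B))
```

The off-diagonal entries of `B` are what let a sparsely sampled signal borrow strength from a correlated one; with `B` set to the identity, the tasks decouple into independent single-task GPs.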
Personalized Tag Recommendation through Nonlinear Tensor Factorization Using Gaussian Kernel
Fang, Xiaomin (Sun Yat-sen University) | Pan, Rong (Sun Yat-sen University) | Cao, Guoxiang (Huawei Technologies Co. Ltd) | He, Xiuqiang (Huawei Technologies Co. Ltd) | Dai, Wenyuan (Huawei Technologies Co. Ltd)
Personalized tag recommendation systems recommend a list of tags to a user when the user is about to annotate an item. Such systems exploit individual preferences and the characteristics of items. Tensor factorization techniques have been applied to many applications, such as tag recommendation. Models based on Tucker Decomposition can achieve good performance but require a lot of computational power. On the other hand, models based on Canonical Decomposition can run in linear time and are more feasible for online recommendation. In this paper, we propose a novel method for personalized tag recommendation, which can be considered a nonlinear extension of Canonical Decomposition. Different from linear tensor factorization, we exploit the Gaussian radial basis function to increase the model's capacity. The experimental results show that our proposed method outperforms the state-of-the-art methods for tag recommendation on real datasets and performs well even with a small number of features, which verifies that our models can make better use of features.
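To make the linear-versus-nonlinear contrast concrete, the sketch below compares the linear Canonical (CP) score, a sum over latent factors of user × item × tag products, with a hypothetical nonlinear score that replaces inner products with Gaussian RBF similarities between the tag and the user and item vectors. The RBF score is only an illustration of the idea; it is an assumption and not necessarily the authors' model. The latent vectors and all function names are made up for illustration.

```python
import math


def cp_score(u, i, t):
    """Linear Canonical (CP) decomposition score: sum_f u_f * i_f * t_f."""
    return sum(a * b * c for a, b, c in zip(u, i, t))


def rbf(x, y, gamma):
    """Gaussian radial basis function exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def rbf_score(u, i, t, gamma=1.0):
    """Hypothetical nonlinear score: RBF similarity of the tag to the user plus to the item."""
    return rbf(u, t, gamma) + rbf(i, t, gamma)


def recommend(u, i, tags, k, score=rbf_score):
    """Return the names of the k highest-scoring tags for user vector u and item vector i."""
    ranked = sorted(tags, key=lambda name: score(u, i, tags[name]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    user, item = [1.0, 0.0], [0.8, 0.2]
    tags = {"jazz": [1.0, 0.0], "rock": [0.0, 1.0], "blues": [0.6, 0.4]}
    print(recommend(user, item, tags, 2))
    print(recommend(user, item, tags, 2, score=cp_score))
```

Both scores cost O(F) per (user, item, tag) triple for F latent features, which keeps the linear-time property that makes CP-style models attractive for online recommendation.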
Person Identification Using Anthropometric and Gait Data from Kinect Sensor
Andersson, Virginia Ortiz (Federal University of Pelotas) | Araujo, Ricardo Matsumura (Federal University of Pelotas)
Uniquely identifying individuals using anthropometric and gait data allows for passive biometric systems, where cooperation from the subjects being identified is not required. In this paper, we report on experiments using a novel data set composed of 140 individuals walking in front of a Microsoft Kinect sensor. We provide a methodology to extract anthropometric and gait features from this data and show results of applying different machine learning algorithms on subject identification tasks. Focusing on KNN classifiers, we discuss how accuracy varies in different settings, including the number of individuals in a gallery, the types of attributes used and the number of neighbors considered. Finally, we compare the obtained results with other results in the literature, showing that our approach has comparable accuracy for large galleries.
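The KNN identification setting can be sketched as follows: each gallery entry pairs a subject ID with a feature vector, and a probe is assigned the majority subject among its k nearest gallery entries under Euclidean distance. The three features used in the example (height, arm length, step length, in metres) are hypothetical stand-ins for the paper's anthropometric and gait attributes. In practice, features on different scales would usually be normalized first.

```python
import math
from collections import Counter


def knn_identify(gallery, probe, k=3):
    """Identify a probe by majority vote among its k nearest gallery samples.

    gallery: list of (subject_id, feature_vector); probe: feature_vector.
    Vote ties are broken in favour of the nearest tied subject.
    """
    by_dist = sorted(gallery, key=lambda item: math.dist(item[1], probe))
    neighbours = [sid for sid, _ in by_dist[:k]]
    votes = Counter(neighbours)
    best = max(votes.values())
    for sid in neighbours:  # nearest-first order breaks ties
        if votes[sid] == best:
            return sid


if __name__ == "__main__":
    # Hypothetical features: [height, arm length, step length] in metres.
    gallery = [("s1", [1.70, 0.60, 0.70]), ("s1", [1.71, 0.61, 0.72]),
               ("s2", [1.85, 0.68, 0.80]), ("s2", [1.84, 0.67, 0.82])]
    print(knn_identify(gallery, [1.705, 0.605, 0.71], k=3))
```

Increasing `k` smooths over noisy recordings, but with few samples per subject a large `k` lets other subjects outvote the true one, which is one reason accuracy depends on the number of neighbors considered.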
Representation Learning for Aspect Category Detection in Online Reviews
Zhou, Xinjie (Peking University) | Wan, Xiaojun (Peking University) | Xiao, Jianguo (Peking University)
User-generated reviews are valuable resources for decision making. Identifying the aspect categories discussed in a given review sentence (e.g., "food" and "service" in restaurant reviews) is an important task of sentiment analysis and opinion mining. Given a predefined aspect category set, most previous studies leverage hand-crafted features and a classification algorithm to accomplish the task. The crucial step to achieve better performance is feature engineering, which consumes considerable human effort and may be unstable when the product domain changes. In this paper, we propose a representation learning approach to automatically learn useful features for aspect category detection. Specifically, a semi-supervised word embedding algorithm is first proposed to obtain continuous word representations on a large set of reviews with noisy labels. Afterwards, we propose to generate deeper and hybrid features through neural networks stacked on the word vectors. A logistic regression classifier is finally trained with the hybrid features to predict the aspect category. The experiments are carried out on a benchmark dataset released by SemEval-2014. Our approach achieves state-of-the-art performance and outperforms the best participating team as well as a few strong baselines.
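The final classification step can be sketched as one binary logistic regression per aspect category (one-vs-rest) on top of sentence features derived from word vectors. As a simplifying assumption, the sketch averages word embeddings instead of using the paper's neural-network hybrid features, and the two-dimensional embeddings are toy values, not learned ones. Function names are hypothetical.

```python
import math


def sentence_vector(tokens, embeddings):
    """Average the embeddings of known tokens (zero vector if none are known)."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Binary logistic regression fitted by batch gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict(w, b, x):
    """Probability that the sentence with features x mentions the category."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    emb = {"pizza": [1.0, 0.0], "tasty": [0.9, 0.1], "waiter": [0.0, 1.0], "rude": [0.1, 0.9]}
    docs = [["tasty", "pizza"], ["pizza"], ["rude", "waiter"], ["waiter"]]
    is_food = [1, 1, 0, 0]  # labels for a hypothetical "food" classifier
    w, b = train_logreg([sentence_vector(d, emb) for d in docs], is_food)
    print(predict(w, b, sentence_vector(["tasty"], emb)))
```

Training one such classifier per category lets a sentence receive several aspect categories at once, for example both "food" and "service".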
Retweet Behavior Prediction Using Hierarchical Dirichlet Process
Zhang, Qi (Fudan University) | Gong, Yeyun (Fudan University) | Guo, Ya (Fudan University) | Huang, Xuanjing (Fudan University)
The task of predicting retweet behavior is an important and essential step for various social network applications, such as business intelligence and popular event prediction. Due to increasing demand, the task has attracted extensive attention in recent years. In this work, we propose a novel method that uses non-parametric statistical models to combine structural, textual, and temporal information to predict retweet behavior. To evaluate the proposed method, we collect a large number of microblogs and their corresponding social networks from a real microblog service. Experimental results on the constructed dataset demonstrate that the proposed method can achieve better performance than state-of-the-art methods. The relative improvement of the proposed method over the method using only textual information is more than 38.5% in terms of F1-score.
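The "38.5% relative improvement in F1-score" is a ratio of F1 values, not a difference in percentage points. A minimal sketch of both quantities follows; the counts and scores in the example are illustrative placeholders, not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`; 0.385 means 38.5% better."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Illustrative numbers only: a baseline F1 of 0.40 would need F1 >= 0.554
    # for a relative improvement of at least 38.5%.
    print(f1_score(tp=8, fp=2, fn=2), relative_improvement(0.554, 0.40))
```

Because the improvement is relative, the same 38.5% figure corresponds to a larger absolute F1 gain when the baseline is stronger.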
Incorporating Implicit Link Preference Into Overlapping Community Detection
Zhang, Hongyi (The Chinese University of Hong Kong) | King, Irwin (The Chinese University of Hong Kong) | Lyu, Michael R. (The Chinese University of Hong Kong)
Community detection is an important technique for understanding structures and patterns in complex networks. Recently, overlapping community detection has become a trend due to the ubiquity of overlapping and nested communities in the real world. However, existing approaches have ignored implicit link preference information, i.e., the fact that links can reflect a node's preference regarding the targets of the connections it wants to build. This information has a strong impact on community detection, since a node prefers to build links with nodes inside its community rather than with those outside it. In this paper, we propose a preference-based nonnegative matrix factorization (PNMF) model to incorporate implicit link preference information. Unlike conventional matrix factorization approaches, which simply approximate the values of the original adjacency matrix, our model maximizes the likelihood of the preference order for each node, following the intuition that a node prefers its neighbors to other nodes. Our model overcomes the indiscriminate penalty problem, in which non-linked pairs inside one community are penalized in the objective function as heavily as those across two communities. We propose a learning algorithm that learns a node-community membership matrix via stochastic gradient descent with bootstrap sampling. We evaluate our PNMF model on several real-world networks. Experimental results show that our model outperforms state-of-the-art approaches and can be applied to large datasets.
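The preference-order idea can be sketched in the style of Bayesian personalized ranking: sample a node i, a neighbor j and a non-neighbor k, and take a stochastic gradient step that increases ln σ(w_i·w_j − w_i·w_k), clipping entries at zero to keep the membership matrix nonnegative. The inner-product score, logistic likelihood, L2 penalty and uniform sampling with replacement are all assumptions made for illustration; the exact PNMF objective and its bootstrap scheme may differ, and `train_preference_nmf` is a hypothetical name.

```python
import math
import random


def train_preference_nmf(adj, n_comm, iters=20000, lr=0.05, reg=0.01, seed=0):
    """Learn a nonnegative node-community matrix W from pairwise link preferences.

    adj: list of neighbour sets. For a sampled triple (i, j, k) with j a neighbour
    of i and k a non-neighbour, take a gradient ascent step on
    ln sigmoid(w_i.w_j - w_i.w_k) minus an L2 penalty, then clip negatives to zero.
    """
    rng = random.Random(seed)
    n = len(adj)
    W = [[rng.uniform(0.0, 0.1) for _ in range(n_comm)] for _ in range(n)]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # Only nodes with at least one neighbour and one non-neighbour yield triples.
    nodes = [i for i in range(n) if 0 < len(adj[i]) < n - 1]
    for _ in range(iters):
        i = rng.choice(nodes)  # sampling with replacement (assumed bootstrap scheme)
        j = rng.choice(sorted(adj[i]))
        k = rng.choice([v for v in range(n) if v != i and v not in adj[i]])
        x = dot(W[i], W[j]) - dot(W[i], W[k])
        g = 1.0 - 1.0 / (1.0 + math.exp(-x))  # d/dx ln sigmoid(x)
        wi, wj, wk = W[i][:], W[j][:], W[k][:]
        W[i] = [max(0.0, a + lr * (g * (b - c) - reg * a)) for a, b, c in zip(wi, wj, wk)]
        W[j] = [max(0.0, b + lr * (g * a - reg * b)) for a, b in zip(wi, wj)]
        W[k] = [max(0.0, c + lr * (-g * a - reg * c)) for a, c in zip(wi, wk)]
    return W


if __name__ == "__main__":
    # Two disjoint 4-cliques; each node's strongest column suggests its community.
    adj = [set() for _ in range(8)]
    for block in (range(0, 4), range(4, 8)):
        for a in block:
            adj[a].update(b for b in block if b != a)
    W = train_preference_nmf(adj, 2)
    print([max(range(2), key=lambda c: row[c]) for row in W])
```

Because only the ordering "neighbor above non-neighbor" is rewarded, a non-linked pair is not pushed toward an adjacency value of zero, which is the intuition behind avoiding the indiscriminate penalty.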
On the Scalable Learning of Stochastic Blockmodel
Yang, Bo (Jilin University) | Zhao, Xuehua (Jilin University)
Stochastic blockmodel (SBM) enables us to decompose and analyze an exploratory network without a priori knowledge about its intrinsic structure. However, effectively and efficiently learning an SBM from a large-scale network is still challenging due to the high computational cost of its model selection and parameter estimation. To address this issue, we present a novel SBM learning algorithm referred to as BLOS (BLOckwise Sbm learning). Unlike existing methods, BLOS executes the model selection and parameter estimation of SBM concurrently, rather than alternately, by embedding the minimum message length criterion into a block-wise EM algorithm, which greatly reduces the time complexity of SBM learning without sacrificing learning accuracy or modeling flexibility. Its effectiveness and efficiency have been tested through rigorous comparisons with state-of-the-art methods on both synthetic and real-world networks.
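For readers new to SBMs, the sketch below shows two quantities that EM-style SBM learners work with: the Bernoulli log-likelihood of an undirected graph given block assignments z and a block-to-block edge-probability matrix P, and the maximum-likelihood P for fixed z (edge counts divided by possible node pairs). It deliberately omits the minimum message length criterion and BLOS's block-wise updates; function names and the clipping constant `eps` are assumptions.

```python
import math


def sbm_log_likelihood(adj, z, P):
    """Bernoulli SBM log-likelihood of an undirected simple graph.

    adj: symmetric 0/1 adjacency matrix; z[i]: block of node i;
    P[a][b]: edge probability between blocks a and b (strictly between 0 and 1).
    """
    n = len(adj)
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = P[z[i]][z[j]]
            ll += math.log(p) if adj[i][j] else math.log(1.0 - p)
    return ll


def estimate_block_matrix(adj, z, k, eps=1e-6):
    """Maximum-likelihood P for fixed z: edges / possible pairs, clipped to [eps, 1 - eps]."""
    edges = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = z[i], z[j]
            pairs[a][b] += 1
            edges[a][b] += adj[i][j]
            if a != b:  # keep P symmetric for between-block pairs
                pairs[b][a] += 1
                edges[b][a] += adj[i][j]
    return [[min(1 - eps, max(eps, edges[a][b] / pairs[a][b])) if pairs[a][b] else 0.5
             for b in range(k)] for a in range(k)]


if __name__ == "__main__":
    adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    for z in ([0, 0, 1, 1], [0, 1, 0, 1]):
        print(z, sbm_log_likelihood(adj, z, estimate_block_matrix(adj, z, 2)))
```

Each evaluation loops over all O(n²) node pairs, which hints at why naive SBM learning becomes expensive on large networks.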
Mining Query Subtopics from Questions in Community Question Answering
Wu, Yu (Beihang University) | Wu, Wei (Microsoft Research Asia) | Li, Zhoujun (Beihang University) | Zhou, Ming (Microsoft Research Asia)
This paper proposes mining query subtopics from questions in community question answering (CQA). The subtopics are represented as a number of clusters of questions with keywords summarizing the clusters. The task is unique in that the subtopics from questions can not only facilitate user browsing in CQA search, but also describe aspects of queries from a question-answering perspective. The challenges of the task include how to group semantically similar questions and how to find keywords capable of summarizing the clusters. We formulate the subtopic mining task as a non-negative matrix factorization (NMF) problem and further extend the model of NMF to incorporate question similarity estimated from metadata of CQA into learning. Compared with existing methods, our method can jointly optimize question clustering and keyword extraction and encourage the former task to enhance the latter. Experimental results on large-scale real-world CQA datasets show that the proposed method significantly outperforms the existing methods in terms of keyword extraction, while achieving performance comparable to the state-of-the-art methods for question clustering.
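The NMF formulation can be illustrated with plain NMF on a nonnegative question-by-term matrix X ≈ WH: row q of W gives question q's soft cluster memberships, and row c of H weights terms, so its largest entries serve as keywords for cluster c. The sketch uses the Lee–Seung multiplicative updates for the Frobenius loss and omits the paper's extension with question similarity from CQA metadata; the toy matrix and vocabulary are made up, and function names are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Factor nonnegative X (questions x terms) as W H via Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        WT = transpose(W)
        num, den = matmul(WT, X), matmul(matmul(WT, W), H)  # H <- H * (W^T X) / (W^T W H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        HT = transpose(H)
        num, den = matmul(X, HT), matmul(W, matmul(H, HT))  # W <- W * (X H^T) / (W H H^T)
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr))


def top_keywords(H, vocab, n=2):
    """The n highest-weighted terms for each cluster (row of H)."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in H]


if __name__ == "__main__":
    vocab = ["battery", "charge", "screen", "crack"]
    X = [[2.0, 1.0, 0.0, 0.0], [1.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 2.0, 1.0], [0.0, 0.0, 1.0, 2.0]]
    W, H = nmf(X, 2)
    print(frobenius_error(X, W, H), top_keywords(H, vocab))
```

Because clustering (W) and keyword weights (H) come out of the same factorization, improving one factor directly changes the other, which is the coupling that joint optimization exploits.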
Causal Inference via Sparse Additive Models with Application to Online Advertising
Sun, Wei (Purdue University) | Wang, Pengyuan (Yahoo! Labs) | Yin, Dawei (Yahoo! Labs) | Yang, Jian (Yahoo! Labs) | Chang, Yi (Yahoo! Labs)
Advertising effectiveness measurement is a fundamental problem in online advertising. Various causal inference methods have been employed to measure the causal effects of ad treatments. However, existing methods mainly focus on linear logistic regression for univariate and binary treatments and are not well suited to complex multidimensional ad treatments, where each dimension could be discrete or continuous. In this paper, we propose a novel two-stage causal inference framework for assessing the impact of complex ad treatments. In the first stage, we estimate the propensity parameter via a sparse additive model; in the second stage, a propensity-adjusted regression model is applied to measure the treatment effect. Our approach is shown to provide an unbiased estimate of ad effectiveness under regularity conditions. To demonstrate the efficacy of our approach, we apply it to a real online advertising campaign to evaluate the impact of three ad treatments: ad frequency, ad channel, and ad size. We show that ad frequency usually has a treatment effect cap when ads are shown on mobile devices. In addition, the strategies for choosing the best ad size are completely different for mobile ads and online ads.
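The two-stage structure can be sketched in a deliberately simplified setting: one binary treatment, one confounder x, a logistic propensity model fitted by Newton's method in stage 1 (swapped in for the paper's sparse additive model), and ordinary least squares of the outcome on the treatment and the logit-scale propensity in stage 2. Adjusting on the logit scale is an assumption that works here because the simulated outcome is linear in x; it is not the paper's estimator for multidimensional treatments. All function names and the simulated data are hypothetical.

```python
import math
import random


def fit_logistic(x, t, iters=25):
    """Stage 1 stand-in: propensity P(T=1|x) = sigmoid(a + b*x), fitted by Newton's method."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            ga += ti - p
            gb += (ti - p) * xi
            w = p * (1.0 - p)
            haa, hab, hbb = haa + w, hab + w * xi, hbb + w * xi * xi
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det  # Newton step: inverse 2x2 information times gradient
        b += (haa * gb - hab * ga) / det
    return a, b


def ols(rows, y):
    """Ordinary least squares via the normal equations (small problems only)."""
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    M = [A[i] + [c[i]] for i in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for cc in range(col, p + 1):
                M[r][cc] -= f * M[col][cc]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (M[r][p] - sum(M[r][cc] * beta[cc] for cc in range(r + 1, p))) / M[r][r]
    return beta


def treatment_effect(x, t, y):
    """Stage 2: regress y on [1, t, logit propensity]; return the coefficient on t."""
    a, b = fit_logistic(x, t)
    rows = [[1.0, ti, a + b * xi] for xi, ti in zip(x, t)]
    return ols(rows, y)[1]


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # simulated confounder
    t = [1.0 if rng.random() < 1.0 / (1.0 + math.exp(-xi)) else 0.0 for xi in x]
    y = [2.0 * ti + 3.0 * xi + rng.gauss(0.0, 1.0) for ti, xi in zip(t, x)]  # true effect 2
    print(treatment_effect(x, t, y))
```

On this simulated data, a naive difference in mean outcomes between treated and untreated units is inflated because x drives both treatment and outcome; the propensity adjustment removes that confounding.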