Statistical Learning
Instance-Wise Weighted Nonnegative Matrix Factorization for Aggregating Partitions with Locally Reliable Clusters
Zheng, Xiaodong (Fudan University) | Zhu, Shanfeng (Fudan University) | Gao, Junning (Fudan University) | Mamitsuka, Hiroshi (Kyoto University)
We address an ensemble clustering problem, where reliable clusters are locally embedded in given multiple partitions. We propose a new nonnegative matrix factorization (NMF)-based method, in which locally reliable clusters are explicitly considered by using instance-wise weights over clusters. Our method factorizes the input cluster assignment matrix into two matrices H and W, which are optimized by iteratively 1) updating H and W while keeping the weight matrix constant and 2) updating the weight matrix while keeping H and W constant, alternatively. The weights in the second step were updated by solving a convex problem, which makes our algorithm significantly faster than existing NMF-based ensemble clustering methods. We empirically proved that our method outperformed a lot of cutting-edge ensemble clustering methods by using a variety of datasets.
Instance-Wise Weighted Nonnegative Matrix Factorization for Aggregating Partitions with Locally Reliable Clusters
Zheng, Xiaodong (Fudan University) | Zhu, Shanfeng (Fudan University) | Gao, Junning (Fudan University) | Mamitsuka, Hiroshi (Kyoto University)
We address an ensemble clustering problem, where reliable clusters are locally embedded in given multiple partitions. We propose a new nonnegative matrix factorization (NMF)-based method, in which locally reliable clusters are explicitly considered by using instance-wise weights over clusters. Our method factorizes the input cluster assignment matrix into two matrices H and W, which are optimized by iteratively 1) updating H and W while keeping the weight matrix constant and 2) updating the weight matrix while keeping H and W constant, alternatively. The weights in the second step were updated by solving a convex problem, which makes our algorithm significantly faster than existing NMF-based ensemble clustering methods. We empirically proved that our method outperformed a lot of cutting-edge ensemble clustering methods by using a variety of datasets.
Adaptive Dropout Rates for Learning with Corrupted Features
Zhuo, Jingwei (Tsinghua University) | Zhu, Jun (Tsinghua University) | Zhang, Bo (Tsinghua University)
Feature noising is an effective mechanism on reducing the risk of overfitting. To avoid an explosive searching space, existing work typically assumes that all features share a single noise level, which is often cross-validated. In this paper, we present a Bayesian feature noising model that flexibly allows for dimension-specific or group-specific noise levels, and we derive a learning algorithm that adaptively updates these noise levels. Our adaptive rule is simple and interpretable, by drawing a direct connection to the fitness of each individual feature or feature group. Empirical results on various datasets demonstrate the effectiveness on avoiding extensive tuning and sometimes improving the performance due to its flexibility.
A Hybrid Neural Model for Type Classification of Entity Mentions
Dong, Li (Beihang University) | Wei, Furu (Microsoft Research) | Sun, Hong (Microsoft Corporation) | Zhou, Ming (Microsoft Research) | Xu, Ke (Beihang University)
The semantic class (i.e., type) of an entity plays a vital role in many natural language processing tasks, such as question answering. However, most of existing type classification systems extensively rely on hand-crafted features. This paper introduces a hybrid neural model which classifies entity mentions to a wide-coverage set of 22 types derived from DBpedia. It consists of two parts. The mention model uses recurrent neural networks to recursively obtain the vector representation of an entity mention from the words it contains. The context model, on the other hand, employs multilayer perceptrons to obtain the hidden representation for contextual information of a mention. Representations obtained by the two parts are used together to predict the type distribution. Using automatically generated data, these two parts are jointly learned. Experimental studies illustrate that the proposed approach outperforms baseline methods. Moreover, when type information provided by our method is used in a question answering system, we observe a 14.7% relative improvement for the top-1 accuracy of answers.
Supervised Representation Learning: Transfer Learning with Deep Autoencoders
Zhuang, Fuzhen (Chinese Academy of Sciences) | Cheng, Xiaohu (Chinese Academy of Sciences) | Luo, Ping (Chinese Academy of Sciences) | Pan, Sinno Jialin (Nanyang Technological University) | He, Qing (Chinese Academy of Sciences)
Transfer learning has attracted a lot of attention in the past decade. One crucial research issue in transfer learning is how to find a good representation for instances of different domains such that the divergence between domains can be reduced with the new representation. Recently, deep learning has been proposed to learn more robust or higher-level features for transfer learning. However, to the best of our knowledge, most of the previous approaches neither minimize the difference between domains explicitly nor encode label information in learning the representation. In this paper, we propose a supervised representation learning method based on deep autoencoders for transfer learning. The proposed deep autoencoder consists of two encoding layers: an embedding layer and a label encoding layer. In the embedding layer, the distance in distributions of the embedded instances between the source and target domains is minimized in terms of KL-Divergence. In the label encoding layer, label information of the source domain is encoded using a softmax regression model. Extensive experiments conducted on three real-world image datasets demonstrate the effectiveness of our proposed method compared with several state-of-the-art baseline methods.
Dual-Regularized Multi-View Outlier Detection
Zhao, Handong (Northeastern University) | Fu, Yun (Northeastern University)
Multi-view outlier detection is a challenging problem due to the inconsistent behaviors and complicated distributions of samples across different views. The existing approaches are designed to identify the outlier exhibiting inconsistent characteristics across different views. However, due to the inevitable system errors caused by data-captured sensors or others, there always exists another type of outlier, which consistently behaves abnormally in individual view. Unfortunately, this kind of outlier is neglected by all the existing multi-view outlier detection methods, consequently their outlier detection performances are dramatically harmed.In this paper, we propose a novel Dual-regularized Multi-view Outlier Detection method (DMOD) to detect both kinds of anomalies simultaneously. By representing the multi-view data with latent coefficients and sample-specific errors, we characterize each kind of outlier explicitly. Moreover, an outlier measurement criterion is well-designed to quantify the inconsistency. To solve the proposed non-smooth model, a novel optimization algorithm is proposed in an iterative manner. We evaluate our method on five datasets with different outlier settings. The consistent superior results to other state-of-the-art methods demonstrate the effectiveness of our approach.
Instance-Wise Weighted Nonnegative Matrix Factorization for Aggregating Partitions with Locally Reliable Clusters
Zheng, Xiaodong (Fudan University) | Zhu, Shanfeng (Fudan University) | Gao, Junning (Fudan University) | Mamitsuka, Hiroshi (Kyoto University)
We address an ensemble clustering problem, where reliable clusters are locally embedded in given multiple partitions. We propose a new nonnegative matrix factorization (NMF)-based method, in which locally reliable clusters are explicitly considered by using instance-wise weights over clusters. Our method factorizes the input cluster assignment matrix into two matrices H and W, which are optimized by iteratively 1) updating H and W while keeping the weight matrix constant and 2) updating the weight matrix while keeping H and W constant, alternatively. The weights in the second step were updated by solving a convex problem, which makes our algorithm significantly faster than existing NMF-based ensemble clustering methods. We empirically proved that our method outperformed a lot of cutting-edge ensemble clustering methods by using a variety of datasets.
Personalized Ranking Metric Embedding for Next New POI Recommendation
Feng, Shanshan (Nanyang Technological University) | Li, Xutao (Nanyang Technological University) | Zeng, Yifeng (Teesside University) | Cong, Gao (Nanyang Technological University) | Chee, Yeow Meng (Nanyang Technological University) | Yuan, Quan (Nanyang Technological University)
The rapidly growing of Location-based Social Networks (LBSNs) provides a vast amount of check-in data, which enables many services, e.g., point-of-interest (POI) recommendation. In this paper, we study the next new POI recommendation problem in which new POIs with respect to users' current location are to be recommended. The challenge lies in the difficulty in precisely learning users' sequential information and personalizing the recommendation model. To this end, we resort to the Metric Embedding method for the recommendation, which avoids drawbacks of the Matrix Factorization technique. We propose a personalized ranking metric embedding method (PRME) to model personalized check-in sequences. We further develop a PRME-G model, which integrates sequential information, individual preference, and geographical influence, to improve the recommendation performance. Experiments on two real-world LBSN datasets demonstrate that our new algorithm outperforms the state-of-the-art next POI recommendation methods.
Instance-Wise Weighted Nonnegative Matrix Factorization for Aggregating Partitions with Locally Reliable Clusters
Zheng, Xiaodong (Fudan University) | Zhu, Shanfeng (Fudan University) | Gao, Junning (Fudan University) | Mamitsuka, Hiroshi (Kyoto University)
We address an ensemble clustering problem, where reliable clusters are locally embedded in given multiple partitions. We propose a new nonnegative matrix factorization (NMF)-based method, in which locally reliable clusters are explicitly considered by using instance-wise weights over clusters. Our method factorizes the input cluster assignment matrix into two matrices H and W, which are optimized by iteratively 1) updating H and W while keeping the weight matrix constant and 2) updating the weight matrix while keeping H and W constant, alternatively. The weights in the second step were updated by solving a convex problem, which makes our algorithm significantly faster than existing NMF-based ensemble clustering methods. We empirically proved that our method outperformed a lot of cutting-edge ensemble clustering methods by using a variety of datasets.
Instance-Wise Weighted Nonnegative Matrix Factorization for Aggregating Partitions with Locally Reliable Clusters
Zheng, Xiaodong (Fudan University) | Zhu, Shanfeng (Fudan University) | Gao, Junning (Fudan University) | Mamitsuka, Hiroshi (Kyoto University)
We address an ensemble clustering problem, where reliable clusters are locally embedded in given multiple partitions. We propose a new nonnegative matrix factorization (NMF)-based method, in which locally reliable clusters are explicitly considered by using instance-wise weights over clusters. Our method factorizes the input cluster assignment matrix into two matrices H and W, which are optimized by iteratively 1) updating H and W while keeping the weight matrix constant and 2) updating the weight matrix while keeping H and W constant, alternatively. The weights in the second step were updated by solving a convex problem, which makes our algorithm significantly faster than existing NMF-based ensemble clustering methods. We empirically proved that our method outperformed a lot of cutting-edge ensemble clustering methods by using a variety of datasets.