Pei, Jian
TIMERS: Error-Bounded SVD Restart on Dynamic Networks
Zhang, Ziwei (Tsinghua University) | Cui, Peng (Tsinghua University) | Pei, Jian (Simon Fraser University) | Wang, Xiao (Tsinghua University) | Zhu, Wenwu (Tsinghua University)
Singular Value Decomposition (SVD) is a popular approach in various network applications, such as link prediction and network parameter characterization. Incremental SVD approaches have been proposed to process newly changed nodes and edges in dynamic networks. However, incremental SVD approaches inevitably suffer from serious error accumulation due to the approximations made in incremental updates. SVD restart is an effective approach to reset the accumulated error, but when to restart SVD for dynamic networks is not addressed in the literature. In this paper, we propose TIMERS, Theoretically Instructed Maximum-Error-bounded Restart of SVD, a novel approach which optimally sets the restart time in order to reduce error accumulation in time. Specifically, we monitor the margin between the reconstruction loss of incremental updates and the minimum loss of the SVD model. To reduce the complexity of monitoring, we theoretically develop a lower bound of the SVD minimum loss for dynamic networks and use the bound to replace the minimum loss in monitoring. By setting a maximum tolerated error as a threshold, we can trigger SVD restart automatically when the margin exceeds this threshold. We prove that the time complexity of our method is linear with respect to the number of local dynamic changes, and our method is general across different types of dynamic networks. We conduct extensive experiments on several synthetic and real dynamic networks. The experimental results demonstrate that our proposed method significantly outperforms the existing methods, reducing the maximum error of dynamic network reconstruction by 27% to 42% when the number of restarts is fixed. Our method reduces the number of restarts by 25% to 50% when the maximum tolerated error is fixed.
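The restart trigger described in this abstract can be illustrated with a minimal sketch. This is not the TIMERS algorithm: it assumes dense snapshot matrices, an absolute (not relative) margin, and it computes the exact minimum rank-k loss with a full SVD where the paper substitutes a cheaper lower bound. The "incremental update" is replaced by simply reusing stale factors. All function names are hypothetical.

```python
import numpy as np

def truncated_svd(a, k):
    """Return rank-k factors (u, s, vt) of matrix a from a full SVD."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

def reconstruction_loss(a, factors):
    """Squared Frobenius error of approximating a with the given factors."""
    u, s, vt = factors
    return float(np.linalg.norm(a - (u * s) @ vt, "fro") ** 2)

def optimal_loss(a, k):
    """Minimum rank-k loss: sum of squared singular values beyond the k-th.

    Assumption: the exact value stands in for the paper's lower bound.
    """
    s = np.linalg.svd(a, compute_uv=False)
    return float(np.sum(s[k:] ** 2))

def run_with_restarts(snapshots, k, threshold):
    """Return the time steps at which the margin exceeded the threshold.

    Stale factors are reused between restarts (a stand-in for a real
    incremental SVD update, which is not implemented here).
    """
    factors = truncated_svd(snapshots[0], k)
    restarts = []
    for t, a in enumerate(snapshots[1:], start=1):
        margin = reconstruction_loss(a, factors) - optimal_loss(a, k)
        if margin > threshold:
            factors = truncated_svd(a, k)  # restart: recompute from scratch
            restarts.append(t)
    return restarts
```

Calling `run_with_restarts(snapshots, k=2, threshold=0.5)` on a list of equally sized adjacency matrices returns the indices of the snapshots that triggered a restart; a larger threshold tolerates more accumulated error and yields fewer restarts.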
Community Preserving Network Embedding
Wang, Xiao (Tsinghua University) | Cui, Peng (Tsinghua University) | Wang, Jing (Bournemouth University) | Pei, Jian (Simon Fraser University) | Zhu, Wenwu (Tsinghua University) | Yang, Shiqiang (Tsinghua University)
Network embedding, aiming to learn the low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, which is one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and community structure, and then jointly optimize an NMF-based representation learning model and a modularity-based community detection model in a unified framework, which enables the learned representations of nodes to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with the correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over state-of-the-art methods.
Towards Cohesive Anomaly Mining
Xiong, Yun (Fudan University) | Zhu, Yangyong (Fudan University) | Yu, Philip S. (University of Illinois at Chicago) | Pei, Jian (Simon Fraser University)
In some applications, such as bioinformatics, social network analysis, and computational criminology, it is desirable to find compact clusters formed by a (very) small portion of objects in a large data set. Since such clusters are comprised of a small number of objects, they are extraordinary and anomalous with respect to the entire data set. This specific type of clustering task cannot be solved well by the conventional clustering methods since generally those methods try to assign most of the data objects into clusters. In this paper, we model this novel and application-inspired task as the problem of mining cohesive anomalies. We propose a general framework and a principled approach to tackle the problem. The experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our approach.
Early Prediction on Time Series: A Nearest Neighbor Approach
Xing, Zhengzheng (Simon Fraser University) | Pei, Jian (Simon Fraser University) | Yu, Philip S. (University of Illinois at Chicago)
In this paper, we formulate the problem of early classification of time series data, which is important in some time-sensitive applications such as health informatics. We introduce a novel concept of MPL (Minimum Prediction Length) and develop ECTS (Early Classification on Time Series), an effective 1-nearest neighbor classification method. ECTS makes early predictions while retaining accuracy comparable to that of a 1NN classifier using the full-length time series. Our empirical study using benchmark time series data sets shows that ECTS works well on the real data sets where 1NN classification is effective.
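The idea of predicting from a prefix can be illustrated with a small sketch. This is not ECTS and does not compute MPL as the paper defines it; it is a simplified, after-the-fact analysis that assumes equal-length series and Euclidean distance, and reports the shortest prefix from which a 1NN prediction already agrees with the full-length 1NN prediction. All names are hypothetical.

```python
import math

def prefix_distance(a, b, length):
    """Euclidean distance between the first `length` points of a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:length], b[:length])))

def nn_label(train, query, length):
    """1NN label of `query` using only its first `length` points.

    `train` is a list of (series, label) pairs; ties go to the earliest pair.
    """
    best = min(train, key=lambda item: prefix_distance(item[0], query, length))
    return best[1]

def earliest_stable_length(train, query):
    """Shortest prefix length from which every longer prefix yields the
    same 1NN label as the full-length series. Returns (length, label)."""
    full = len(query)
    labels = [nn_label(train, query, n) for n in range(1, full + 1)]
    final = labels[-1]
    length = full
    # Walk backwards while shorter prefixes still agree with the final label.
    while length > 1 and labels[length - 2] == final:
        length -= 1
    return length, final

train = [([0, 0, 0, 0], "flat"), ([0, 0, 5, 5], "step")]
print(earliest_stable_length(train, [0, 0, 4, 4]))  # (3, 'step')
```

In this toy example the two training series are identical on their first two points, so the query cannot be told apart from "flat" until the third point arrives.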