Cui, Peng
Causally Regularized Learning with Agnostic Data Selection Bias
Shen, Zheyan, Cui, Peng, Kuang, Kun, Li, Bo, Chen, Peixuan
Most previous machine learning algorithms are built on the i.i.d. hypothesis. However, this ideal assumption is often violated in real applications, where selection bias may arise between the training and testing processes. Moreover, in many scenarios, the testing data are not even available during the training process, which makes traditional methods like transfer learning infeasible due to their need for prior knowledge of the test distribution. Therefore, how to address agnostic selection bias for robust model learning is of paramount importance for both academic research and real applications. In this paper, under the assumption that causal relationships among variables are robust across domains, we incorporate causal techniques into predictive modeling and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm that jointly optimizes global confounder balancing and weighted logistic regression. Global confounder balancing helps to identify causal features, whose causal effects on the outcome are stable across domains; performing logistic regression on these causal features then yields a predictive model that is robust to the agnostic bias. To validate the effectiveness of our CRLR algorithm, we conduct comprehensive experiments on both synthetic and real-world datasets. The experimental results clearly demonstrate that CRLR outperforms state-of-the-art methods, and the interpretability of our method is fully illustrated by feature visualization.
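For concreteness, here is a minimal numpy sketch of the joint objective just described: each feature is treated in turn as a "treatment", and sample weights are sought that balance the remaining features across its treated and control groups, while the same weights reweight the logistic loss. The function names, the squared moment-matching penalty, and the binary-feature assumption are illustrative, not the authors' implementation.

```python
import numpy as np

def balancing_loss(W, X):
    """Global confounder balancing (sketch, assumes binary features):
    treating each feature j in turn as the 'treatment', the W-weighted
    means of all other features should match between samples with
    X[:, j] = 1 and X[:, j] = 0."""
    n, p = X.shape
    loss = 0.0
    for j in range(p):
        t = X[:, j]                       # treatment indicator for feature j
        others = np.delete(X, j, axis=1)  # remaining features (confounders)
        w_t, w_c = W * t, W * (1 - t)
        mean_t = others.T @ w_t / max(w_t.sum(), 1e-8)
        mean_c = others.T @ w_c / max(w_c.sum(), 1e-8)
        loss += np.sum((mean_t - mean_c) ** 2)
    return loss

def weighted_logistic_loss(beta, W, X, y):
    """Logistic regression loss with per-sample weights W."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    eps = 1e-12
    return -np.sum(W * (y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def crlr_objective(beta, W, X, y, lam):
    # Joint objective: weighted prediction loss plus the causal regularizer.
    return weighted_logistic_loss(beta, W, X, y) + lam * balancing_loss(W, X)
```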
Stable Prediction across Unknown Environments
Kuang, Kun, Xiong, Ruoxuan, Cui, Peng, Athey, Susan, Li, Bo
In many important machine learning applications, the training distribution used to learn a probabilistic classifier differs from the testing distribution on which the classifier will be used to make predictions. Traditional methods correct the distribution shift by reweighting the training data with the ratio of the density between the test and training data, but in many applications training takes place without prior knowledge of the testing distribution on which the algorithm will be applied in the future. Recently, methods have been proposed to address the shift by learning causal structure, but they rely on the diversity of multiple training datasets to achieve good performance and have complexity limitations in high dimensions. In this paper, we propose a novel Deep Global Balancing Regression (DGBR) algorithm that jointly optimizes a deep auto-encoder model for feature selection and a global balancing model for stable prediction across unknown environments. The global balancing model constructs balancing weights that facilitate estimating the partial effect of each feature (holding all other features fixed), a problem that is challenging in high dimensions, and thus helps to identify stable, causal relationships between features and outcomes. The deep auto-encoder model is designed to reduce the dimensionality of the feature space, making global balancing easier. We show, both theoretically and empirically, that our algorithm makes stable predictions across unknown environments. Experiments on both synthetic and real-world datasets demonstrate that DGBR outperforms the state-of-the-art methods for stable prediction across unknown environments.
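The pipeline can be pictured with a toy sketch: a stand-in encoder compresses the features, global balancing weights are learned on the codes, and a weighted regression is fit with those weights. The single-layer encoder, the binarization of codes at 0.5, and the use of scipy's L-BFGS are all simplifying assumptions rather than the DGBR architecture.

```python
import numpy as np
from scipy.optimize import minimize

def encode(X, W_enc, b_enc):
    """Stand-in for the deep auto-encoder's encoder: one sigmoid layer
    mapping raw features to a low-dimensional code."""
    return 1.0 / (1.0 + np.exp(-(X @ W_enc + b_enc)))

def balance_loss(v, Z):
    """Global balancing on encoded features Z: for each binarized code
    dimension j, the weighted means of the other dimensions should match
    across Z[:, j] >= 0.5 and Z[:, j] < 0.5. W = exp(v) keeps weights positive."""
    W = np.exp(v)
    B = (Z >= 0.5).astype(float)
    loss = 0.0
    for j in range(Z.shape[1]):
        t, others = B[:, j], np.delete(Z, j, axis=1)
        m_t = others.T @ (W * t) / max((W * t).sum(), 1e-8)
        m_c = others.T @ (W * (1 - t)) / max((W * (1 - t)).sum(), 1e-8)
        loss += np.sum((m_t - m_c) ** 2)
    return loss + 1e-3 * np.sum(W ** 2)   # discourage degenerate weights

def dgbr_sketch(X, y, W_enc, b_enc):
    Z = encode(X, W_enc, b_enc)            # reduce dimensionality first
    v = minimize(balance_loss, np.zeros(len(Z)), args=(Z,),
                 method="L-BFGS-B").x
    W = np.exp(v)                          # global balancing weights
    ZW = Z * W[:, None]                    # weighted least squares on codes
    beta = np.linalg.lstsq(ZW.T @ Z, ZW.T @ y, rcond=None)[0]
    return beta, W
```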
Billion-scale Network Embedding with Iterative Random Projection
Zhang, Ziwei, Cui, Peng, Li, Haoyang, Wang, Xiao, Zhu, Wenwu
Network embedding has attracted considerable research attention recently. However, existing methods are incapable of handling billion-scale networks, because they are computationally expensive and, at the same time, difficult to accelerate with distributed computing schemes. To address these problems, we propose RandNE, a novel and simple billion-scale network embedding method. Specifically, we propose a Gaussian random projection approach that maps the network into a low-dimensional embedding space while preserving the high-order proximities between nodes. To reduce the time complexity, we design an iterative projection procedure that avoids explicitly calculating the high-order proximities. Theoretical analysis shows that our method is extremely efficient and friendly to distributed computing schemes, incurring no communication cost in the calculation. We demonstrate the efficacy of RandNE over state-of-the-art methods in network reconstruction and link prediction tasks on multiple datasets of different scales, ranging from thousands to billions of nodes and edges.
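A compact sketch of the iterative projection idea follows: the embedding U = sum_k weights[k] * A^k R is accumulated with one sparse multiply per order, so A^k is never formed explicitly. The weights, dimensions, and normalization below are illustrative choices, not the paper's tuned settings.

```python
import numpy as np
import scipy.sparse as sp

def randne(A, dim=128, order=3, weights=(1.0, 1.0, 0.5, 0.25), seed=0):
    """Iterative Gaussian random projection (a sketch of the RandNE idea).

    Computes U = sum_k weights[k] * A^k @ R without ever forming A^k:
    each term is obtained from the previous one by a single sparse
    multiply, so the total cost is O(order * nnz(A) * dim)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Gaussian random projection matrix; columns are near-orthogonal.
    U_k = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(n, dim))
    U = weights[0] * U_k
    for k in range(1, order + 1):
        U_k = A @ U_k          # raises the preserved proximity by one order
        U += weights[k] * U_k
    return U

# Usage on a toy graph:
A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
A = A + A.T                    # symmetrize to mimic an undirected network
emb = randne(A, dim=64, order=3)
```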
TIMERS: Error-Bounded SVD Restart on Dynamic Networks
Zhang, Ziwei (Tsinghua University) | Cui, Peng (Tsinghua University) | Pei, Jian (Simon Fraser University) | Wang, Xiao (Tsinghua University) | Zhu, Wenwu (Tsinghua University)
Singular Value Decomposition (SVD) is a popular approach in various network applications, such as link prediction and network parameter characterization. Incremental SVD approaches have been proposed to process newly changed nodes and edges in dynamic networks. However, incremental SVD approaches inevitably suffer from serious error accumulation due to the approximation in incremental updates. Restarting the SVD is an effective way to reset the accumulated error, but when to restart the SVD for dynamic networks has not been addressed in the literature. In this paper, we propose TIMERS, Theoretically Instructed Maximum-Error-bounded Restart of SVD, a novel approach that optimally sets the restart time so as to reduce error accumulation over time. Specifically, we monitor the margin between the reconstruction loss of the incremental updates and the minimum loss of the SVD model. To reduce the complexity of monitoring, we theoretically derive a lower bound of the SVD minimum loss for dynamic networks and use this bound in place of the minimum loss during monitoring. By setting a maximum tolerated error as a threshold, we can trigger an SVD restart automatically whenever the margin exceeds this threshold. We prove that the time complexity of our method is linear with respect to the number of local dynamic changes, and that our method is general across different types of dynamic networks. We conduct extensive experiments on several synthetic and real dynamic networks. The experimental results demonstrate that our proposed method significantly outperforms the existing methods, reducing the maximum error of dynamic network reconstruction by 27% to 42% when the number of restarts is fixed, and reducing the number of restarts by 25% to 50% when the maximum tolerated error is fixed.
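The restart rule can be sketched as follows; note that, for clarity, the minimum rank-k loss is recomputed exactly with a full SVD, whereas the paper's contribution is precisely a cheap theoretical lower bound that replaces this expensive term. The names and the relative-threshold form are assumptions of the sketch.

```python
import numpy as np

def recon_loss(A, U, S, Vt):
    """Frobenius loss of approximating A with (possibly stale) SVD factors."""
    return np.linalg.norm(A - (U * S) @ Vt) ** 2

def should_restart(A_t, U, S, Vt, theta):
    """TIMERS-style restart rule (sketch): restart when the stale factors'
    loss exceeds the best achievable rank-k loss by a relative margin theta.
    The paper replaces the exact minimum below with a cheap lower bound;
    we recompute it exactly here for clarity."""
    k = len(S)
    sv = np.linalg.svd(A_t, compute_uv=False)
    min_loss = np.sum(sv[k:] ** 2)        # best achievable rank-k loss
    return recon_loss(A_t, U, S, Vt) > (1 + theta) * min_loss

# Over a stream of adjacency snapshots A_0, A_1, ...: keep updating the
# factors incrementally, and recompute the SVD only when should_restart fires.
```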
Deep Asymmetric Transfer Network for Unbalanced Domain Adaptation
Wang, Daixin (Tsinghua University) | Cui, Peng (Tsinghua University) | Zhu, Wenwu (Tsinghua University)
Recently, domain adaptation based on deep models has become a promising way to deal with domains that have scarce labeled data, a critical problem for deep learning models. Domain adaptation propagates knowledge from a source domain with rich information to the target domain. In reality, the source and target domains are mostly unbalanced, in that the source domain is more resource-rich and thus has more reliable knowledge than the target domain. However, existing deep domain adaptation approaches often presume the source and target domains to be balanced and treat them equally, leading to a compromise solution between the two domains that is not optimal for unbalanced domain adaptation. In this paper, we propose a novel Deep Asymmetric Transfer Network (DATN) to address the problem of unbalanced domain adaptation. Specifically, our model learns a transfer function from the target domain to the source domain while adapting the source-domain classifier, which has more discriminative power, to the target domain. In this way, the deep model adaptively puts more emphasis on the resource-rich source domain. To alleviate the scarcity of supervised data, we further propose an unsupervised transfer method that propagates knowledge from abundant unlabeled data by minimizing the distribution discrepancy between the unlabeled data of the two domains. Experiments on two real-world datasets demonstrate that DATN attains a substantial gain over state-of-the-art methods.
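The asymmetry can be illustrated with a small linear sketch: a map G takes target representations into the source space, where a fixed source classifier is applied, while a mean-embedding (linear-kernel MMD) term aligns the unlabeled data of the two domains. The linear map, squared losses, and plain gradient steps are stand-ins for DATN's deep transfer function, not its actual architecture.

```python
import numpy as np

def fit_transfer(Ht_lab, y_t, Ht_unlab, Hs_unlab, w_src, lam=1.0,
                 lr=1e-3, n_iters=500):
    """Learn an asymmetric map G from target to source representation space
    (a linear stand-in for DATN's deep transfer function). The fixed source
    classifier w_src is applied to mapped target data, so emphasis stays on
    the resource-rich source domain."""
    d_t, d_s = Ht_lab.shape[1], len(w_src)
    G = np.zeros((d_t, d_s))
    m_t, m_s = Ht_unlab.mean(axis=0), Hs_unlab.mean(axis=0)
    for _ in range(n_iters):
        # supervised term: squared loss of the source classifier on mapped data
        resid = Ht_lab @ G @ w_src - y_t
        grad_sup = np.outer(Ht_lab.T @ resid, w_src)
        # unsupervised term: gradient of the mean-embedding (linear MMD) gap
        grad_mmd = 2.0 * np.outer(m_t, m_t @ G - m_s)
        G -= lr * (grad_sup + lam * grad_mmd)
    return G
```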
Structural Deep Embedding for Hyper-Networks
Tu, Ke (Tsinghua University) | Cui, Peng (Tsinghua University) | Wang, Xiao (Tsinghua University) | Wang, Fei (Cornell University) | Zhu, Wenwu (Tsinghua University)
Network embedding has recently attracted considerable attention in data mining. Existing network embedding methods mainly focus on networks with pairwise relationships. In the real world, however, relationships among data points can go beyond pairwise; that is, three or more objects may be involved in each relationship, represented by a hyperedge, thus forming hyper-networks. These hyper-networks pose great challenges to existing network embedding methods when the hyperedges are indecomposable, that is, when no subset of the nodes in a hyperedge can form another hyperedge. Such indecomposable hyperedges are especially common in heterogeneous networks. In this paper, we propose a novel Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with indecomposable hyperedges. More specifically, we theoretically prove that any linear similarity metric in the embedding space, as commonly used in existing methods, cannot maintain the indecomposability property of hyper-networks, and we therefore propose a new deep model that realizes a non-linear tuplewise similarity function while preserving both local and global proximities in the resulting embedding space. We conduct extensive experiments on four different types of hyper-networks, including a GPS network, an online social network, a drug network, and a semantic network. The empirical results demonstrate that our method significantly and consistently outperforms the state-of-the-art algorithms.
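To make the tuplewise similarity concrete, here is a toy scorer for 3-node hyperedges: the nodes' embeddings are concatenated and passed through one hidden tanh layer before a sigmoid score. The shapes, the single hidden layer, and the random parameters are illustrative; the point is only that the hidden non-linearity is what allows the score to respect indecomposability.

```python
import numpy as np

def tuplewise_score(emb, triple, W1, b1, w2, b2):
    """Non-linear tuplewise similarity for a 3-node hyperedge (sketch).
    A linear function of the concatenated embeddings cannot preserve
    indecomposability, so the hidden non-linearity is essential."""
    x = np.concatenate([emb[i] for i in triple])   # (3*d,)
    h = np.tanh(W1 @ x + b1)                       # non-linear interaction
    logit = w2 @ h + b2
    return 1.0 / (1.0 + np.exp(-logit))            # P(triple is a hyperedge)

# Toy setup: 10 nodes with 4-dim embeddings, hidden width 8.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 4))
W1, b1 = rng.normal(size=(8, 12)), np.zeros(8)
w2, b2 = rng.normal(size=8), 0.0
print(tuplewise_score(emb, (0, 3, 7), W1, b1, w2, b2))
```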
DepthLGP: Learning Embeddings of Out-of-Sample Nodes in Dynamic Networks
Ma, Jianxin (Tsinghua University) | Cui, Peng (Tsinghua University) | Zhu, Wenwu (Tsinghua University)
Network embedding algorithms to date are primarily designed for static networks, where all nodes are known before learning. How to infer embeddings for out-of-sample nodes, i.e., nodes that arrive after learning, remains an open problem. The problem poses great challenges to existing methods, since the inferred embeddings should preserve intricate network properties such as high-order proximity, share similar characteristics with in-sample node embeddings (i.e., lie in a homogeneous space), and be cheap to compute. To overcome these challenges, we propose a Deeply Transformed High-order Laplacian Gaussian Process (DepthLGP) method to infer embeddings for out-of-sample nodes. DepthLGP combines the strengths of nonparametric probabilistic modeling and deep learning. In particular, we design a high-order Laplacian Gaussian process (hLGP) to encode network properties, which permits fast and scalable inference. To further ensure homogeneity, we then employ a deep neural network to learn a nonlinear transformation from latent states of the hLGP to node embeddings. DepthLGP is general, in that it is applicable to embeddings learned by any network embedding algorithm. We theoretically prove the expressive power of DepthLGP and conduct extensive experiments on real-world networks. Empirical results demonstrate that our approach achieves significant performance gains over existing approaches.
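A stripped-down sketch of the two stages: a graph-Laplacian GP kernel couples nodes through the network, the GP conditional mean yields latent states for the new nodes, and a non-linear layer maps latent states to embeddings. The specific kernel (alpha*I + beta*L)^-1 and the one-layer transform are simplifying assumptions standing in for the hLGP and the deep network.

```python
import numpy as np

def laplacian_kernel(A, alpha=1.0, beta=0.1):
    """A graph-Laplacian GP kernel (illustrative stand-in for hLGP):
    K = (alpha*I + beta*L)^-1 couples nearby nodes more strongly."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(alpha * np.eye(len(A)) + beta * L)

def infer_out_of_sample(A_full, obs_idx, new_idx, F_obs, W, b):
    """GP-conditional mean for the latent states of new nodes, followed by
    a one-layer non-linear transform to the embedding space (sketch)."""
    K = laplacian_kernel(A_full)
    K_oo = K[np.ix_(obs_idx, obs_idx)]
    K_no = K[np.ix_(new_idx, obs_idx)]
    F_new = K_no @ np.linalg.solve(K_oo, F_obs)   # GP posterior mean
    return np.tanh(F_new @ W + b)                 # latent states -> embeddings
```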
Community Preserving Network Embedding
Wang, Xiao (Tsinghua University) | Cui, Peng (Tsinghua University) | Wang, Jing (Bournemouth University) | Pei, Jian (Simon Fraser University) | Zhu, Wenwu (Tsinghua University) | Yang, Shiqiang (Tsinghua University)
Network embedding, which aims to learn low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model that incorporates the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and the community structure, and jointly optimize an NMF-based representation learning model and a modularity-based community detection model in a unified framework, which enables the learned node representations to preserve both the microscopic and the community structure. We also provide efficient updating rules to infer the parameters of our model, together with correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over the state-of-the-art methods.
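The unified framework can be summarized by its objective, sketched below: a proximity-factorization term, a consensus term tying node representations U to community indicators H, and a modularity reward tr(H^T B H). The trade-off weights and the omission of the non-negativity and trace constraints are simplifications for readability.

```python
import numpy as np

def modularity_matrix(A):
    """B_ij = A_ij - k_i * k_j / (2m): the modularity matrix used to score
    community assignments H via tr(H^T B H)."""
    k = A.sum(axis=1)
    return A - np.outer(k, k) / A.sum()

def mnmf_objective(S, A, M, U, H, C, alpha=1.0, beta=1.0):
    """Joint M-NMF objective (sketch): proximity factorization, consensus
    between node representations U and communities H, and a modularity
    reward. All factors are constrained non-negative in the actual model."""
    B = modularity_matrix(A)
    recon = np.linalg.norm(S - M @ U.T) ** 2       # preserve proximities S
    consensus = np.linalg.norm(H - U @ C.T) ** 2   # tie U to communities
    modularity = np.trace(H.T @ B @ H)             # reward strong communities
    return recon + alpha * consensus - beta * modularity
```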
Treatment Effect Estimation with Data-Driven Variable Decomposition
Kuang, Kun (Tsinghua University) | Cui, Peng (Tsinghua University) | Li, Bo (Tsinghua University) | Jiang, Meng (University of Illinois Urbana-Champaign) | Yang, Shiqiang (Tsinghua University) | Wang, Fei (Cornell University)
One fundamental problem in causal inference is the estimation of treatment effects in observational studies where variables are confounded. Control for confounding is generally handled with propensity scores, but this treats all observed variables as confounders and ignores adjustment variables, which have no influence on the treatment but are predictive of the outcome. Recently, it has been demonstrated that adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate confounders from adjustment variables in observational studies remains an open problem, especially in scenarios with high-dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D$^2$VD) algorithm, which can (1) automatically separate confounders and adjustment variables in a data-driven way, and (2) simultaneously estimate the treatment effect in observational studies with high-dimensional variables. Under standard assumptions, we show experimentally that the proposed D$^2$VD algorithm separates the variables precisely and estimates the treatment effect more accurately, and with tighter confidence intervals, than the state-of-the-art methods on both synthetic data and a real online advertising dataset.
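As a rough illustration of the decomposition (not the authors' regularized estimator), here is a two-stage stand-in with scikit-learn: L1-select the treatment-predictive variables as confounders, use the remaining outcome-predictive variables as adjustment variables to shrink outcome variance, then estimate the effect by inverse propensity weighting on the adjusted outcome. All thresholds and penalties below are assumed, and the sketch presumes both variable groups are non-empty.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

def d2vd_sketch(X, T, Y, tau=1e-3):
    # 1) Confounders: variables with a non-zero L1 effect on treatment T.
    sel = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    sel.fit(X, T)
    conf = np.abs(sel.coef_.ravel()) > tau
    # 2) Adjustment variables: outcome-predictive among non-confounders;
    #    subtracting their prediction reduces the variance of the outcome.
    adj = Lasso(alpha=0.01).fit(X[:, ~conf], Y)
    Y_adj = Y - adj.predict(X[:, ~conf])
    # 3) Inverse propensity weighting over the confounders only,
    #    applied to the variance-reduced outcome.
    ps = LogisticRegression().fit(X[:, conf], T).predict_proba(X[:, conf])[:, 1]
    ps = np.clip(ps, 1e-3, 1 - 1e-3)
    return np.mean(T * Y_adj / ps - (1 - T) * Y_adj / (1 - ps))
```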
Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds
Jiang, Meng (Tsinghua University) | Cui, Peng (Tsinghua University) | Yuan, Nicholas Jing (Microsoft Research Asia) | Xie, Xing (Microsoft Research Asia) | Yang, Shiqiang (Tsinghua University)
People often use multiple platforms to fulfill their different information needs. With the ultimate goal of serving people intelligently, a fundamental step is to obtain a comprehensive understanding of user needs, so organically integrating and bridging cross-platform information in a human-centric way is important. Existing transfer learning methods assume the user sets of different platforms are either fully overlapped or non-overlapped, whereas in reality they are partially overlapped. The number of overlapped users is often small, and the number of explicitly known overlapped users is even smaller, owing to the lack of a unified user ID across platforms. In this paper, we propose a novel semi-supervised transfer learning method, called XPTrans, to address the problem of cross-platform behavior prediction. To alleviate the sparsity issue, it fully exploits the small overlapped crowd to optimally bridge a user's behaviors across platforms. Extensive experiments on two real social networks show that XPTrans significantly outperforms the state-of-the-art. We demonstrate that by fully exploiting the 26% of users who overlap, XPTrans can predict the behaviors of non-overlapped users with the same accuracy as overlapped users, which means that a small overlapped crowd can successfully bridge the information across different platforms.
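The bridging idea admits a compact sketch: factorize each platform's user-behavior matrix and tie together the latent factors of known overlapped user pairs, so information flows across platforms through that small crowd. The collective-factorization form and the coupling penalty are an assumed simplification, not the XPTrans model itself.

```python
import numpy as np

def xptrans_objective(R1, R2, U1, V1, U2, V2, overlap, lam=10.0):
    """Sketch of bridging two platforms through overlapped users: factorize
    each platform's user-behavior matrix while tying the latent factors of
    known overlapped user pairs together. `overlap` is a list of (i, j)
    index pairs identifying the same person on platform 1 and platform 2."""
    loss = np.linalg.norm(R1 - U1 @ V1.T) ** 2
    loss += np.linalg.norm(R2 - U2 @ V2.T) ** 2
    for i, j in overlap:                 # the 'little' that bridges 'much'
        loss += lam * np.sum((U1[i] - U2[j]) ** 2)
    return loss
```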