Fan, Yujie
Revealing the Power of Spatial-Temporal Masked Autoencoders in Multivariate Time Series Forecasting
Sun, Jiarui, Fan, Yujie, Yeh, Chin-Chia Michael, Zhang, Wei, Chowdhary, Girish
Multivariate time series (MTS) forecasting involves predicting future time series data based on historical observations. Existing research primarily emphasizes the development of complex spatial-temporal models that capture spatial dependencies and temporal correlations among time series variables explicitly. However, recent advances have been impeded by challenges relating to data scarcity and model robustness. To address these issues, we propose Spatial-Temporal Masked Autoencoders (STMAE), an MTS forecasting framework that leverages masked autoencoders to enhance the performance of spatial-temporal baseline models. STMAE consists of two learning stages. In the pretraining stage, an encoder-decoder architecture is employed. The encoder processes the partially visible MTS data produced by a novel dual-masking strategy, including biased random walk-based spatial masking and patch-based temporal masking. Subsequently, the decoders aim to reconstruct the masked counterparts from both spatial and temporal perspectives. The pretraining stage establishes a challenging pretext task, compelling the encoder to learn robust spatial-temporal patterns. In the fine-tuning stage, the pretrained encoder is retained, and the original decoder from existing spatial-temporal models is appended for forecasting. Extensive experiments are conducted on multiple MTS benchmarks. The promising results demonstrate that integrating STMAE into various spatial-temporal models can largely enhance their MTS forecasting capability.
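As a rough illustration of the patch-based temporal masking mentioned in the abstract above, the following minimal sketch masks whole contiguous patches of a time axis. The function name `patch_temporal_mask`, its parameters, and the uniform random choice of patches are assumptions made for illustration; the abstract does not specify these details, and the spatial (biased random walk) masking is not covered here.

```python
import random


def patch_temporal_mask(num_steps, patch_len, mask_ratio, seed=None):
    """Return a list of booleans, True where a time step is masked.

    The time axis is split into consecutive patches of `patch_len` steps and
    whole patches are masked, so hidden values form contiguous blocks.
    Hypothetical helper: names and sampling scheme are illustrative only.
    """
    if num_steps % patch_len != 0:
        raise ValueError("num_steps must be a multiple of patch_len")
    num_patches = num_steps // patch_len
    num_masked = round(mask_ratio * num_patches)
    rng = random.Random(seed)
    masked_patches = set(rng.sample(range(num_patches), num_masked))
    # A step is masked exactly when the patch it belongs to was selected.
    return [(t // patch_len) in masked_patches for t in range(num_steps)]


mask = patch_temporal_mask(num_steps=12, patch_len=3, mask_ratio=0.5, seed=0)
```

Masking whole patches rather than isolated steps makes the reconstruction task harder, since a hidden value cannot simply be interpolated from its immediate neighbors.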
PDT: Pretrained Dual Transformers for Time-aware Bipartite Graphs
Dai, Xin, Fan, Yujie, Zhuang, Zhongfang, Jain, Shubham, Yeh, Chin-Chia Michael, Wang, Junpeng, Wang, Liang, Zheng, Yan, Aboagye, Prince Osei, Zhang, Wei
Pre-training on large models is prevalent and emerging with the ever-growing user-generated content in many machine learning application categories. It has been recognized that learning contextual knowledge from the datasets depicting user-content interaction plays a vital role in downstream tasks. Despite several studies attempting to learn contextual knowledge via pretraining methods, finding an optimal training objective and strategy for this type of task remains a challenging problem. In this work, we contend that there are two distinct aspects of contextual knowledge, namely the user-side and the content-side, for datasets where user-content ...
Fundamentally, a common goal of data mining applications using user-content interactions is to understand user's behaviors [17] and content's properties. Researchers attempt multiple ways to model such behaviors: The time-related nature of the interactions is a fit for sequential models, such as recurrent neural networks (RNN), and the interactions and relations can be modeled as graph neural networks (GNN). Conventionally, the training objective is to minimize the loss of a specific task such that one model is tailored to a particular application (e.g., recommendation). This approach is simple and effective for every data mining application.
Adversarial Collaborative Filtering for Free
Chen, Huiyuan, Li, Xiaoting, Lai, Vivian, Yeh, Chin-Chia Michael, Fan, Yujie, Zheng, Yan, Das, Mahashweta, Yang, Hao
Collaborative Filtering (CF) has been successfully used to help users discover items of interest. Nevertheless, existing CF methods suffer from noisy data, which negatively impacts the quality of recommendation. To tackle this problem, many prior studies leverage adversarial learning to regularize the representations of users/items, which improves both generalizability and robustness. Those methods often learn adversarial perturbations and model parameters under a min-max optimization framework. However, two major drawbacks remain: 1) Existing methods lack theoretical guarantees of why adding perturbations improves model generalizability and robustness; 2) Solving min-max optimization is time-consuming. In addition to updating the model parameters, each iteration requires additional computations to update the perturbations, making these methods hard to scale to industry-scale datasets. In this paper, we present Sharpness-aware Collaborative Filtering (SharpCF), a simple yet effective method that conducts adversarial training without extra computational cost over the base optimizer. To achieve this goal, we first revisit the existing adversarial collaborative filtering and discuss its connection with recent Sharpness-aware Minimization. This analysis shows that adversarial training actually seeks model parameters that lie in neighborhoods around the optimal model parameters having uniformly low loss values, resulting in better generalizability. To reduce the computational overhead, SharpCF introduces a novel trajectory loss to measure the alignment between current weights and past weights. Experimental results on real-world datasets demonstrate that our SharpCF achieves superior performance with almost zero additional computational cost compared to adversarial training.
EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding
Zheng, Yan, Wang, Junpeng, Yeh, Chin-Chia Michael, Fan, Yujie, Chen, Huiyuan, Wang, Liang, Zhang, Wei
Embedding learning transforms discrete data entities into continuous numerical representations, encoding features/properties of the entities. Despite the outstanding performance reported from different embedding learning algorithms, few efforts were devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embedding exploration algorithm that relates the semantics of entity features with the less-interpretable embedding vectors. An interactive visualization tool is also developed based on EmbeddingTree to explore high-dimensional embeddings. The tool helps users discover nuanced features of data entities, perform feature denoising/injecting in embedding training, and generate embeddings for unseen entities. We demonstrate the efficacy of EmbeddingTree and our visualization tool through embeddings generated for industry-scale merchant data and the public 30Music listening/playlists dataset.
Sharpness-Aware Graph Collaborative Filtering
Chen, Huiyuan, Yeh, Chin-Chia Michael, Fan, Yujie, Zheng, Yan, Wang, Junpeng, Lai, Vivian, Das, Mahashweta, Yang, Hao
Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not well aligned. Also, training GNNs requires optimizing non-convex neural networks with an abundance of local and global minima, which may differ widely in their performance at test time. Thus, it is essential to choose the minima carefully. Here we propose an effective training schema, called gSAM, under the principle that flatter minima have better generalization ability than sharper ones. To achieve this goal, gSAM regularizes the flatness of the weight loss landscape by forming a bi-level optimization: the outer problem conducts the standard model training while the inner problem helps the model jump out of the sharp minima. Experimental results show the superiority of our gSAM.
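The inner/outer structure described in the abstract above resembles the generic sharpness-aware minimization (SAM) update, which can be sketched as follows. This is a minimal sketch of plain SAM on a toy quadratic loss, not the gSAM procedure itself; the names `sam_step`, `CURVATURE`, `loss`, and `grad`, and the values of `lr` and `rho`, are assumptions chosen for illustration.

```python
import math


def sam_step(weights, grad_fn, lr=0.05, rho=0.05):
    """One generic SAM update: perturb toward higher loss, then descend."""
    grad = grad_fn(weights)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(weights)  # stationary point: nothing to update
    # Inner step: move to the first-order worst point within radius rho.
    perturbed = [w + rho * g / norm for w, g in zip(weights, grad)]
    # Outer step: apply the gradient taken at the perturbed point
    # to the original (unperturbed) weights.
    sharp_grad = grad_fn(perturbed)
    return [w - lr * g for w, g in zip(weights, sharp_grad)]


# Toy quadratic loss with one flat and one sharp direction (illustrative only).
CURVATURE = [1.0, 10.0]


def loss(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURE, w))


def grad(w):
    return [c * x for c, x in zip(CURVATURE, w)]


w = [3.0, -2.0]
initial_loss = loss(w)
for _ in range(100):
    w = sam_step(w, grad)
final_loss = loss(w)
```

Each update needs two gradient evaluations, which is the extra cost that methods in this family typically try to reduce.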
Boosting Graph Neural Networks via Adaptive Knowledge Distillation
Guo, Zhichun, Zhang, Chunhui, Fan, Yujie, Tian, Yijun, Zhang, Chuxu, Chawla, Nitesh
Graph neural networks (GNNs) have shown remarkable performance on diverse graph mining tasks. Although different GNNs can be unified under the same message passing framework, they learn complementary knowledge from the same graph. Knowledge distillation (KD) was developed to combine the diverse knowledge from multiple models. It transfers knowledge from high-capacity teachers to a lightweight student. However, to avoid oversmoothing, GNNs are often shallow, which deviates from the setting of KD. In this context, we revisit KD by separating its benefits from model compression and emphasizing its power of transferring knowledge. To this end, we need to tackle two challenges: how to transfer knowledge from compact teachers to a student with the same capacity; and how to exploit the student GNN's own strength to learn knowledge. In this paper, we propose a novel adaptive KD framework, called BGNN, which sequentially transfers knowledge from multiple GNNs into a student GNN. We also introduce an adaptive temperature module and a weight boosting module. These modules guide the student to the appropriate knowledge for effective learning. Extensive experiments have demonstrated the effectiveness of BGNN. In particular, we achieve up to 3.05% improvement for node classification and 6.35% improvement for graph classification over vanilla GNNs.
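For readers unfamiliar with the role of temperature in knowledge distillation, the following minimal sketch computes a standard temperature-softened distillation loss between one teacher and one student. It uses a fixed temperature, whereas the abstract above describes an adaptive temperature module, so this shows only the generic KD building block. The names `softmax` and `distillation_loss` and the example logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class probabilities."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )


# Hypothetical logits for a three-class node classification example.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
kd = distillation_loss(student, teacher)
```

A higher temperature flattens both distributions, so the student also receives signal about the classes the teacher considers unlikely.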
Dynamic Graph Node Classification via Time Augmentation
Sun, Jiarui, Gu, Mengting, Yeh, Chin-Chia Michael, Fan, Yujie, Chowdhary, Girish, Zhang, Wei
Node classification for graph-structured data aims to classify nodes whose labels are unknown. While studies on static graphs are prevalent, few studies have focused on dynamic graph node classification. Node classification on dynamic graphs is challenging for two reasons. First, the model needs to capture both structural and temporal information, particularly on dynamic graphs that have a long history and require large receptive fields. Second, model scalability becomes a significant concern as the size of the dynamic graph increases. To address these problems, we propose the Time Augmented Dynamic Graph Neural Network (TADGNN) framework. TADGNN consists of two modules: 1) a time augmentation module that captures the temporal evolution of nodes across time structurally, creating a time-augmented spatio-temporal graph, and 2) an information propagation module that learns the dynamic representations for each node across time using the constructed time-augmented graph. We perform node classification experiments on four dynamic graph benchmarks. Experimental results demonstrate that the TADGNN framework outperforms several static and dynamic state-of-the-art (SOTA) GNN models while demonstrating superior scalability. We also conduct theoretical and empirical analyses to validate the efficiency of the proposed method. Our code is available at https://sites.google.com/view/tadgnn.
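One plausible way to picture a time-augmented spatio-temporal graph is sketched below: each node gets one copy per time step, edges within a snapshot are kept, and a node's consecutive copies are linked. This construction is an assumption made for illustration; the abstract above does not state how TADGNN builds its graph, and the function name `build_time_augmented_edges` is hypothetical.

```python
def build_time_augmented_edges(snapshots):
    """Combine per-step edge lists into one graph over (time, node) pairs.

    `snapshots[t]` is the list of (u, v) edges observed at step t.
    Assumed construction for illustration, not the published method.
    """
    edges = []
    seen = [set() for _ in snapshots]
    for t, snapshot in enumerate(snapshots):
        for u, v in snapshot:
            # Structural edge inside snapshot t.
            edges.append(((t, u), (t, v)))
            seen[t].update((u, v))
    for t in range(1, len(snapshots)):
        # Temporal edge from a node's previous copy to its current copy,
        # added only when the node appears in both consecutive snapshots.
        for node in sorted(seen[t - 1] & seen[t]):
            edges.append(((t - 1, node), (t, node)))
    return edges


snapshots = [[(0, 1), (1, 2)], [(1, 2)], [(0, 2)]]
augmented = build_time_augmented_edges(snapshots)
```

With such a graph, a single message passing pass over the combined edge list can mix structural and temporal neighbors.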
Let Graph be the Go Board: Gradient-free Node Injection Attack for Graph Neural Networks via Reinforcement Learning
Ju, Mingxuan, Fan, Yujie, Zhang, Chuxu, Ye, Yanfang
Graph Neural Networks (GNNs) have drawn significant attention over the years and been broadly applied to essential applications requiring solid robustness or rigorous security standards, such as product recommendation and user behavior modeling. Under these scenarios, exploiting GNNs' vulnerabilities and further degrading their performance becomes a strong incentive for adversaries. Previous attackers mainly focus on structural perturbations or node injections to the existing graphs, guided by gradients from the surrogate models. Although they deliver promising results, several limitations still exist. For the structural perturbation attack, adversaries need to manipulate the existing graph topology, which is impractical in most circumstances. For the node injection attack, though more practical, current approaches require training surrogate models to simulate a white-box setting, which results in a significant performance drop when the surrogate architecture diverges from the actual victim model. To bridge these gaps, in this paper, we study the problem of black-box node injection attack, without training a potentially misleading surrogate model. Specifically, we model the node injection attack as a Markov decision process and propose Gradient-free Graph Advantage Actor Critic, namely G2A2C, a reinforcement learning framework in the fashion of advantage actor critic. By directly querying the victim model, G2A2C learns to inject highly malicious nodes with extremely limited attacking budgets, while maintaining a similar node feature distribution. Through our comprehensive experiments over eight acknowledged benchmark datasets with different characteristics, we demonstrate the superior performance of our proposed G2A2C over the existing state-of-the-art attackers. Source code is publicly available at: https://github.com/jumxglhf/G2A2C.
Utilizing Social Media to Combat Opioid Addiction Epidemic: Automatic Detection of Opioid Users from Twitter
Zhang, Yiming, Fan, Yujie, Ye, Yanfang (West Virginia University), Li, Xin, Winstanley, Erin L.
Opioid (e.g., heroin and morphine) addiction has become one of the largest and deadliest epidemics in the United States. To combat this deadly epidemic, in this paper, we propose a novel framework named AutoOPU to automatically detect opioid users from Twitter, which will assist in sharpening our understanding of the behavioral process of opioid addiction and treatment. In AutoOPU, to model the users and posted tweets as well as their rich relationships, we first introduce a heterogeneous information network (HIN) for representation. Then we use a meta-structure-based approach to characterize the semantic relatedness over users. Afterwards, we integrate content-based similarity and relatedness depicted by each meta-structure to formulate a similarity measure over users. Further, we aggregate different similarities using multi-kernel learning, each of which is automatically weighted by the learning algorithm to make predictions. To the best of our knowledge, this is the first work to use multi-kernel learning based on meta-structures over HIN for biomedical knowledge mining, especially in the drug-addiction domain. Comprehensive experiments on real sample collections from Twitter are conducted to validate the effectiveness of our developed system AutoOPU in opioid user detection by comparisons with other alternative methods.
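The aggregation step in the abstract above, where several user-to-user similarities are combined with per-kernel weights, can be pictured with the minimal sketch below. The weights here are fixed by hand purely for illustration, whereas the abstract states that they are learned automatically. The function `combine_kernels` and the two example matrices are hypothetical and are not data from the paper.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square similarity matrices."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    size = len(kernels[0])
    combined = [[0.0] * size for _ in range(size)]
    for kernel, weight in zip(kernels, weights):
        for i in range(size):
            for j in range(size):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Two hypothetical user-by-user similarity matrices for three users.
content_similarity = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]]
meta_structure_similarity = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
combined = combine_kernels(
    [content_similarity, meta_structure_similarity], weights=[0.25, 0.75]
)
```

In a multi-kernel learning setting, the weights would be fitted jointly with the classifier rather than set manually as in this sketch.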