
Collaborating Authors

Jin, Di


A Comprehensive Survey on Community Detection with Deep Learning

arXiv.org Artificial Intelligence

In a network, a community comprises members whose features and connections differ from those of other communities. Detecting such communities is of great significance in network analysis. Beyond the classical spectral clustering and statistical inference methods, deep learning techniques for community detection have developed rapidly in recent years, owing to their advantages in handling high-dimensional network data. Hence, a comprehensive overview of the latest progress in community detection through deep learning is timely for both academics and practitioners. This survey proposes a new taxonomy covering the different categories of state-of-the-art methods, including models built upon deep neural networks, deep nonnegative matrix factorization, and deep sparse filtering. The main category, deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks, and autoencoders. The survey also summarizes popular benchmark data sets, model evaluation metrics, and open-source implementations to support experimentation. We then discuss practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing field.


Adversarial Contrastive Pre-training for Protein Sequences

arXiv.org Artificial Intelligence

Recent developments in Natural Language Processing (NLP) demonstrate that large-scale, self-supervised pre-training can be extremely beneficial for downstream tasks. These ideas have been adapted to other domains, including the analysis of the amino acid sequences of proteins. However, to date, most attempts on protein sequences rely on direct masked language model (MLM)-style pre-training. In this work, we design a new, adversarial pre-training method for proteins, extending and specializing similar advances in NLP. We show compelling results in comparison to traditional MLM pre-training, though further development is needed to ensure the gains are worth the significant computational cost.
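
The paper positions its adversarial method against the MLM baseline. As a point of reference, here is a minimal sketch of BERT-style masked-token corruption applied to an amino acid sequence; the vocabulary, 15% mask rate, and 80/10/10 corruption split are standard MLM conventions, not details taken from this paper.

```python
import random

# The 20 standard amino acids plus a mask token (an illustrative
# vocabulary; the paper's actual tokenizer is not specified here).
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
MASK = "<mask>"

def mlm_corrupt(sequence, mask_rate=0.15, seed=None):
    """BERT-style corruption: of the selected positions, 80% become
    <mask>, 10% a random residue, 10% stay unchanged. Returns the
    corrupted tokens and the (index, residue) pairs to predict."""
    rng = random.Random(seed)
    tokens = list(sequence)
    targets = []
    for i in range(len(tokens)):
        if rng.random() < mask_rate:
            targets.append((i, tokens[i]))
            r = rng.random()
            if r < 0.8:
                tokens[i] = MASK
            elif r < 0.9:
                tokens[i] = rng.choice(AMINO_ACIDS)
            # else: keep the original residue as a prediction target
    return tokens, targets

corrupted, targets = mlm_corrupt("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", seed=0)
print(corrupted)
print(targets)
```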


A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning

arXiv.org Artificial Intelligence

Community detection, a fundamental task for network analysis, aims to partition a network into multiple sub-structures to help reveal their latent functions. It has been extensively studied and broadly applied to many real-world network problems. Classical approaches to community detection typically utilize probabilistic graphical models and adopt a variety of prior knowledge to infer community structures. As the problems that network methods try to solve, and the network data to be analyzed, become increasingly sophisticated, new approaches have been proposed and developed, particularly those that utilize deep learning and convert networked data into low-dimensional representations. Despite all these recent advances, there is still a lack of insightful understanding of the theoretical and methodological underpinnings of community detection, which will be critically important for the future development of network analysis. In this paper, we develop and present a unified architecture of network community-finding methods to characterize the state of the art of the field. Specifically, we provide a comprehensive review of existing community detection methods and introduce a new taxonomy that divides them into two categories, namely probabilistic graphical models and deep learning. We then discuss in detail the main idea behind each method in the two categories. Furthermore, to promote the future development of community detection, we release benchmark datasets from several problem domains and highlight their applications to various network analysis tasks. We conclude with a discussion of the challenges of the field and suggestions of possible directions for future research.


Deep Learning for Text Attribute Transfer: A Survey

arXiv.org Artificial Intelligence

Driven by increasingly larger deep learning models, neural language generation (NLG) has enjoyed unprecedented improvement and is now able to generate a diversity of human-like texts on demand, making it capable of serving as a human writing assistant. Text attribute transfer is one of the most important NLG tasks; it aims to control certain attributes that people may expect texts to possess, such as sentiment, tense, emotion, or political position. It has a long history in Natural Language Processing but has recently gained much more attention thanks to the promising performance brought by deep learning models. In this article, we present a systematic survey of these works on neural text attribute transfer. We collect all related academic works since the line of work first appeared in 2017, then select, summarize, discuss, and analyze around 65 representative works in a comprehensive way. Overall, we cover the task formulation, existing datasets and metrics for model development and evaluation, and all methods developed over the last several years. We reveal that existing methods are indeed based on a combination of several loss functions, each of which serves a certain goal. This unique perspective could shed light on the design of new methods. We conclude our survey with a discussion of open issues that need to be resolved for better future development.
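
The survey's key observation, that existing methods combine several loss functions with each term serving one goal, can be made concrete. Below is a hedged sketch of such a combined objective (self-reconstruction, attribute control, and content preservation); the terms, tensor shapes, and weights are illustrative placeholders, not any specific paper's formulation.

```python
import torch
import torch.nn.functional as F

def transfer_loss(recon_logits, input_ids, attr_logits, target_attr,
                  src_content, gen_content, lambdas=(1.0, 1.0, 0.5)):
    """Illustrative combined objective for text attribute transfer.
    Each term serves one goal, as the survey observes; the weights
    are hypothetical placeholders."""
    l_rec, l_attr, l_cont = lambdas
    # 1) Self-reconstruction: the decoder should reproduce the input.
    #    recon_logits: (batch, seq_len, vocab); input_ids: (batch, seq_len).
    rec = F.cross_entropy(recon_logits.flatten(0, 1), input_ids.flatten())
    # 2) Attribute control: a classifier scores the generated text for
    #    the target attribute (e.g., sentiment).
    attr = F.cross_entropy(attr_logits, target_attr)
    # 3) Content preservation: keep source and output content vectors close.
    cont = 1.0 - F.cosine_similarity(src_content, gen_content, dim=-1).mean()
    return l_rec * rec + l_attr * attr + l_cont * cont
```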


Graph Neural Network for Large-Scale Network Localization

arXiv.org Machine Learning

Graph neural networks (GNNs) are popular for classifying structured data in machine learning but, surprisingly, are rarely applied to regression problems. In this work, we adopt GNNs for a classic but challenging nonlinear regression problem, namely network localization. Our main findings are as follows. First, GNNs are potentially the best solution to large-scale network localization in terms of accuracy, robustness, and computational time. Second, thresholding of the communication range is essential to their superior performance. Simulation results corroborate that the proposed GNN-based method outperforms all benchmarks by far. These inspiring results are further justified theoretically in terms of data aggregation, non-line-of-sight (NLOS) noise removal, and the low-pass filtering effect, all affected by the threshold for neighbor selection. Code is available at https://github.com/Yanzongzi/GNN-For-localization.
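
The two findings above translate directly into code: neighbor selection by thresholding pairwise distances, followed by GNN-style aggregation for coordinate regression. The sketch below makes standard assumptions (a row-normalized adjacency, a two-layer GCN-like regressor, a toy threshold of 0.3); it illustrates the idea rather than reproducing the repository's implementation.

```python
import numpy as np
import torch
import torch.nn as nn

def thresholded_adjacency(dist, threshold):
    """Keep only links within the communication range; add self-loops.
    Thresholding performs the neighbor selection the paper identifies
    as essential (it also suppresses NLOS-dominated long links)."""
    A = (dist < threshold).astype(np.float32)
    np.fill_diagonal(A, 1.0)
    return A / A.sum(1, keepdims=True)   # simple row normalization

class GCNLocalizer(nn.Module):
    """Two-layer GCN-style regressor mapping node features to 2-D
    coordinates (illustrative architecture, not the paper's exact one)."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, 2)

    def forward(self, A, X):
        H = torch.relu(self.w1(A @ X))   # aggregate, then transform
        return self.w2(A @ H)            # predict (x, y) per node

# Toy usage: 100 nodes, pairwise distance rows as node features.
pos = np.random.rand(100, 2)
dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
A = torch.tensor(thresholded_adjacency(dist, threshold=0.3))
X = torch.tensor(dist, dtype=torch.float32)
model = GCNLocalizer(in_dim=100)
pred = model(A, X)                       # (100, 2) predicted coordinates
```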


BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search

arXiv.org Artificial Intelligence

Relevance has a significant impact on user experience and business profit for e-commerce search platforms. In this work, we propose a data-driven framework for search relevance prediction that distills knowledge from BERT and related multi-layer Transformer teacher models into simple feed-forward networks using a large amount of unlabeled data. The distillation process produces a student model that recovers more than 97% of the teacher models' test accuracy on new queries, at a serving cost that is several orders of magnitude lower (latency 150x lower than BERT-Base and 15x lower than the most efficient BERT variant, TinyBERT). The applications of temperature rescaling and teacher model stacking further boost model accuracy without increasing the student model's complexity. We present experimental results on both in-house e-commerce search relevance data and a public sentiment analysis data set from the GLUE benchmark. The latter takes advantage of another related public data set of much larger scale, while disregarding its potentially noisy labels. Embedding analysis and a case study on the in-house data further highlight the strength of the resulting model. By making the data processing and model training source code public, we hope the techniques presented here can help reduce the energy consumption of state-of-the-art Transformer models and also level the playing field for small organizations lacking access to cutting-edge machine learning hardware.
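
A hedged sketch of the two accuracy boosters mentioned above, temperature rescaling and teacher stacking, as they are conventionally implemented for distillation. The temperature value, the T^2 gradient-scaling factor, and averaging as the stacking rule are common practice rather than details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label distillation: match the student's tempered output
    distribution to the teacher's. The T*T factor keeps gradient
    magnitudes comparable across temperatures (standard practice)."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T

def stacked_teacher_logits(teacher_logit_list):
    """One plausible reading of teacher model stacking: average the
    teachers' logits before softening (an assumption, not the paper's
    confirmed recipe)."""
    return torch.stack(teacher_logit_list).mean(0)

# Toy usage with random logits for a binary relevance task.
student = torch.randn(32, 2, requires_grad=True)
teachers = [torch.randn(32, 2), torch.randn(32, 2)]
loss = distillation_loss(student, stacked_teacher_logits(teachers), T=2.0)
loss.backward()
```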


Learning Continuous-Time Dynamics by Stochastic Differential Networks

arXiv.org Machine Learning

Learning continuous-time stochastic dynamics is a fundamental and essential problem in modeling sporadic time series, whose observations are irregular and sparse in both time and dimension. For a given system whose latent states and observed data are high-dimensional, it is generally impossible to derive a precise continuous-time stochastic process that describes the system's behavior. To solve this problem, we apply the variational Bayesian method and propose a flexible continuous-time stochastic recurrent neural network named the Variational Stochastic Differential Network (VSDN), which embeds the complicated dynamics of a sporadic time series using neural Stochastic Differential Equations (SDEs). VSDNs capture the stochastic dependencies among latent states and observations with deep neural networks. We also incorporate two differential evidence lower bounds (ELBOs) to efficiently train the models. Through comprehensive experiments, we show that VSDNs outperform state-of-the-art continuous-time deep learning models and achieve remarkable performance on prediction and interpolation tasks for sporadic time series.
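
A minimal sketch of the neural-SDE building block: drift and diffusion networks simulated with Euler-Maruyama over an irregular time grid. The architecture and dimensions are illustrative; VSDN additionally wraps such dynamics in a variational recurrent encoder trained with the differential ELBOs, which this sketch omits.

```python
import torch
import torch.nn as nn

class NeuralSDE(nn.Module):
    """Latent SDE  dz = f(z, t) dt + g(z, t) dW  with neural drift f
    and neural diffusion g, simulated by Euler-Maruyama."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(),
                               nn.Linear(hidden, dim))
        self.g = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(),
                               nn.Linear(hidden, dim), nn.Softplus())

    def forward(self, z0, ts):
        """Simulate z along a (possibly irregular) grid of time stamps."""
        z, path = z0, [z0]
        for t0, t1 in zip(ts[:-1], ts[1:]):
            dt = t1 - t0
            zt = torch.cat([z, t0.expand(z.shape[0], 1)], dim=-1)
            dW = torch.randn_like(z) * dt.sqrt()   # Brownian increment
            z = z + self.f(zt) * dt + self.g(zt) * dW
            path.append(z)
        return torch.stack(path, dim=1)            # (batch, len(ts), dim)

# Irregular ("sporadic") time grid, as in the paper's setting.
ts = torch.tensor([0.0, 0.1, 0.35, 0.4, 1.0])
model = NeuralSDE(dim=4)
traj = model(torch.zeros(8, 4), ts)                # 8 sample paths
```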


What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams

arXiv.org Artificial Intelligence

Open domain question answering (OpenQA) tasks have recently been attracting increasing attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method achieves test accuracies of only 36.7%, 42.0%, and 70.1% on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future.
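
The rule-based end of the retriever-plus-reader pipeline described above can be illustrated with TF-IDF retrieval and option scoring against the retrieved evidence. The corpus, question, and scoring rule below are toy assumptions; the neural variants replace both stages with learned models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy evidence corpus standing in for the medical text collection.
corpus = [
    "Metformin is a first-line medication for type 2 diabetes.",
    "Amoxicillin is a penicillin antibiotic used for bacterial infections.",
    "Levothyroxine treats hypothyroidism by replacing thyroid hormone.",
]

def answer(question, options, k=2):
    """Retrieve the top-k documents for the question, then pick the
    option most similar to the retrieved evidence: the shape of a
    retriever -> reader pipeline, in its simplest rule-based form."""
    vec = TfidfVectorizer().fit(corpus + [question] + options)
    docs = vec.transform(corpus)
    q = vec.transform([question])
    top = cosine_similarity(q, docs)[0].argsort()[-k:]
    evidence = " ".join(corpus[i] for i in top)
    ev = vec.transform([evidence])
    scores = cosine_similarity(vec.transform(options), ev)[:, 0]
    return options[int(scores.argmax())]

print(answer("What disease does metformin treat as first-line therapy?",
             ["type 2 diabetes", "hypothyroidism", "bacterial infection"]))
```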


From Static to Dynamic Node Embeddings

arXiv.org Artificial Intelligence

We introduce a general framework for leveraging graph stream data for temporal prediction-based applications. Our proposed framework includes novel methods for learning an appropriate graph time-series representation, modeling and weighting the temporal dependencies, and generalizing existing embedding methods for such data. While previous work on dynamic modeling and embedding has focused on representing a stream of timestamped edges using a time-series of graphs based on a specific time-scale (e.g., 1 month), we propose the notion of an ε-graph time-series that uses a fixed number of edges for each graph, and show its superiority over the time-scale representation used in previous work. In addition, we propose a number of new temporal models based on the notion of temporal reachability graphs and weighted temporal summary graphs. These temporal models are then used to generalize existing base (static) embedding methods by enabling them to incorporate and appropriately model temporal dependencies in the data. From the 6 temporal network models investigated (for each of the 7 base embedding methods), we find that the top-3 temporal models are always those that leverage the new ε-graph time-series representation. Furthermore, the dynamic embedding methods from the framework almost always achieve better predictive performance than existing state-of-the-art dynamic node embedding methods that are developed specifically for such temporal prediction tasks. Finally, the findings of this work are useful for designing better dynamic embedding methods.
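
The core representational idea, the ε-graph time-series, is simple to state in code: partition the time-ordered edge stream into snapshots of exactly ε edges each, rather than snapshots covering a fixed time span. A minimal sketch (the edge tuples and ε value are illustrative):

```python
def epsilon_graph_series(edges, eps):
    """Split a stream of (u, v, t) edges into snapshots of exactly eps
    edges each (the final, possibly partial, snapshot is kept). Unlike
    fixed time-scale snapshots (e.g., one graph per month), every
    snapshot here has the same number of edges."""
    edges = sorted(edges, key=lambda e: e[2])   # order by timestamp
    return [edges[i:i + eps] for i in range(0, len(edges), eps)]

stream = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5),
          ("c", "d", 7), ("b", "d", 9)]
for g, snapshot in enumerate(epsilon_graph_series(stream, eps=2)):
    print(f"G_{g}:", snapshot)
```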


TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

arXiv.org Artificial Intelligence

While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. This paper introduces TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP. TextAttack builds attacks from four components: a goal function, a set of constraints, a transformation, and a search method. TextAttack's modular design enables researchers to easily construct attacks from combinations of novel and existing components. TextAttack provides implementations of 16 adversarial attacks from the literature and supports a variety of models and datasets, including BERT and other transformers, and all GLUE tasks. TextAttack also includes data augmentation and adversarial training modules for using components of adversarial attacks to improve model accuracy and robustness. TextAttack is democratizing NLP: anyone can try data augmentation and adversarial training on any model or dataset, with just a few lines of code. Code and tutorials are available at https://github.com/QData/TextAttack.
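
To make the four-component decomposition concrete, here is a generic sketch of how a goal function, a constraint, a transformation, and a search method compose into a greedy word-substitution attack. These are illustrative stand-ins, not TextAttack's actual classes; see the linked repository for the real API.

```python
# Illustrative stand-ins for the four components of an NLP attack.

def goal_function(model, text, true_label):
    """Untargeted misclassification goal: succeed if the label flips."""
    return model(text) != true_label

def constraint(original, perturbed):
    """Toy constraint: change at most 30% of the words."""
    o, p = original.split(), perturbed.split()
    return sum(a != b for a, b in zip(o, p)) <= 0.3 * len(o)

def transformations(text, synonyms):
    """Generate neighbors by single-word synonym substitutions."""
    words = text.split()
    for i, w in enumerate(words):
        for s in synonyms.get(w, []):
            yield " ".join(words[:i] + [s] + words[i + 1:])

def greedy_search(model, text, true_label, synonyms):
    """Search method: greedily return the first valid perturbation that
    meets the goal; real search methods are more sophisticated."""
    for candidate in transformations(text, synonyms):
        if constraint(text, candidate) and goal_function(model, candidate,
                                                         true_label):
            return candidate
    return None
```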