Yang, Min
Sentiment Lexicon Enhanced Attention-Based LSTM for Sentiment Classification
Lei, Zeyang (Tsinghua University) | Yang, Yujiu (Tsinghua University) | Yang, Min (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Deep neural networks have recently achieved great success in sentiment classification. However, these approaches do not fully exploit linguistic knowledge. In this paper, we propose a novel sentiment lexicon enhanced attention-based LSTM (SLEA-LSTM) model to improve the performance of sentence-level sentiment classification. Our method integrates a sentiment lexicon into deep neural networks via single-head or multi-head attention mechanisms. We conduct extensive experiments on the MR and SST datasets. The experimental results show that our model achieves comparable or better performance than state-of-the-art methods.
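As a rough illustration of how a sentiment lexicon can be fused into attention, the following minimal PyTorch sketch biases an LSTM's attention scores with per-token lexicon polarities. The layer sizes, the `lexicon_scores` input, and the additive fusion are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LexiconAttentionLSTM(nn.Module):
    """Illustrative sketch: bias LSTM attention weights with lexicon scores."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, tokens, lexicon_scores):
        # tokens: (batch, seq_len); lexicon_scores: (batch, seq_len) prior polarities
        h, _ = self.lstm(self.embed(tokens))                # (batch, seq_len, 2*hidden)
        scores = self.attn(h).squeeze(-1) + lexicon_scores  # lexicon acts as an attention bias
        alpha = F.softmax(scores, dim=1)                    # attention weights over positions
        context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)
        return self.fc(context)

model = LexiconAttentionLSTM(vocab_size=10000)
tokens = torch.randint(0, 10000, (4, 20))
lex = torch.randn(4, 20)  # e.g. polarity scores looked up from a sentiment lexicon
logits = model(tokens, lex)
```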
A Semi-Supervised Network Embedding Model for Protein Complexes Detection
Zhao, Wei (SIAT, Chinese Academy of Sciences) | Zhu, Jia (South China Normal University) | Yang, Min (SIAT, Chinese Academy of Sciences) | Xiao, Danyang (South China Normal University) | Fung, Gabriel Pui Cheong (The Chinese University of Hong Kong) | Chen, Xiaojun (Shenzhen University)
A protein complex is a group of associated polypeptide chains that plays essential roles in biological processes. Given a graph representing a protein-protein interaction (PPI) network, it is critical but non-trivial to detect protein complexes. In this paper, we propose a semi-supervised network embedding model that adopts graph convolutional networks to effectively detect densely connected subgraphs. We conduct extensive experiments on two popular PPI networks with various data sizes and densities. The experimental results show that our approach achieves state-of-the-art performance.
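A minimal sketch of the semi-supervised graph convolutional component such a model might build on, in the spirit of Kipf and Welling's GCNs: nodes are proteins, the loss is computed only on the few labeled nodes, and the layer sizes and random PPI graph below are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adj(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} commonly used by GCNs."""
    adj = adj + torch.eye(adj.size(0))
    d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

class GCN(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden_dim)
        self.w2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, adj_norm):
        x = F.relu(adj_norm @ self.w1(x))  # one round of neighborhood aggregation
        return adj_norm @ self.w2(x)       # per-node (per-protein) logits

# Toy symmetric "PPI" graph and features
adj = torch.bernoulli(torch.full((50, 50), 0.1))
adj = ((adj + adj.t()) > 0).float()
x = torch.randn(50, 16)

# Semi-supervised setting: labels known for only the first 10 nodes
labels = torch.randint(0, 2, (50,))
mask = torch.zeros(50, dtype=torch.bool)
mask[:10] = True

model = GCN(16, 32, 2)
logits = model(x, normalize_adj(adj))
loss = F.cross_entropy(logits[mask], labels[mask])  # loss only on labeled nodes
```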
Generative Adversarial Network for Abstractive Text Summarization
Liu, Linqing (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) | Lu, Yao (Alberta Machine Intelligence Institute) | Yang, Min (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) | Qu, Qiang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) | Zhu, Jia (South China Normal University) | Li, Hongyan (Peking University)
In this paper, we propose an adversarial process for abstractive text summarization, in which we simultaneously train a generative model G and a discriminative model D. In particular, we build the generator G as a reinforcement learning agent, which takes the raw text as input and produces the abstractive summary. We also build a discriminator D that attempts to distinguish the generated summary from the ground-truth summary. Extensive experiments demonstrate that our model achieves ROUGE scores competitive with state-of-the-art methods on the CNN/Daily Mail dataset. Qualitatively, we show that our model generates more abstractive, readable, and diverse summaries.
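To make the generator-discriminator interaction concrete, here is a deliberately tiny sketch of one adversarial step with a REINFORCE-style generator update, where the discriminator's score on a sampled output serves as the reward. Both models are toy stand-ins (single linear layers over a 100-token vocabulary), not the paper's sequence-to-sequence architecture.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: G scores vocabulary tokens per step, D scores a summary's realism.
G = torch.nn.Linear(8, 100)
D = torch.nn.Sequential(torch.nn.Linear(100, 1), torch.nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

state = torch.randn(4, 8)                          # "encoder state" for 4 documents
logits = G(state)                                  # (4, vocab) next-token scores
actions = torch.multinomial(F.softmax(logits, dim=-1), 1).squeeze(1)  # sampled tokens

# Discriminator reward on one-hot "summaries" (stand-in for real sequence features)
fake = F.one_hot(actions, 100).float()
reward = D(fake).squeeze(1).detach()               # treat D's score as the RL reward

# REINFORCE: raise log-probability of sampled tokens in proportion to reward
log_probs = F.log_softmax(logits, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
loss_g = -(reward * log_probs).mean()
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# D is trained to separate real summaries from generated ones
real = F.one_hot(torch.randint(0, 100, (4,)), 100).float()
loss_d = -(torch.log(D(real) + 1e-8).mean() + torch.log(1 - D(fake) + 1e-8).mean())
opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```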
Attention Based LSTM for Target Dependent Sentiment Classification
Yang, Min (The University of Hong Kong) | Tu, Wenting (The University of Hong Kong) | Wang, Jingxuan (The University of Hong Kong) | Xu, Fei (Chinese Academy of Sciences) | Chen, Xiaojun (Shenzhen University)
We present an attention-based bidirectional LSTM approach to improve target-dependent sentiment classification. Our method learns the alignment between the target entities and the most distinguishing features. We conduct extensive experiments on a real-life dataset. The experimental results show that our model achieves state-of-the-art results.
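A minimal sketch of target-conditioned attention over a bidirectional LSTM, in the spirit of the abstract: each context position is scored against the averaged target embedding, so the attention aligns the target with the most distinguishing context words. Layer names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetAttentionBiLSTM(nn.Module):
    """Sketch: attend over context words conditioned on the target entity."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden_dim + embed_dim, 1)  # alignment with target
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, tokens, target_ids):
        h, _ = self.lstm(self.embed(tokens))              # (batch, seq, 2*hidden)
        t = self.embed(target_ids).mean(dim=1)            # average the target word vectors
        t_exp = t.unsqueeze(1).expand(-1, h.size(1), -1)  # repeat target per position
        alpha = F.softmax(self.score(torch.cat([h, t_exp], dim=-1)).squeeze(-1), dim=1)
        context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)  # attention-weighted sum
        return self.fc(context)

model = TargetAttentionBiLSTM(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 15)), torch.randint(0, 5000, (2, 3)))
```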
Authorship Attribution with Topic Drift Model
Yang, Min (The University of Hong Kong) | Zhu, Dingju (South China Normal University) | Tang, Yong (South China Normal University) | Wang, Jingxuan (The University of Hong Kong)
Authorship attribution is an active research direction due to its legal and financial importance. The goal is to identify the authors of anonymous texts. In this paper, we propose a Topic Drift Model (TDM) that monitors the dynamics of authors' writing styles and latent topics of interest. Our model is sensitive to temporal information and the ordering of words, and thus extracts more information from texts.
Detecting Review Spammer Groups
Yang, Min (The University of Hong Kong) | Lu, Ziyu (The University of Hong Kong) | Chen, Xiaojun (Shenzhen University) | Xu, Fei (Chinese Academy of Sciences)
With an increasing number of paid writers posting fake reviews on the Internet to promote or demote target entities, review spammer detection has become a crucial and challenging task. In this paper, we propose a three-phase method to identify both review spammer groups and individual spammers who are paid to post fake comments. We evaluate the effectiveness and performance of the approach on a real-life online shopping review dataset from amazon.com. The experimental results show that our model achieves comparable or better performance than previous work on spammer detection.
Ordering-Sensitive and Semantic-Aware Topic Modeling
Yang, Min (The University of Hong Kong) | Cui, Tianyi (Zhejiang University) | Tu, Wenting (The University of Hong Kong)
Topic modeling of textual corpora is an important and challenging problem. Most previous work makes the “bag-of-words” assumption, which ignores the ordering of words. This assumption simplifies computation, but it unrealistically discards the ordering information and the semantics of words in context. In this paper, we present a Gaussian Mixture Neural Topic Model (GMNTM) which incorporates both the ordering of words and the semantic meaning of sentences into topic modeling. Specifically, we represent each topic as a cluster of multi-dimensional vectors and embed the corpus into a collection of vectors generated by the Gaussian mixture model. Each word is affected not only by its topic, but also by the embedding vectors of its surrounding words and its context. The Gaussian mixture components and the topics of documents, sentences, and words are learnt jointly. Extensive experiments show that our model learns better topics and more accurate word distributions for each topic. Quantitatively, compared to state-of-the-art topic modeling approaches, GMNTM obtains significantly better performance in terms of perplexity, retrieval accuracy, and classification accuracy.
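The sketch below is not the paper's joint training procedure; it is a crude two-stage approximation that conveys the central idea of topics as Gaussian clusters of context-sensitive word vectors. Each word vector is mixed with its neighbors' vectors (a stand-in for ordering sensitivity), then a Gaussian mixture is fitted so each component plays the role of one topic. The embeddings, mixing weights, and corpus are all placeholder assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
vocab = ["good", "bad", "film", "score", "team", "win"]
vecs = {w: rng.normal(size=4) for w in vocab}   # stand-in word embeddings

docs = [["good", "film", "score"], ["team", "win", "good"]]

# Context-sensitive word vectors: each word's vector is blended with its
# neighbors', a crude stand-in for GMNTM's ordering/context sensitivity.
points = []
for doc in docs:
    for i, w in enumerate(doc):
        neigh = [vecs[doc[j]] for j in (i - 1, i + 1) if 0 <= j < len(doc)]
        points.append(0.7 * vecs[w] + 0.3 * np.mean(neigh, axis=0))

# Each Gaussian component acts as one "topic" (a cluster of vectors).
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm.fit(np.array(points))
topic_of_word = gmm.predict(np.array(points))   # topic assignment per token
print(topic_of_word)
```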
Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Yang, Min | Xu, Linli | White, Martha | Schuurmans, Dale | Yu, Yao-liang
Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.
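The clipping idea admits a short worked equation. One standard way to write the truncated loss (our notation; details may differ from the paper's exact formulation) introduces a per-example auxiliary variable η. Because the objective is linear in η, the minimum over η ∈ [0, 1] is attained at an endpoint and recovers the clipped loss exactly; it is the joint minimization over the model and η that is then relaxed into a globally solvable training problem:

```latex
% Clipping a loss \ell at threshold \tau, and its variational form:
% if \ell(z) < \tau the minimum picks \eta = 1 (value \ell(z)),
% otherwise \eta = 0 (value \tau), so both sides agree.
\[
  \tilde{\ell}_{\tau}(z)
  \;=\; \min\bigl(\ell(z),\,\tau\bigr)
  \;=\; \min_{0 \le \eta \le 1} \; \eta\,\ell(z) + (1-\eta)\,\tau .
\]
```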