Collaborating Authors: Zhang, Chengqi


DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

AAAI Conferences

Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are widely used in NLP tasks to capture long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly reduced training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A lightweight neural net, the "Directional Self-Attention Network (DiSAN)," is then proposed to learn sentence embeddings based solely on the proposed attention, without any RNN/CNN structure. DiSAN is composed only of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models in both prediction quality and time efficiency. It achieves the best test accuracy among all sentence-encoding methods, improving the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre Natural Language Inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification, and Subjectivity (SUBJ) datasets.
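To make the mechanism concrete, here is a minimal NumPy sketch of directional, feature-wise (multi-dimensional) self-attention in the spirit of DiSAN. The tanh alignment function, the weight shapes, and the masking constant are illustrative assumptions, not the paper's exact formulation (which also includes scaling terms and a fusion gate).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def directional_self_attention(x, W1, W2, b, direction="forward"):
    """Feature-wise self-attention over a sequence x of shape (n, d),
    restricted by a directional (forward or backward) mask."""
    n, d = x.shape
    # logits[i, j, :] scores how much token i attends to token j, with a
    # separate score per feature dimension (multi-dimensional attention).
    logits = np.tanh(x @ W1)[:, None, :] + np.tanh(x @ W2)[None, :, :] + b
    idx = np.arange(n)
    # Forward: token i may attend only to earlier tokens (j < i).
    if direction == "forward":
        mask = idx[None, :] < idx[:, None]
    else:
        mask = idx[None, :] > idx[:, None]
    logits = np.where(mask[:, :, None], logits, -1e9)  # disable masked pairs
    attn = softmax(logits, axis=1)  # normalize over source positions j
    # Note: a token with no unmasked positions (e.g., the first token in the
    # forward pass) degenerates to uniform weights in this simplified sketch.
    return np.einsum("ijd,jd->id", attn, x)  # per-feature weighted sum, (n, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 tokens, 8 features
W1 = rng.normal(size=(8, 8)) * 0.1
W2 = rng.normal(size=(8, 8)) * 0.1
out = directional_self_attention(x, W1, W2, np.zeros(8))  # (5, 8)
```

Running a forward and a backward pass and concatenating the outputs would mirror the bidirectional, order-aware encoding described above.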


Network Representation Learning: A Survey

arXiv.org Machine Learning

With the widespread use of information technologies, information networks have become an increasingly popular way to capture complex relationships across various disciplines, such as social networks, citation networks, telecommunication networks, and biological networks. Analyzing these networks sheds light on different aspects of social life, such as the structure of society, information diffusion, and patterns of communication. However, the large scale of information networks often makes network analytic tasks computationally expensive and intractable. Recently, network representation learning has been proposed as a new learning paradigm that embeds network vertices into a low-dimensional vector space while preserving network topology, vertex content, and other side information. This allows the original network to be handled easily in the new vector space for further analysis. In this survey, we perform a thorough review of the current literature on network representation learning in the fields of data mining and machine learning. We propose a new categorization to analyze and summarize state-of-the-art network representation learning techniques according to the methodology they employ and the network information they preserve. Finally, to facilitate research on this topic, we summarize benchmark datasets and evaluation methodologies, and discuss open issues and future research directions in this field.
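As a tiny illustration of the paradigm the survey reviews, the sketch below embeds the vertices of a toy graph via a truncated spectral decomposition of the normalized adjacency matrix. This is just one classical matrix-factorization instance of network representation learning, chosen for brevity, not a method proposed by the survey.

```python
import numpy as np

def spectral_node_embedding(adj, dim):
    """Embed vertices of an undirected graph into `dim` dimensions using the
    dominant eigenvectors of the symmetrically normalized adjacency matrix."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    vals, vecs = np.linalg.eigh(norm_adj)
    top = np.argsort(-np.abs(vals))[:dim]  # keep the dominant structure
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

# Toy graph: two triangles joined by a single bridge edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
emb = spectral_node_embedding(adj, dim=2)  # one 2-D vector per vertex
```

Vertices within the same triangle tend to end up close together in the embedding space, which is the topology-preserving property the survey's categorization revolves around.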


Dynamic Concept Composition for Zero-Example Event Detection

AAAI Conferences

In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars. In principle, zero-shot learning makes it possible to train an event detection model based on the assumption that events (e.g., birthday party) can be described by multiple mid-level semantic concepts (e.g., "blowing candle", "birthday cake"). Towards this goal, we first pre-train a bundle of concept classifiers using data from other sources. We then evaluate the semantic correlation of each concept w.r.t. the event of interest and select the relevant concept classifiers, which are applied to all test videos to obtain multiple prediction score vectors. While most existing systems combine the predictions of the concept classifiers with fixed weights, we propose to learn the optimal weights of the concept classifiers for each test video by exploring a set of freely available online videos with free-form text descriptions of their content. To validate the effectiveness of the proposed approach, we conducted extensive experiments on the TRECVID MEDTest 2014, MEDTest 2013, and CCV datasets. The experimental results confirm the superiority of the proposed approach.
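For contrast with the learned per-video weights described above, here is a minimal sketch of the fixed-weight fusion baseline: each concept classifier's score is weighted by the cosine similarity between hypothetical text embeddings of the concept name and the event description. The paper's contribution replaces these fixed weights with weights learned per test video from web videos carrying free-form text descriptions.

```python
import numpy as np

def zero_shot_event_score(concept_scores, concept_emb, event_emb, top_k=3):
    """Fuse m pre-trained concept detectors into one event score for a video.

    concept_scores: (m,) classifier outputs on the test video.
    concept_emb:    (m, d) text embeddings of the concept names (assumed given).
    event_emb:      (d,) text embedding of the event description (assumed given).
    """
    # Semantic correlation of each concept w.r.t. the event of interest.
    sim = concept_emb @ event_emb / (
        np.linalg.norm(concept_emb, axis=1) * np.linalg.norm(event_emb) + 1e-12)
    keep = np.argsort(-sim)[:top_k]  # select the most relevant concepts
    w = np.clip(sim[keep], 0.0, None)
    w = w / (w.sum() + 1e-12)        # fixed, normalized fusion weights
    return float(w @ concept_scores[keep])
```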


Active Learning from Oracle with Knowledge Blind Spot

AAAI Conferences

Active learning traditionally assumes that an oracle is capable of providing labeling information for each query instance. This paper formulates a new research problem in which an oracle may admit that he/she is incapable of labeling some query instances, or simply answer "I don't know the label." We define a unified objective function to ensure that each query instance submitted to the oracle is both the one most needed for labeling and one that the oracle has the knowledge to label. Experiments based on different types of knowledge blind spot (KBS) models demonstrate the effectiveness of the proposed design.
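The abstract does not spell out the unified objective, so the sketch below shows one hypothetical additive form: each unlabeled instance is scored by prediction entropy (how much the current model needs its label) combined with an estimated probability that the oracle can label it, the latter standing in for a knowledge blind spot model.

```python
import numpy as np

def select_query(proba, answerable, alpha=0.5):
    """Choose the unlabeled instance to submit to the oracle.

    proba:      (n, c) class-probability estimates from the current model.
    answerable: (n,) estimated probability that the oracle knows each label.
    alpha:      trade-off between informativeness and answerability.
    """
    # Informativeness: prediction entropy, normalized to [0, 1].
    ent = -(proba * np.log(proba + 1e-12)).sum(axis=1) / np.log(proba.shape[1])
    # Unified score: most needed for labeling AND within the oracle's knowledge.
    return int(np.argmax(alpha * ent + (1 - alpha) * answerable))
```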


An Empirical Study of Bagging Predictors for Different Learning Algorithms

AAAI Conferences

Bagging is a simple yet effective design that combines multiple single learners to form an ensemble for prediction. Despite its popular usage in many real-world applications, existing research is mainly concerned with studying unstable learners as the key to ensuring the performance gain of a bagging predictor, with many key factors remaining unclear. For example, it is not clear when a bagging predictor can outperform a single learner, or what the expected performance gain is when different learning algorithms are used to form a bagging predictor. In this paper, we carry out comprehensive empirical studies to evaluate bagging predictors using 12 different learning algorithms and 48 benchmark datasets. Our analysis uses robustness and stability decompositions to characterize different learning algorithms, through which we rank all learning algorithms and comparatively study their bagging predictors to draw conclusions. Our studies assert that both stability and robustness are key requirements for ensuring the high performance of a bagging predictor. In addition, our studies demonstrate that bagging is statistically superior to most single base learners, except for KNN and Naïve Bayes (NB). Multi-layer perceptron (MLP), Naïve Bayes Trees (NBTree), and PART are the learning algorithms with the best bagging performance.
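As a quick illustration of the comparison the study carries out at scale, the scikit-learn sketch below pits a single unstable learner (a decision tree) against its bagged ensemble; the dataset and base learner here are illustrative choices, not the 48 benchmarks and 12 algorithms evaluated in the paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# An unstable base learner and its bagged ensemble (50 bootstrap replicates).
tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(tree, n_estimators=50, random_state=0)

print("single tree:", cross_val_score(tree, X, y, cv=5).mean())
print("bagged tree:", cross_val_score(bagged, X, y, cv=5).mean())
```

Repeating the experiment with a stable learner such as KNN typically shows little or no gain from bagging, which is consistent with the study's finding that instability matters.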