Tsinghua University
RSDNE: Exploring Relaxed Similarity and Dissimilarity from Completely-Imbalanced Labels for Network Embedding
Wang, Zheng (Tsinghua University) | Ye, Xiaojun (Tsinghua University) | Wang, Chaokun (Tsinghua University) | Wu, Yuexin (Tsinghua University) | Wang, Changping (Tsinghua University) | Liang, Kaiwen (Tsinghua University)
Network embedding, which aims to project a network into a low-dimensional space, is increasingly becoming a focus of network research. Semi-supervised network embedding takes advantage of labeled data and has shown promising performance. However, existing semi-supervised methods perform poorly in the completely-imbalanced label setting, where some classes have no labeled nodes at all. To alleviate this, we propose a novel semi-supervised network embedding method, termed Relaxed Similarity and Dissimilarity Network Embedding (RSDNE). Specifically, to benefit from completely-imbalanced labels, RSDNE guarantees both intra-class similarity and inter-class dissimilarity in an approximate way. Experimental results on several real-world datasets demonstrate the superiority of the proposed method.
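A minimal sketch of the kind of objective the abstract describes, assuming a simple first-order-proximity term plus labeled-pair terms; the function name, the hinge form of the inter-class term, and all hyperparameters are hypothetical, not the paper's exact model:

```python
import numpy as np

def rsdne_style_loss(U, edges, labels, margin=1.0, lam=0.5):
    """Toy objective in the spirit of RSDNE: reconstruct network proximity,
    pull labeled nodes of the same class together (intra-class similarity),
    and push labeled nodes of different classes at least `margin` apart
    (relaxed inter-class dissimilarity). labels[i] is None if node i is
    unlabeled, which covers classes with no labeled nodes at all."""
    # First-order proximity: connected nodes should embed nearby.
    proximity = sum(np.sum((U[i] - U[j]) ** 2) for i, j in edges)

    labeled = [i for i, y in enumerate(labels) if y is not None]
    intra, inter = 0.0, 0.0
    for a in range(len(labeled)):
        for b in range(a + 1, len(labeled)):
            i, j = labeled[a], labeled[b]
            d2 = np.sum((U[i] - U[j]) ** 2)
            if labels[i] == labels[j]:
                intra += d2                      # same class: stay close
            else:
                inter += max(0.0, margin - d2)   # different class: stay apart
    return proximity + lam * (intra + inter)

# Usage: 5 nodes in 3 dimensions, two labeled classes, some unlabeled nodes.
U = np.random.rand(5, 3)
loss = rsdne_style_loss(U, edges=[(0, 1), (1, 2), (3, 4)],
                        labels=[0, 0, None, 1, None])
```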
Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention
Zeng, Xiangkai (Beihang University) | Yang, Cheng (Tsinghua University) | Tu, Cunchao (Tsinghua University) | Liu, Zhiyuan (Tsinghua University) | Sun, Maosong (Tsinghua University)
Linguistic Inquiry and Word Count (LIWC) is a word-counting software tool that has been used for quantitative text analysis in many fields. Due to its success and popularity, the core lexicon has been translated into Chinese and many other languages. However, the lexicon contains only several thousand words, which is small compared with the number of common words in Chinese. Current approaches to extending the lexicon rely on manual effort, which is time-consuming and requires linguistic expertise. To address this issue, we propose to expand the LIWC lexicon automatically. Specifically, we cast the task as a hierarchical classification problem and utilize a sequence-to-sequence model to classify words in the lexicon. Moreover, we use sememe information with an attention mechanism to capture the exact meanings of a word, so that we can build a more precise and comprehensive lexicon. The experimental results show that our model achieves a better understanding of word meanings with the help of sememes and obtains significant and consistent improvements over state-of-the-art methods. The source code of this paper can be obtained from https://github.com/thunlp/Auto_CLIWC.
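To make the hierarchical-classification framing concrete, here is a simplified greedy analogue: walk the category tree, at each level picking the child closest to the word's embedding. The paper instead decodes the category path with a sequence-to-sequence model and sememe attention; the `tree` and `prototypes` inputs below are hypothetical stand-ins:

```python
import numpy as np

def classify_hierarchically(word_vec, tree, prototypes):
    """Greedy descent of a LIWC-style category hierarchy. `tree` maps a
    category to its child categories; `prototypes` maps a category to an
    embedding vector. Both are illustrative inputs, not the paper's model."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    path, node = [], "ROOT"
    while tree.get(node):                     # descend until a leaf category
        node = max(tree[node], key=lambda c: cos(word_vec, prototypes[c]))
        path.append(node)
    return path

# Usage with a toy two-level hierarchy.
tree = {"ROOT": ["affect", "social"], "affect": ["posemo", "negemo"]}
protos = {c: np.random.rand(4) for c in ["affect", "social", "posemo", "negemo"]}
print(classify_hierarchically(np.random.rand(4), tree, protos))
```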
Asynchronous Bidirectional Decoding for Neural Machine Translation
Zhang, Xiangwen (Xiamen University) | Su, Jinsong (Xiamen University) | Qin, Yue (Xiamen University) | Liu, Yang (Tsinghua University) | Ji, Rongrong (Xiamen University) | Wang, Hongji (Xiamen University)
The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-to-right manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, and at each prediction timestep it simultaneously applies two attention models to consider the source-side and reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit both source- and target-side contexts to improve translation quality. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over conventional NMT, by 3.14 and 1.38 BLEU points respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABDNMT.
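A sketch of the two-attention forward step the abstract describes, assuming a bilinear attention form; the parameter matrices `W_enc` and `W_bwd` and the concatenation at the end are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward_step(s, H_enc, H_bwd, W_enc, W_bwd):
    """One forward-decoder step: the decoder state s (d,) attends both to
    the encoder states H_enc (T_src x d) and to the hidden states H_bwd
    (T_tgt x d) produced earlier by the backward (right-to-left) decoder."""
    c_src = softmax(H_enc @ (W_enc @ s)) @ H_enc   # source-side context
    c_rev = softmax(H_bwd @ (W_bwd @ s)) @ H_bwd   # reverse target-side context
    return np.concatenate([s, c_src, c_rev])       # fed to the output layer
```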
Multimodal Keyless Attention Fusion for Video Classification
Long, Xiang (Tsinghua University) | Gan, Chuang (Tsinghua University) | Melo, Gerard de (Rutgers University) | Liu, Xiao (Baidu) | Li, Yandong (Baidu) | Li, Fu (Baidu) | Wen, Shilei (Baidu)
The problem of video classification is inherently sequential and multimodal, and deep neural models hence need to capture and aggregate the most pertinent signals for a given input video. We propose Keyless Attention as an elegant and efficient means to more effectively account for the sequential nature of the data. Moreover, comparing a variety of multimodal fusion methods, we find that Multimodal Keyless Attention Fusion is the most successful at discerning interactions between modalities. We experiment on four highly heterogeneous datasets, UCF101, ActivityNet, Kinetics, and YouTube-8M, to validate our conclusion, and show that our approach achieves highly competitive results. On large-scale data in particular, our method offers clear advantages in both efficiency and performance. Most remarkably, our best single model achieves 77.0% top-1 accuracy and 93.2% top-5 accuracy on the Kinetics validation set, and 82.2% GAP@20 on the official YouTube-8M test set.
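"Keyless" here means the attention weights are computed from the sequence features alone, with no external query vector. A minimal sketch using an additive scoring form (one common parameterization; the paper's exact one may differ):

```python
import numpy as np

def keyless_attention(H, W, w):
    """Query-free attention pooling over a sequence of feature vectors
    H (T x d). Scores depend only on the features themselves, so the same
    module works for any modality's feature sequence."""
    scores = np.tanh(H @ W) @ w          # (T,) unnormalized scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # softmax attention weights
    return alpha @ H                     # attention-pooled representation (d,)

# Usage: pool 10 frame features of dimension 16 into a single vector.
H = np.random.rand(10, 16)
video_vec = keyless_attention(H, np.random.rand(16, 8), np.random.rand(8))
```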
Reinforcement Learning for Relation Classification From Noisy Data
Feng, Jun (Tsinghua University) | Huang, Minlie (Tsinghua University) | Zhao, Li (Microsoft Research Asia) | Yang, Yang (Zhejiang University) | Zhu, Xiaoyan (Tsinghua University)
Existing relation classification methods that rely on distant supervision assume that a bag of sentences mentioning an entity pair all describe a relation for that pair. Such methods, performing classification at the bag level, cannot identify the mapping between a relation and a sentence, and largely suffer from the noisy labeling problem. In this paper, we propose a novel model for relation classification at the sentence level from noisy data. The model has two modules: an instance selector and a relation classifier. The instance selector chooses high-quality sentences with reinforcement learning and feeds the selected sentences into the relation classifier, while the relation classifier makes sentence-level predictions and provides rewards to the instance selector. The two modules are trained jointly to optimize the instance selection and relation classification processes. Experimental results show that our model can deal with noisy data effectively and obtains better performance for relation classification at the sentence level.
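A compact sketch of the selector side of such a setup, assuming a Bernoulli keep/drop policy trained with REINFORCE; the policy form, `reward_fn` interface, and learning rate are hypothetical, not the paper's exact training procedure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_update(theta, X, reward_fn, lr=0.1):
    """One REINFORCE step for an instance selector: a logistic policy keeps
    or drops each sentence vector in the bag X (n x d), a black-box relation
    classifier scores the kept subset via reward_fn, and the policy gradient
    pushes theta toward selections that earned higher reward."""
    probs = sigmoid(X @ theta)                    # keep-probability per sentence
    actions = (np.random.rand(len(probs)) < probs).astype(float)
    reward = reward_fn(X[actions == 1])           # e.g. classifier log-likelihood
    grad = X.T @ (actions - probs)                # grad of log pi(a|X) for Bernoulli
    return theta + lr * reward * grad
```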
Auto-Balanced Filter Pruning for Efficient Convolutional Neural Networks
Ding, Xiaohan (Tsinghua University) | Ding, Guiguang (Tsinghua University) | Han, Jungong (Lancaster University) | Tang, Sheng (Institute of Computing Technology, Chinese Academy of Sciences)
In recent years, considerable research effort has been devoted to compression techniques for convolutional neural networks (CNNs). Many works so far have focused on CNN connection pruning methods, which produce sparse parameter tensors in convolutional or fully-connected layers. Several studies have demonstrated that even simple methods can effectively eliminate connections of a CNN. However, since these methods make parameter tensors sparser but no smaller, the compression may not transfer directly to acceleration without support from specially designed hardware. In this paper, we propose an iterative approach named Auto-balanced Filter Pruning, where we pre-train the network in an innovative auto-balanced way to transfer the representational capacity of its convolutional layers to a fraction of the filters, prune the redundant ones, and then re-train the network to restore accuracy. In this way, a smaller version of the original network is learned and the floating-point operations (FLOPs) are reduced. By applying this method to several common CNNs, we show that a large portion of the filters can be discarded without an obvious accuracy drop, leading to a significant reduction of computational burden. Concretely, we reduce the inference cost of LeNet-5 on MNIST, and of VGG-16 and ResNet-56 on CIFAR-10, by 95.1%, 79.7%, and 60.9%, respectively.
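The mechanics of filter (as opposed to connection) pruning are easy to show: removing whole output filters shrinks the tensor itself, and the next layer loses the matching input channels. The sketch below uses a generic L1-norm ranking; the paper's contribution, the auto-balanced pre-training that concentrates capacity before this step, is omitted:

```python
import numpy as np

def prune_filters(conv_w, next_w, keep_ratio=0.5):
    """Prune weak output filters of one conv layer by L1 norm.
    conv_w: (out_ch, in_ch, k, k); next_w: (out2, out_ch, k, k).
    Removing an output filter also removes the matching input channel of
    the next layer, so the tensors genuinely get smaller (fewer FLOPs),
    unlike connection pruning, which only makes them sparser."""
    norms = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(keep_ratio * conv_w.shape[0]))
    keep = np.sort(np.argsort(norms)[-n_keep:])   # indices of strongest filters
    return conv_w[keep], next_w[:, keep]

# Usage: halve a 32-filter layer; the following layer shrinks to match.
w1, w2 = np.random.rand(32, 16, 3, 3), np.random.rand(64, 32, 3, 3)
w1_small, w2_small = prune_filters(w1, w2)        # shapes (16,16,3,3), (64,16,3,3)
```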
Selective Verification Strategy for Learning From Crowds
Tian, Tian (Tsinghua University) | Zhou, Yichi (Tsinghua University) | Zhu, Jun (Tsinghua University)
To deal with the low quality of web workers in crowdsourcing, many unsupervised label aggregation methods have been investigated, but most of them provide inconsistent performance. In this paper, we explore the problem of learning from crowds with selective verification. In addition to the noisy responses from the crowd, this setting collects the ground truths for a well-chosen subset of tasks as a reference, then aggregates the redundant responses based on the patterns provided by both the supervised and unsupervised signals. To improve labeling efficiency, we propose the EBM selecting strategy for choosing the verification subset, which is based on loss error minimization. Specifically, we first establish the expected loss error given the semi-supervised learning estimate, and then find the subset that minimizes this selection criterion. We conduct extensive empirical comparisons on both synthetic and real-world datasets to show the benefits of this new learning setting as well as our proposal.
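To illustrate the shape of such a selection step, here is a deliberately crude stand-in: verify the tasks whose crowd responses are most contentious, measured by per-task label entropy. This uncertainty proxy is an assumption for illustration only; the paper's EBM criterion minimizes an expected loss error instead:

```python
import numpy as np

def choose_verification_subset(responses, budget):
    """Pick `budget` tasks to send for ground-truth verification.
    responses: tasks x workers integer label matrix. Tasks whose empirical
    label distribution has the highest entropy (most worker disagreement)
    are verified first -- a simple proxy, not the paper's EBM strategy."""
    def entropy(row):
        _, counts = np.unique(row, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log(p)).sum()

    scores = np.apply_along_axis(entropy, 1, responses)
    return np.argsort(scores)[-budget:]           # most contentious tasks

# Usage: 4 tasks, 5 workers, binary labels; verify the 2 noisiest tasks.
R = np.array([[0,0,0,0,0], [0,1,0,1,1], [1,1,1,1,1], [1,0,1,0,0]])
print(choose_verification_subset(R, budget=2))
```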
Coalition Manipulation of Gale-Shapley Algorithm
Shen, Weiran (Tsinghua University) | Tang, Pingzhong (Tsinghua University) | Deng, Yuan (Duke University)
It is well known that the Gale-Shapley algorithm is not truthful for all agents. Previous studies of such manipulations concentrate on manipulations using incomplete preference lists by a single woman and by the set of all women. Little is known about manipulations by a subset of women. In this paper, we consider manipulations by any subset of women with arbitrary preferences. We show that a strong Nash equilibrium of the induced manipulation game always exists among the manipulators, and that the equilibrium outcome is unique and Pareto-dominant. In addition, the set of matchings achievable by manipulations has a lattice structure. Finally, we also examine the super-strong Nash equilibrium.
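For reference, the mechanism being manipulated is the textbook man-proposing Gale-Shapley algorithm, sketched below for complete preference lists (this is the standard algorithm only; the paper's equilibrium analysis is not reproduced here):

```python
def gale_shapley(men_prefs, women_prefs):
    """Man-proposing deferred acceptance. Preferences are dicts mapping
    each agent to a complete list of the other side, most preferred first.
    Women may benefit from misreporting these lists, which is the
    manipulation the paper studies."""
    rank = {w: {m: r for r, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}       # next woman to propose to
    engaged_to = {}                               # woman -> current fiance
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged_to:
            engaged_to[w] = m
        elif rank[w][m] < rank[w][engaged_to[w]]: # w prefers m to her fiance
            free_men.append(engaged_to[w])
            engaged_to[w] = m
        else:
            free_men.append(m)                    # rejected; proposes again
    return {m: w for w, m in engaged_to.items()}

matching = gale_shapley(
    {"m1": ["w1", "w2"], "m2": ["w1", "w2"]},
    {"w1": ["m2", "m1"], "w2": ["m1", "m2"]})
# -> {'m2': 'w1', 'm1': 'w2'}
```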
Neural Knowledge Acquisition via Mutual Attention Between Knowledge Graph and Text
Han, Xu (Tsinghua University) | Liu, Zhiyuan (Tsinghua University) | Sun, Maosong (Tsinghua University)
We propose a general joint representation learning framework for knowledge acquisition (KA) on two tasks: knowledge graph completion (KGC) and relation extraction (RE) from text. In this framework, we learn representations of knowledge graphs (KGs) and text within a unified parameter-sharing semantic space. To achieve better fusion, we propose an effective mutual attention between KGs and text. The reciprocal attention mechanism enables us to highlight important features and perform better KGC and RE. Unlike conventional joint models, no complicated linguistic analysis or strict alignments between KGs and text are required to train our models. Experiments on relation extraction and entity link prediction show that models trained under our joint framework significantly outperform other baselines. Most existing methods for KGC and RE can be easily integrated into our framework thanks to its flexible architecture. The source code of this paper can be obtained from https://github.com/thunlp/JointNRE.
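One direction of such a mutual attention, sketched from the abstract alone: a KG relation embedding attends over a sentence's word states to build a relation-aware text representation (and the text side can symmetrically re-weight the KG side). The bilinear scoring matrix `W` is a hypothetical parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kg_to_text_attention(r, H, W):
    """r: relation embedding (d,); H: word hidden states (T x d);
    W: (d x d) bilinear scoring matrix. Returns a sentence representation
    that highlights the words most relevant to relation r, plus the
    attention weights themselves."""
    alpha = softmax(H @ (W @ r))      # relevance of each word to relation r
    return alpha @ H, alpha           # relation-aware sentence vector (d,)
```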
Augmenting End-to-End Dialogue Systems With Commonsense Knowledge
Young, Tom (Beijing Institute of Technology) | Cambria, Erik (Nanyang Technological University) | Chaturvedi, Iti (Nanyang Technological University) | Zhou, Hao (Tsinghua University) | Biswas, Subham (Nanyang Technological University) | Huang, Minlie (Tsinghua University)
Building dialogue systems that can converse naturally with humans is a challenging yet intriguing problem in artificial intelligence. In open-domain human-computer conversation, where the conversational agent is expected to respond to human utterances in an interesting and engaging way, commonsense knowledge has to be integrated into the model effectively. In this paper, we investigate the impact of providing commonsense knowledge about the concepts covered in the dialogue. Our model represents the first attempt to integrate a large commonsense knowledge base into end-to-end conversational models. In the retrieval-based scenario, we propose a model that jointly takes into account message content and related commonsense for selecting an appropriate response. Our experiments suggest that the knowledge-augmented models are superior to their knowledge-free counterparts.
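A toy version of the retrieval-based scoring idea: a candidate response is judged both by its match with the message and by its match with the best of the commonsense assertions retrieved for the message's concepts. Cosine similarity over pre-computed vectors stands in for the paper's learned encoders; all names here are illustrative:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_response(msg_vec, resp_vec, assertion_vecs):
    """Score a candidate response by message match plus the strongest
    match against any retrieved commonsense assertion (a simplification
    of the paper's knowledge-augmented matching model)."""
    knowledge = max((cos(a, resp_vec) for a in assertion_vecs), default=0.0)
    return cos(msg_vec, resp_vec) + knowledge

# Usage: rank 3 candidate responses (random vectors as stand-ins).
msg = np.random.rand(8)
cands = [np.random.rand(8) for _ in range(3)]
facts = [np.random.rand(8) for _ in range(5)]
best = max(cands, key=lambda r: score_response(msg, r, facts))
```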