 National Institute of Information and Communications Technology


Syntax-Directed Attention for Neural Machine Translation

AAAI Conferences

Attention mechanisms, including global attention and local attention, play a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at source words within a fixed window. However, alignment weights for the current target word often decrease to the left and right by linear distance centering on the aligned source position and neglect syntax-distance constraints. In this paper, we extend local attention with a syntax-distance constraint, which focuses on source words syntactically related to the predicted target word, to learn a more effective context vector for predicting translation. Moreover, we propose a double-context NMT architecture, which consists of a global context vector and a syntax-directed context vector from the global attention, to further improve the translation performance of NMT from the source representation. The experiments on the large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system.
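
As a rough illustration of the syntax-distance idea in the abstract above, the following minimal sketch restricts attention to source words that lie within a given number of hops of the aligned source word in a dependency tree. It is not the authors' implementation: the function names, the `heads` encoding of the tree, and the hard cutoff `max_dist` are all assumptions made for this example, and the paper may weight words by syntax distance differently.

```python
import math
from collections import deque


def tree_distances(heads, center):
    """Hop counts from `center` to every word in a dependency tree.

    heads[i] is the index of word i's head, or -1 for the root
    (a hypothetical encoding chosen for this sketch).
    """
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = [None] * n
    dist[center] = 0
    queue = deque([center])
    while queue:  # breadth-first search over the undirected tree
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def syntax_directed_attention(scores, heads, center, max_dist):
    """Softmax over raw alignment scores, keeping only words whose
    syntax distance to `center` is at most `max_dist` (assumed cutoff)."""
    dist = tree_distances(heads, center)
    keep = [d is not None and d <= max_dist for d in dist]
    top = max(s for s, k in zip(scores, keep) if k)
    exps = [math.exp(s - top) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy five-word sentence: word 1 is the root; words 0, 2, 4 attach to it;
# word 3 attaches to word 4.
heads = [1, -1, 1, 4, 1]
weights = syntax_directed_attention([0.0] * 5, heads, center=0, max_dist=1)
print(weights)  # [0.5, 0.5, 0.0, 0.0, 0.0]
```

In this toy case, word 2 is linearly close to word 0 but two hops away in the tree, so a syntax-distance window of 1 excludes it, whereas a linear window of the same width would include it.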


Deterministic Attention for Sequence-to-Sequence Constituent Parsing

AAAI Conferences

The sequence-to-sequence model is proven to be extremely successful in constituent parsing. It relies on one key technique, the probabilistic attention mechanism, to automatically select the context for prediction. Despite its successes, the probabilistic attention model does not always select the most important context. For example, the headword and boundary words of a subtree have been shown to be critical when predicting the constituent label of the subtree, but this contextual information becomes increasingly difficult to learn as the length of the sequence increases. In this study, we proposed a deterministic attention mechanism that deterministically selects the important context and is not affected by the sequence length. We implemented two different instances of this framework. When combined with a novel bottom-up linearization method, our parser demonstrated better performance than that achieved by the sequence-to-sequence parser with probabilistic attention mechanism.


Translation Prediction with Source Dependency-Based Context Representation

AAAI Conferences

Learning context representations is a promising way to improve translation results, particularly through neural networks. Previous efforts process the context words sequentially and neglect their internal syntactic structure. In this paper, we propose a novel neural network based on a bi-convolutional architecture to represent the source dependency-based context for translation prediction. The proposed model is able not only to encode long-distance dependencies but also to capture functional similarities for better translation prediction (i.e., translation of ambiguous words and of word forms). Evaluated on a large-scale Chinese-English translation task, the proposed approach achieves a significant improvement (of up to +1.9 BLEU points) over the baseline system, and also outperforms a number of context-enhanced comparison systems.


Improving Event Causality Recognition with Multiple Background Knowledge Sources Using Multi-Column Convolutional Neural Networks

AAAI Conferences

We propose a method for recognizing such event causalities as "smoke cigarettes" → "die of lung cancer" using background knowledge taken from web texts as well as original sentences from which candidates for the causalities were extracted. We retrieve texts related to our event causality candidates from four billion web pages by three distinct methods, including a why-question answering system, and feed them to our multi-column convolutional neural networks. This allows us to identify the useful background knowledge scattered in web texts and effectively exploit the identified knowledge to recognize event causalities. We empirically show that the combination of our neural network architecture and background knowledge significantly improves average precision, while the previous state-of-the-art method gains just a small benefit from such background knowledge.


Agreement on Target-Bidirectional LSTMs for Sequence-to-Sequence Learning

AAAI Conferences

Recurrent neural networks, particularly the long short-term memory networks, are extremely appealing for sequence-to-sequence learning tasks. Despite their great success, they typically suffer from a fundamental shortcoming: they are prone to generate unbalanced targets with good prefixes but bad suffixes, and thus performance suffers when dealing with long sequences. We propose a simple yet effective approach to overcome this shortcoming. Our approach relies on the agreement between a pair of target-directional LSTMs, which generates more balanced targets. In addition, we develop two efficient approximate search methods for agreement that are empirically shown to be almost optimal in terms of sequence-level losses. Extensive experiments were performed on two standard sequence-to-sequence transduction tasks: machine transliteration and grapheme-to-phoneme transformation. The results show that the proposed approach achieves consistent and substantial improvements, compared to six state-of-the-art systems. In particular, our approach outperforms the best reported error rates by a margin (up to 9% relative gains) on the grapheme-to-phoneme task.
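
One simple way to picture agreement between two target-directional models is to rescore a pool of candidate outputs with both a left-to-right and a right-to-left scorer and keep the candidate with the best combined score. The sketch below does exactly that and nothing more; it is not the paper's search procedure, and the function name, the candidate strings, and the log-probability values are hypothetical numbers invented for illustration.

```python
def rerank_by_agreement(candidates, score_l2r, score_r2l):
    """Return the candidate with the highest sum of the two directional
    log-probabilities (a simple stand-in for joint agreement search)."""
    return max(candidates, key=lambda y: score_l2r(y) + score_r2l(y))


# Hypothetical log-probabilities for three candidate outputs.
# The left-to-right model favors "abcx" (good prefix, bad suffix);
# the right-to-left model favors "xbcd" (good suffix, bad prefix).
l2r_logprob = {"abcx": -1.0, "abcd": -1.2, "xbcd": -4.0}
r2l_logprob = {"abcx": -3.5, "abcd": -1.1, "xbcd": -0.9}

best = rerank_by_agreement(list(l2r_logprob), l2r_logprob.get, r2l_logprob.get)
print(best)  # abcd
```

Neither single-direction model ranks "abcd" first in this toy example, yet it wins under the combined score because both directions find it acceptable, which is the intuition behind preferring balanced targets.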


A Semi-Supervised Learning Approach to Why-Question Answering

AAAI Conferences

We propose a semi-supervised learning method for improving why-question answering (why-QA). The key to our method is to generate training data (question-answer pairs) from causal relations in texts such as "[Tsunamis are generated]( effect ) because [the ocean's water mass is displaced by an earthquake]( cause )." A naive method for the generation would be to make a question-answer pair by simply converting the effect part of the causal relations into a why-question, like "Why are tsunamis generated?" from the above example, and using the source text of the causal relations as an answer. However, in our preliminary experiments, this naive method actually failed to improve the why-QA performance. The main reasons were that the machine-generated questions were often incomprehensible, like "Why does (it) happen?", and that the system suffered from overfitting to the results of our automatic causality recognizer. Hence, we developed a novel method that effectively filters out incomprehensible questions and retrieves from texts answers that are likely to be paraphrases of a given causal relation. Through a series of experiments, we showed that our approach significantly improved the precision of the top answer by 8% over the current state-of-the-art system for Japanese why-QA.


Active Learning for Generating Motion and Utterances in Object Manipulation Dialogue Tasks

AAAI Conferences

In an object manipulation dialogue, a robot may misunderstand an ambiguous command from a user, such as "Place the cup down (on the table)," potentially resulting in an accident. Although asking confirmation questions before every motion execution will decrease the risk of this failure, the user will find it more convenient if confirmation questions are not asked in trivial situations. This paper proposes a method for estimating ambiguity in commands by introducing an active learning framework with Bayesian logistic regression to human-robot spoken dialogue. We conducted physical experiments in which a user and a manipulator-based robot communicated using spoken language to manipulate objects.


Robots that Learn to Communicate: A Developmental Approach to Personally and Physically Situated Human-Robot Conversations

AAAI Conferences

This paper summarizes the online machine learning method LCore, which enables robots to learn to communicate with users from scratch through verbal and behavioral interaction in the physical world. LCore combines speech, visual, and tactile information obtained through the interaction, and enables robots to learn beliefs regarding speech units, words, the concepts of objects, motions, grammar, and pragmatic and communicative capabilities. The overall belief system is represented by a dynamic graphical model in an integrated way. Experimental results show that through a small, practical number of learning episodes with a user, the robot was eventually able to understand even fragmental and ambiguous utterances, respond to them with confirmation questions and/or actions, generate directive utterances, and answer questions, appropriately for the given situation. This paper discusses the importance of a developmental approach to realize personally and physically situated human-robot conversations.