We study response generation for open domain conversation in chatbots. Existing methods assume that words in responses are generated from an identical vocabulary regardless of their inputs, which not only makes them vulnerable to generic patterns and irrelevant noise, but also causes a high cost in decoding. We propose a dynamic vocabulary sequence-to-sequence (DVS2S) model which allows each input to possess their own vocabulary in decoding. In training, vocabulary construction and response generation are jointly learned by maximizing a lower bound of the true objective with a Monte Carlo sampling method. In inference, the model dynamically allocates a small vocabulary for an input with the word prediction model, and conducts decoding only with the small vocabulary. Because of the dynamic vocabulary mechanism, DVS2S eludes many generic patterns and irrelevant words in generation, and enjoys efficient decoding at the same time. Experimental results on both automatic metrics and human annotations show that DVS2S can significantly outperform state-of-the-art methods in terms of response quality, but only requires 60% decoding time compared to the most efficient baseline.
Recent advances in neural network-based generative modeling have reignited the hopes in having computer systems capable of seamlessly conversing with humans and able to understand natural language. Neural architectures have been employed to generate text excerpts to various degrees of success, in a multitude of contexts and tasks that fulfil various user needs. Notably, high capacity deep learning models trained on large scale datasets demonstrate unparalleled abilities to learn patterns in the data even in the lack of explicit supervision signals, opening up a plethora of new possibilities regarding producing realistic and coherent texts. While the field of natural language generation is evolving rapidly, there are still many open challenges to address. In this survey we formally define and categorize the problem of natural language generation. We review particular application tasks that are instantiations of these general formulations, in which generating natural language is of practical importance. Next we include a comprehensive outline of methods and neural architectures employed for generating diverse texts. Nevertheless, there is no standard way to assess the quality of text produced by these generative models, which constitutes a serious bottleneck towards the progress of the field. To this end, we also review current approaches to evaluating natural language generation systems. We hope this survey will provide an informative overview of formulations, methods, and assessments of neural natural language generation.
Xing, Chen (Nankai University) | Wu, Wei (Microsoft Research Asia) | Wu, Yu (Beihang University) | Liu, Jie (Nankai University) | Huang, Yalou (Nankai University) | Zhou, Ming (Microsoft Research Asia) | Ma, Wei-Ying (Microsoft Research Asia)
We consider incorporating topic information into a sequence-to-sequence framework to generate informative and interesting responses for chatbots. To this end, we propose a topic aware sequence-to-sequence (TA-Seq2Seq) model. The model utilizes topics to simulate prior human knowledge that guides them to form informative and interesting responses in conversation, and leverages topic information in generation by a joint attention mechanism and a biased generation probability. The joint attention mechanism summarizes the hidden vectors of an input message as context vectors by message attention and synthesizes topic vectors by topic attention from the topic words of the message obtained from a pre-trained LDA model, with these vectors jointly affecting the generation of words in decoding. To increase the possibility of topic words appearing in responses, the model modifies the generation probability of topic words by adding an extra probability item to bias the overall distribution. Empirical studies on both automatic evaluation metrics and human annotations show that TA-Seq2Seq can generate more informative and interesting responses, significantly outperforming state-of-the-art response generation models.
Many neural network models nowadays have achieved promising performances in Chit-chat settings. The majority of them rely on an encoder for understanding the post and a decoder for generating the response. Without given assigned semantics, the models lack the fine-grained control over responses as the semantic mapping between posts and responses is hidden on the fly within the end-to-end manners. Some previous works utilize sampled latent words as a controllable semantic form to drive the generated response around the work, but few works attempt to use more complex semantic forms to guide the generation. In this paper, we propose to use more detailed semantic forms, including latent responses and part-of-speech sequences sampled from the corresponding distributions, as the controllable semantics to guide the generation. Our experimental results show that the richer semantics are not only able to provide informative and diverse responses, but also increase the overall performance of response quality, including fluency and coherence.
There has been considerable progress made towards conversational models that generate coherent and fluent responses; however, this often involves training large language models on large dialogue datasets, such as Reddit. These large conversational models provide little control over the generated responses, and this control is further limited in the absence of annotated conversational datasets for attribute specific generation that can be used for fine-tuning the model. In this paper, we first propose and evaluate plug-and-play methods for controllable response generation, which does not require dialogue specific datasets and does not rely on fine-tuning a large model. While effective, the decoding procedure induces considerable computational overhead, rendering the conversational model unsuitable for interactive usage. To overcome this, we introduce an approach that does not require further computation at decoding time, while also does not require any fine-tuning of a large language model. We demonstrate, through extensive automatic and human evaluation, a high degree of control over the generated conversational responses with regard to multiple desired attributes, while being fluent.