

A Simple Language Model for Task-Oriented Dialogue

Neural Information Processing Systems

Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response. While such decomposition might suggest a dedicated model for each sub-task, we find a simple, unified approach leads to state-of-the-art performance on the MultiWOZ dataset. SimpleTOD is a simple approach to task-oriented dialogue that uses a single, causal language model trained on all sub-tasks recast as a single sequence prediction problem. This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2. SimpleTOD improves over the prior state-of-the-art in joint goal accuracy for dialogue state tracking, and our analysis reveals robustness to noisy annotations in this setting. SimpleTOD also improves the main metrics used to evaluate action decisions and response generation in an end-to-end setting: inform rate by 8.1 points, success rate by 9.7 points, and combined score by 7.2 points.
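Recasting all sub-tasks as a single sequence prediction problem amounts to concatenating the dialogue context, belief state, actions, and response into one flat token sequence for the causal language model to learn. A minimal sketch of that serialization is below; the delimiter token names and example values are illustrative, not necessarily the paper's exact vocabulary.

```python
# Hedged sketch: flattening one dialogue turn into a single training
# sequence, in the spirit of SimpleTOD. Special-token names here are
# assumptions for illustration.

def serialize_turn(context, belief, actions, response):
    """Concatenate all sub-task targets into one token sequence."""
    parts = [
        "<|context|>", context, "<|endofcontext|>",
        "<|belief|>", belief, "<|endofbelief|>",
        "<|action|>", " ".join(actions), "<|endofaction|>",
        "<|response|>", response, "<|endofresponse|>",
    ]
    return " ".join(parts)

example = serialize_turn(
    context="<user> i need a cheap hotel in the north",
    belief="hotel price range cheap, hotel area north",
    actions=["hotel inform choice", "hotel request stars"],
    response="there are [choice] options. how many stars?",
)
```

Because the whole turn is one sequence, a pre-trained causal model such as GPT-2 can be fine-tuned on it with an ordinary next-token objective, which is what makes the transfer learning straightforward.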


Review for NeurIPS paper: A Simple Language Model for Task-Oriented Dialogue

Neural Information Processing Systems

Summary and Contributions: The authors propose SimpleTOD, which replaces modular task-oriented dialogue models with a single, unified causal language model trained end-to-end. Task-oriented dialogue comprises three sub-tasks: dialogue state tracking, action prediction, and response generation. SimpleTOD treats all three as sequence generation: the full dialogue context up to turn t, C_t, is given to the model as input, and the model generates the dialogue state B_t for that turn.
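The turn-level flow the review summarizes can be sketched as a single decoding loop: the same model first emits B_t conditioned on C_t, then continues the same sequence through database results, actions, and the response. The scripted `toy_model`, `generate_until` helper, and stop-token names below are assumptions standing in for real greedy decoding with a trained LM.

```python
# Hedged sketch: one causal LM handles every sub-task by simply
# continuing the same token sequence. A scripted fake model stands in
# for a trained network.

def generate_until(model, prefix, stop_token):
    """Stand-in for greedy decoding: extend until a stop token appears."""
    out = []
    while not out or out[-1] != stop_token:
        out.append(model(prefix + out))
    return out

def make_toy_model(script):
    """A fake causal LM that replays a fixed list of tokens."""
    it = iter(script)
    return lambda seq: next(it)

def simpletod_turn(model, context_tokens, db_lookup):
    seq = list(context_tokens)                          # C_t
    belief = generate_until(model, seq, "<|endofbelief|>")   # B_t
    seq += belief
    seq += db_lookup(belief)                            # DB results as tokens
    action = generate_until(model, seq, "<|endofaction|>")
    seq += action
    response = generate_until(model, seq, "<|endofresponse|>")
    return belief, action, response

model = make_toy_model(
    ["hotel", "area", "north", "<|endofbelief|>",
     "hotel", "inform", "choice", "<|endofaction|>",
     "there", "are", "[choice]", "hotels", "<|endofresponse|>"]
)
belief, action, response = simpletod_turn(
    model,
    ["<user>", "cheap", "hotel", "in", "the", "north"],
    lambda b: ["<|db|>", "match", "3", "<|endofdb|>"],
)
```

Note how no sub-task has its own model or decoder head; each one is just a later span of the same growing sequence.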


Review for NeurIPS paper: A Simple Language Model for Task-Oriented Dialogue

Neural Information Processing Systems

All reviewers find this work quite strong in both approach and results, and applaud that it has proved robustly reproducible. One important point is that several contemporaneous papers share commonalities with this submission. We agree that they were published less than a month before the deadline and should therefore be considered contemporaneous; nevertheless, it would have been much better scientific practice to discuss these works in the submission if the authors were aware of them, regardless of when the authors posted their initial submission on arXiv. The discussion in the authors' response that situates the submission in the context of these other works is enlightening and interesting, and should definitely appear in the final version. Conditioned on this being the case, we are happy to accept the paper.



Learning molecular dynamics with simple language model built upon long short-term memory neural network - Nature Communications

#artificialintelligence

Recurrent neural networks have led to breakthroughs in natural language processing and speech recognition. Here we show that recurrent networks, specifically long short-term memory networks, can also capture the temporal evolution of chemical/biophysical trajectories. Our character-level language model learns a probabilistic model of 1-dimensional stochastic trajectories generated from higher-dimensional dynamics. The model captures Boltzmann statistics and also reproduces kinetics across a spectrum of timescales. We demonstrate how training the long short-term memory network is equivalent to learning a path entropy, and that its embedding layer, instead of representing the contextual meaning of characters, here exhibits a nontrivial connectivity between different metastable states in the underlying physical system. We demonstrate our model's reliability on several benchmark systems and on a force spectroscopy trajectory for a multi-state riboswitch. We anticipate that our work represents a stepping stone toward the use of recurrent neural networks for understanding the dynamics of complex stochastic molecular systems. Artificial neural networks have been successfully used for language recognition. Tsai et al. apply the same techniques to connect language processing with the prediction of molecular trajectories, showing that such models can predict the complex thermodynamics and kinetics arising in chemical and biological physics.
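Treating a molecular trajectory as "language" requires a preprocessing step the abstract implies: discretizing the continuous 1-dimensional trajectory into a small alphabet of characters so a character-level model can be trained on it. The sketch below shows one plausible binning scheme; the bin count, symbol alphabet, and toy random-walk data are all illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch: turning a continuous 1-d trajectory into a character
# string for a character-level language model. Binning choices here
# are illustrative.
import numpy as np

def trajectory_to_chars(x, n_bins=4):
    """Map each frame of a 1-d trajectory to one of n_bins symbols."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]  # inner edges
    labels = np.digitize(x, edges)          # bin index 0..n_bins-1 per frame
    return "".join(chr(ord("a") + i) for i in labels)

rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=200))      # toy stochastic trajectory
chars = trajectory_to_chars(traj)
```

Once encoded this way, each metastable basin of the physical system tends to map to a recurring character, and an LSTM trained to predict the next character is implicitly learning the transition statistics between basins.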