A Latent Variable Recurrent Neural Network for Discourse Relation Language Models
Ji, Yangfeng; Haffari, Gholamreza; Eisenstein, Jacob
This paper presents a novel latent variable recurrent neural network architecture for jointly modeling sequences of words and (possibly latent) discourse relations between adjacent sentences. A recurrent neural network generates individual words, thus reaping the benefits of discriminatively-trained vector representations. The discourse relations are represented with a latent variable, which can be predicted or marginalized, depending on the task. The resulting model can therefore employ a training objective that includes not only discourse relation classification, but also word prediction. As a result, it outperforms state-of-the-art alternatives for two tasks: implicit discourse relation classification in the Penn Discourse Treebank, and dialog act classification in the Switchboard corpus. Furthermore, by marginalizing over latent discourse relations at test time, we obtain a discourse-informed language model, which improves over a strong LSTM baseline.
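The two uses of the latent variable described above (predicting the relation versus marginalizing over it) can be illustrated with a minimal sketch. It assumes the model already supplies, for each candidate relation z, a log prior log p(z | context) and a log likelihood log p(sentence | z, context). The function names, the relation labels, and the numbers are hypothetical placeholders, not the paper's implementation or results.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_log_likelihood(log_prior, log_lik):
    """Language-model use: log p(y) = log sum_z p(z) * p(y | z)."""
    return logsumexp([log_prior[z] + log_lik[z] for z in log_prior])

def predict_relation(log_prior, log_lik):
    """Classification use: the relation z maximizing the joint p(z) * p(y | z)."""
    return max(log_prior, key=lambda z: log_prior[z] + log_lik[z])

# Hypothetical scores for one sentence pair (illustrative values only).
log_prior = {
    "Contingency": math.log(0.5),
    "Comparison": math.log(0.3),
    "Expansion": math.log(0.2),
}
log_lik = {
    "Contingency": math.log(0.01),
    "Comparison": math.log(0.04),
    "Expansion": math.log(0.02),
}

marginal = marginal_log_likelihood(log_prior, log_lik)
best = predict_relation(log_prior, log_lik)
```

In this sketch the same two score tables serve both tasks: summing over z gives the sentence likelihood for language modeling, while taking the argmax over z gives a relation label.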
Apr-5-2016
- Country:
- Europe (0.93)
- North America > United States (0.68)
- Asia (0.68)
- Genre:
- Research Report (0.64)
- Technology: