A Unifying Framework of Bilinear LSTMs

Mohit Rajpal, Bryan Kian Hsiang Low

arXiv.org Machine Learning 

This paper presents a novel unifying framework of bilinear LSTMs that can represent and utilize the nonlinear interaction of the input features present in sequence datasets for achieving superior performance over a linear LSTM and yet not incur more parameters to be learned. To realize this, our unifying framework allows the expressivity of the linear vs. bilinear terms to be balanced by correspondingly trading off between the hidden state vector size vs. approximation quality of the weight matrix in the bilinear term so as to optimize the performance of our bilinear LSTM, while not incurring more parameters to be learned. We empirically evaluate the performance of our bilinear LSTM in several language-based sequence learning tasks to demonstrate its general applicability.

Recurrent neural networks (RNNs) have been popularized by their impressive performance in a wide variety of supervised and unsupervised sequence learning tasks, which include language modeling (Merity et al., 2018), statistical machine translation (Bahdanau et al., 2015), and coreference resolution (Lee et al., 2017). Different variants of RNNs such as long short-term memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) and gated recurrent units (Cho et al., 2014) share a common architectural trait of being built from feedforward neural networks connected in a recurrent manner. Typically, an RNN is instantiated by linear neurons coupled with a nonlinear activation function, which constitute its basic building blocks; to be consistent with the literature (Park & Zhu, 1994), we refer to such neurons as linear. This should naturally affect the processing of adjacent words based on context in a nonlinear manner (see Table 2 in Section 4.2).
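To make the trade-off concrete, below is a minimal sketch (in PyTorch) of one way a gate pre-activation can combine the usual linear term with a rank-r factorization of the bilinear weight matrix. The class name `BilinearLSTMCell`, the factor shapes, and the shared-rank layout are illustrative assumptions rather than the paper's exact construction; they only show how the rank of the bilinear approximation and the hidden state size jointly determine the parameter count.

```python
# A minimal sketch (not the authors' exact formulation) of an LSTM cell whose
# gate pre-activations add a rank-r bilinear term to the standard linear term.
# The bilinear interaction x^T W_b h is approximated by sum_k (P_k x) * (Q_k h),
# so the rank r trades approximation quality of W_b against extra parameters,
# which can in turn be offset by shrinking the hidden state size.

import torch
import torch.nn as nn


class BilinearLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, rank: int):
        super().__init__()
        self.hidden_size = hidden_size
        # Linear terms for the 4 gates (input, forget, cell, output).
        self.linear = nn.Linear(input_size + hidden_size, 4 * hidden_size)
        # Low-rank factors of the bilinear term, one rank-r factorization per
        # gate unit: P has shape (4*hidden, rank, input), Q has (4*hidden, rank, hidden).
        self.P = nn.Parameter(0.01 * torch.randn(4 * hidden_size, rank, input_size))
        self.Q = nn.Parameter(0.01 * torch.randn(4 * hidden_size, rank, hidden_size))

    def forward(self, x, state):
        h, c = state
        # Linear part: standard LSTM pre-activations.
        pre = self.linear(torch.cat([x, h], dim=-1))
        # Bilinear part: sum over the r factors of (P x) * (Q h) per gate unit.
        px = torch.einsum('gri,bi->bgr', self.P, x)   # (batch, 4*hidden, rank)
        qh = torch.einsum('grj,bj->bgr', self.Q, h)   # (batch, 4*hidden, rank)
        pre = pre + (px * qh).sum(dim=-1)
        i, f, g, o = pre.chunk(4, dim=-1)
        c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, (h_new, c_new)


# Usage: a smaller hidden size can offset the extra rank-r parameters.
cell = BilinearLSTMCell(input_size=32, hidden_size=64, rank=4)
x = torch.randn(8, 32)
h0 = c0 = torch.zeros(8, 64)
out, _ = cell(x, (h0, c0))
```

Under these assumptions the bilinear term adds roughly 4·r·(input_size + hidden_size)·hidden_size parameters, which illustrates the framework's balancing act: increasing r improves the approximation of the bilinear weight matrix, while decreasing the hidden state size keeps the total parameter budget comparable to a linear LSTM.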
