A Unifying Framework of Bilinear LSTMs

Mohit Rajpal, Bryan Kian Hsiang Low

arXiv.org Machine Learning 

This paper presents a novel unifying framework of bilinear LSTMs that can represent and utilize the nonlinear interaction of the input features present in sequence datasets for achieving superior performance over a linear LSTM and yet not incur more parameters to be learned. To realize this, our unifying framework allows the expressivity of the linear vs. bilinear terms to be balanced by correspondingly trading off between the hidden state vector size vs. approximation quality of the weight matrix in the bilinear term so as to optimize the performance of our bilinear LSTM, while not incurring more parameters to be learned. We empirically evaluate the performance of our bilinear LSTM in several language-based sequence learning tasks to demonstrate its general applicability.

Recurrent neural networks (RNNs) have been popularized by their impressive performance in a wide variety of supervised and unsupervised sequence learning tasks, which include language modeling (Merity et al., 2018), statistical machine translation (Bahdanau et al., 2015), and coreference resolution (Lee et al., 2017). Different variants of RNNs such as long short-term memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) and gated recurrent units (Cho et al., 2014) share a common architectural trait of being built from feedforward neural networks connected in a recurrent manner. Typically, an RNN is instantiated by linear neurons coupled with a nonlinear activation function, which constitute its basic building blocks; to be consistent with the literature (Park & Zhu, 1994), we refer to such neurons as linear. This should naturally affect the processing of adjacent words based on context in a nonlinear manner (see Table 2 in Section 4.2).
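To make the trade-off concrete, below is a minimal sketch (in PyTorch) of one way a gate pre-activation can combine the usual linear term with a rank-r factorization of the bilinear weight matrix. The class name `BilinearLSTMCell`, the factor shapes, and the shared-rank layout are illustrative assumptions rather than the paper's exact construction; they only show how the rank of the bilinear approximation and the hidden state size jointly determine the parameter count.

```python
# A minimal sketch (not the authors' exact formulation) of an LSTM cell whose
# gate pre-activations add a rank-r bilinear term to the standard linear term.
# The bilinear interaction x^T W_b h is approximated by sum_k (P_k x) * (Q_k h),
# so the rank r trades approximation quality of W_b against extra parameters,
# which can in turn be offset by shrinking the hidden state size.

import torch
import torch.nn as nn


class BilinearLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, rank: int):
        super().__init__()
        self.hidden_size = hidden_size
        # Linear terms for the 4 gates (input, forget, cell, output).
        self.linear = nn.Linear(input_size + hidden_size, 4 * hidden_size)
        # Low-rank factors of the bilinear term, one rank-r factorization per
        # gate unit: P has shape (4*hidden, rank, input), Q has (4*hidden, rank, hidden).
        self.P = nn.Parameter(0.01 * torch.randn(4 * hidden_size, rank, input_size))
        self.Q = nn.Parameter(0.01 * torch.randn(4 * hidden_size, rank, hidden_size))

    def forward(self, x, state):
        h, c = state
        # Linear part: standard LSTM pre-activations.
        pre = self.linear(torch.cat([x, h], dim=-1))
        # Bilinear part: sum over the r factors of (P x) * (Q h) per gate unit.
        px = torch.einsum('gri,bi->bgr', self.P, x)   # (batch, 4*hidden, rank)
        qh = torch.einsum('grj,bj->bgr', self.Q, h)   # (batch, 4*hidden, rank)
        pre = pre + (px * qh).sum(dim=-1)
        i, f, g, o = pre.chunk(4, dim=-1)
        c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, (h_new, c_new)


# Usage: a smaller hidden size can offset the extra rank-r parameters.
cell = BilinearLSTMCell(input_size=32, hidden_size=64, rank=4)
x = torch.randn(8, 32)
h0 = c0 = torch.zeros(8, 64)
out, _ = cell(x, (h0, c0))
```

Under these assumptions the bilinear term adds roughly 4·r·(input_size + hidden_size)·hidden_size parameters, which illustrates the framework's balancing act: increasing r improves the approximation of the bilinear weight matrix, while decreasing the hidden state size keeps the total parameter budget comparable to a linear LSTM.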
