Long Expressive Memory for Sequence Modeling
T. Konstantin Rusch, Siddhartha Mishra, N. Benjamin Erichson, Michael W. Mahoney
Learning tasks with sequential data as inputs (and possibly outputs) arise in a wide variety of contexts, including computer vision, text and speech recognition, natural language processing, and time series analysis in the sciences and engineering. While gradient-based recurrent models have been used successfully to process sequential data, it is well known that training these models on (very) long sequential inputs is extremely challenging on account of the so-called exploding and vanishing gradients problem [32]. This problem arises because computing hidden-state gradients entails an iterated product of per-step gradients over a large number of steps, and this (long) product can easily grow or decay exponentially in the number of recurrent interactions. Mitigating the exploding and vanishing gradients problem has received considerable attention in the literature. A classical approach, used in Long Short-Term Memory (LSTM) [18] and Gated Recurrent Units (GRUs) [11], relies on gating mechanisms and leverages the resulting additive structure to ensure that gradients do not vanish.
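The gradient-product argument can be made concrete with a small numerical experiment. The NumPy sketch below is only an illustration, not code from the paper; the hidden size `d`, sequence length `T`, and the random recurrent matrix `W` are arbitrary choices. For a linear recurrence h_k = W h_{k-1}, the gradient of h_T with respect to h_0 is the T-fold product of the per-step Jacobian W, so its norm scales roughly like the spectral radius of W raised to the power T.

```python
# Minimal sketch (illustrative, not from the paper): repeated Jacobian
# products in a linear recurrence h_k = W h_{k-1} vanish or explode
# exponentially in the sequence length T.
import numpy as np

rng = np.random.default_rng(0)
d, T = 32, 200  # hidden size and sequence length (arbitrary illustrative values)

for radius in (0.9, 1.1):
    # Random recurrent matrix rescaled to the chosen spectral radius.
    W = rng.standard_normal((d, d))
    W *= radius / np.max(np.abs(np.linalg.eigvals(W)))

    grad = np.eye(d)       # d h_T / d h_T = I
    for _ in range(T):     # chain rule: one Jacobian factor per recurrent step
        grad = W @ grad    # after k steps this equals W^k = d h_T / d h_{T-k}

    print(f"spectral radius {radius}: ||d h_T / d h_0|| ~ {np.linalg.norm(grad):.3e}")
```

With these settings the two printed norms differ by many orders of magnitude, one vanishing and one exploding, which is exactly the behavior that the gating mechanisms of LSTMs and GRUs are designed to counteract.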
Oct-10-2021