This lecture will cover recurrent neural networks, the key ingredient in the deep learning toolbox for handling sequential computation and modelling sequences. It will start by explaining how gradients can be computed (by considering the time-unfolded graph) and how different architectures can be designed to summarize a sequence, generate a sequence by ancestral sampling in a fully-observed directed model, or learn to map a vector to a sequence, a sequence to a sequence (of the same or different length) or a sequence to a vector. The issue of long-term dependencies, why it arises, and what has been proposed to alleviate it will be core subject of the discussion in this lecture. This includes changes in the architecture and initialization, as well as how to properly characterize the architecture in terms of recurrent or feedforward depth and its ability to create shortcuts or fast propagation of gradients in the unfolded graph. Open questions regarding the limitations of training by maximum likelihood (teacher forcing) and ideas towards towards making learning online (not requiring backprop through time) will also be discussed.
Jan-19-2017, 00:20:36 GMT