Like the course I just released on Hidden Markov Models, Recurrent Neural Networks are all about learning sequences - but whereas Markov Models are limited by the Markov assumption, Recurrent Neural Networks are not. As a result, they are more expressive and more powerful, and they have driven progress on tasks that had been stalled for decades. So what's going to be in this course, and how will it build on the previous neural network courses and Hidden Markov Models? In the first section of the course we are going to add the concept of time to our neural networks. I'll introduce you to the Simple Recurrent Unit, also known as the Elman unit. We are going to revisit the XOR problem, but we're going to extend it so that it becomes the parity problem - you'll see that regular feedforward neural networks will have trouble solving this problem, but recurrent networks will work, because the key is to treat the input as a sequence.
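To make the "treat the input as a sequence" idea concrete, here is a minimal sketch in numpy. The `elman_step` function and the hand-written `parity` recurrence are illustrative names of my own, not code from the course: the point is that parity of N bits, which is hard for a feedforward net to learn in one shot, reduces to a tiny two-bit XOR update when the bits arrive one at a time.

```python
import numpy as np

def elman_step(x_t, h_prev, Wx, Wh, bh):
    # Simple Recurrent (Elman) unit: the new hidden state mixes the
    # current input with the previous hidden state through a nonlinearity.
    return np.tanh(x_t @ Wx + h_prev @ Wh + bh)

def parity(bits):
    # Parity as a sequence task: the running XOR of the bits seen so far.
    # This is exactly the state update a trained recurrent unit must emulate,
    # one recurrent step per input bit.
    h = 0
    for b in bits:
        h = h ^ b
    return h
```

For example, `parity([1, 0, 1, 1])` returns `1` (three ones, odd parity). A feedforward network sees all N bits at once and must learn an exponentially complicated function of them; the recurrent view only ever needs the fixed-size two-bit sub-problem, repeated over time.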
Lori heads Alliances for the Computer Science and Artificial Intelligence Lab (CSAIL), the largest lab at MIT, with over 1,000 people and home to MIT research initiatives on Big Data, Wireless, and Cyber Security. In her role at CSAIL, she is responsible for corporate and organizational engagement through the CSAIL Alliance Program, the Visiting Industry Researcher program, CSAIL startups and the technology ecosystem, the professional education partnership with edX and MIT Professional Education, as well as talent acquisition/recruiting programs within CSAIL. Lori also serves as the Executive Director of CyberSecurity@CSAIL, MIT's new research initiative focused on identifying and developing technologies to address the most significant security issues confronting organizations over the next decade. Additionally, Lori is the Executive Director of the research initiative BigData@CSAIL, which focuses on developing new methods for dealing with the challenges posed by the ever-increasing volume, velocity, and variety of data, and on applying those techniques to specific research areas such as finance, medicine, social media, and security. Previously, Lori was the Assistant Vice President of Corporate Engagement at Worcester Polytechnic Institute.
This course is about deep learning fundamentals and convolutional neural networks. Convolutional neural networks are one of the most successful deep learning approaches: self-driving cars rely heavily on this algorithm. First you will learn about densely connected neural networks and their problems. The next chapters are about convolutional neural networks: theory as well as implementation in Java with the deeplearning4j library. The last chapters are about recurrent neural networks and their applications.
The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network's own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps.
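The train/sample mismatch the abstract describes can be sketched in a few lines. This is a minimal illustration in numpy with made-up names (`step`, `teacher_forced_pass`, `free_running_pass`) and an untrained toy network, not the paper's implementation: it only shows where the ground-truth input is swapped for the model's own prediction.

```python
import numpy as np

def step(x_t, h, Wx, Wh, Wo):
    """One recurrent step: new hidden state and a one-step-ahead prediction."""
    h = np.tanh(Wx @ x_t + Wh @ h)
    return h, Wo @ h

def teacher_forced_pass(xs, h, Wx, Wh, Wo):
    # Teacher forcing: during TRAINING the observed value xs[t] is fed in,
    # regardless of what the network predicted at step t-1.
    preds = []
    for x_t in xs:
        h, y = step(x_t, h, Wx, Wh, Wo)
        preds.append(y)
    return preds

def free_running_pass(x0, n_steps, h, Wx, Wh, Wo):
    # Free-running sampling: at TEST time the network's own prediction is
    # fed back as the next input, so errors can compound over many steps.
    # This divergence between the two regimes is the mismatch that
    # Professor Forcing's adversarial objective pushes to close.
    preds, x_t = [], x0
    for _ in range(n_steps):
        h, y = step(x_t, h, Wx, Wh, Wo)
        preds.append(y)
        x_t = y
    return preds
```

The two loops differ only in what is assigned to the next input, which is why the hidden-state dynamics can drift apart between training and multi-step sampling.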
Hybrid methods that utilize both content and rating information are commonly used in many recommender systems. However, most of them use either handcrafted features or the bag-of-words representation as a surrogate for the content information, which is neither effective nor natural enough. To address this problem, we develop a collaborative recurrent autoencoder (CRAE), which is a denoising recurrent autoencoder (DRAE) that models the generation of content sequences in the collaborative filtering (CF) setting. To do this, we first develop a hierarchical Bayesian model for the DRAE and then generalize it to the CF setting. The synergy between denoising and CF enables CRAE to make accurate recommendations while learning to fill in the blanks in sequences.