In this course we are going to look at advanced NLP. The techniques we've covered so far allowed us to do some pretty cool things, like detect spam emails, write poetry, spin articles, and group together similar words. Now I'm going to show you how to do even more awesome things. I'm going to show you exactly how word2vec works, from theory to implementation, and you'll see that it's merely the application of skills you already know. We are also going to look at the GloVe method, which also finds word vectors, but uses a technique called matrix factorization, which is a popular algorithm for recommender systems.
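To give a flavor of what "word vectors via matrix factorization" means, here is a minimal GloVe-style sketch on a toy corpus. The corpus, window size, embedding dimension, and learning rate are all illustrative choices made up for this example, not anything from the official GloVe code; only the weighted least-squares objective follows the GloVe idea.

```python
import numpy as np

# Toy corpus and co-occurrence counts within a +/-1 word window.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

X = np.zeros((V, V))
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            X[idx[w], idx[corpus[j]]] += 1

# Factorize: fit  w_i . u_j + b_i + c_j  ~  log X_ij  over nonzero cells,
# weighted by f(x) = min(x / x_max, 1) ** 0.75 (GloVe-style weighting).
D, lr, x_max = 8, 0.05, 5.0           # illustrative hyperparameters
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, D))  # word vectors
U = rng.normal(scale=0.1, size=(V, D))  # context vectors
b, c = np.zeros(V), np.zeros(V)         # bias terms
pairs = [(i, j) for i in range(V) for j in range(V) if X[i, j] > 0]

def loss():
    return sum(min(X[i, j] / x_max, 1.0) ** 0.75
               * (W[i] @ U[j] + b[i] + c[j] - np.log(X[i, j])) ** 2
               for i, j in pairs)

loss_before = loss()
for _ in range(200):                    # plain SGD over nonzero cells
    for i, j in pairs:
        f = min(X[i, j] / x_max, 1.0) ** 0.75
        err = W[i] @ U[j] + b[i] + c[j] - np.log(X[i, j])
        g = f * err
        W_i = W[i].copy()               # use pre-update value for U's step
        W[i] -= lr * g * U[j]
        U[j] -= lr * g * W_i
        b[i] -= lr * g
        c[j] -= lr * g
loss_after = loss()
```

After training, rows of `W` (often summed with `U`) serve as the word vectors; words that appear in similar contexts end up with similar rows.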
That's the vision of Andrew Ng, a founder of the Google Brain deep learning project and former head of AI at Baidu (a position he left in March), who is today announcing a set of five interconnected online courses on the subject. Participants in the "Deep Learning Specialization," available only through Coursera, will be steeped in neural networks, backpropagation, convolutional networks, recurrent networks, computer vision, natural language processing, and more. They'll get hands-on experience using the technology in healthcare, visual object recognition, music generation, language understanding, and other applications. "Today, if you want to learn deep learning, there are lots of people searching online, reading [dozens of] research papers, reading blog posts, and watching YouTube videos," Ng tells Fast Company. "I admire that, but I want to give people that want to break into AI a clear path of how to get there."
We study reinforcement learning of chatbots with recurrent neural network architectures when the rewards are noisy and expensive to obtain. For instance, a chatbot used in automated customer service support can be scored by quality assurance agents, but this process can be expensive, time consuming and noisy. Previous reinforcement learning work for natural language processing uses on-policy updates and/or is designed for on-line learning settings. We demonstrate empirically that such strategies are not appropriate for this setting and develop an off-policy batch policy gradient method (BPG). We demonstrate the efficacy of our method via a series of synthetic experiments and an Amazon Mechanical Turk experiment on a restaurant recommendations dataset.
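To make the off-policy, batch setting concrete, here is a generic importance-weighted batch policy-gradient update in the spirit of such methods. This is a hedged sketch, not the paper's exact BPG algorithm: states are plain indices, the policy is a per-state softmax, rewards are synthetic noise, and each logged tuple records the behaviour policy's action probability so the gradient can be reweighted toward the current policy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
theta = np.zeros((n_states, n_actions))  # policy params, pi(a|s) = softmax(theta[s])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Logged batch collected earlier under a uniform behaviour policy mu:
# tuples of (state, action, mu(a|s), noisy scalar reward).
batch = [(int(rng.integers(n_states)), int(rng.integers(n_actions)),
          1.0 / n_actions, float(rng.normal())) for _ in range(64)]

def off_policy_return(theta, batch):
    # Importance-sampling estimate of the current policy's expected reward.
    return float(np.mean([softmax(theta[s])[a] / mu * r
                          for s, a, mu, r in batch]))

def batch_pg_update(theta, batch, lr=0.1):
    grad = np.zeros_like(theta)
    for s, a, mu, r in batch:
        pi = softmax(theta[s])
        w = pi[a] / mu            # importance weight pi(a|s) / mu(a|s)
        glog = -pi                # grad of log pi(a|s) for a softmax policy:
        glog[a] += 1.0            # one_hot(a) - pi
        grad[s] += w * r * glog
    return theta + lr * grad / len(batch)  # gradient ascent on the estimate

j_before = off_policy_return(theta, batch)
for _ in range(50):
    theta = batch_pg_update(theta, batch)
j_after = off_policy_return(theta, batch)
```

Each update reuses the same logged batch, which is the point of the off-policy setting: expensive, noisy human scores can be collected once and then exploited for many gradient steps, instead of requiring fresh on-policy rollouts after every update.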