Instructional Material
A Tutorial on Deep Latent Variable Models of Natural Language
Kim, Yoon, Wiseman, Sam, Rush, Alexander M.
There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these "deep latent variable" models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference.
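For orientation, the objective at the center of variational inference can be written as a lower bound on the log marginal likelihood. This is a sketch in standard notation rather than the tutorial's own: the symbols $x$ (observation), $z$ (latent variable), $\theta$ (model parameters), and $\phi$ (parameters of the approximate posterior $q_\phi$) are assumptions introduced here.

```latex
\begin{aligned}
\log p_\theta(x)
  &= \log \int p_\theta(x, z)\, dz \\
  &\ge \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x, z) - \log q_\phi(z \mid x) \right] \\
  &= \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
     - \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p_\theta(z) \right)
\end{aligned}
```

The gap between the two sides is $\mathrm{KL}\left( q_\phi(z \mid x) \,\|\, p_\theta(z \mid x) \right)$, so the bound is tight exactly when the approximate posterior matches the true, typically intractable, posterior.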
Effective Feature Learning with Unsupervised Learning for Improving the Predictive Models in Massive Open Online Courses
Ding, Mucong, Yang, Kai, Yeung, Dit-Yan, Pong, Ting-Chuen
The effectiveness of learning in massive open online courses (MOOCs) can be significantly enhanced by introducing personalized intervention schemes, which rely on building predictive models of student learning behaviors such as engagement or performance indicators. A major challenge that has to be addressed when building such models is to design handcrafted features that are effective for the prediction task at hand. In this paper, we make the first attempt to solve the feature learning problem by taking an unsupervised learning approach to learn a compact representation of the raw features, which have a large degree of redundancy. Specifically, in order to capture the underlying learning patterns in the content domain and the temporal nature of the clickstream data, we train a modified auto-encoder (AE) combined with a long short-term memory (LSTM) network to obtain a fixed-length embedding for each input sequence. When compared with the original features, the new features that correspond to the embedding obtained by the modified LSTM-AE are not only more parsimonious but also more discriminative for our prediction task. Using simple supervised learning models, the learned features can improve the prediction accuracy by up to 17% compared with the supervised neural networks and reduce overfitting to the dominant low-performing group of students, specifically in the task of predicting students' performance. Our approach is generic in the sense that it is restricted neither to a specific supervised learning model nor to a specific prediction task for MOOC learning analytics.
Transfer Learning using Representation Learning in Massive Open Online Courses
Ding, Mucong, Wang, Yanbang, Hemberg, Erik, O'Reilly, Una-May
In a Massive Open Online Course (MOOC), predictive models of student behavior can support multiple aspects of learning, including instructor feedback and timely intervention. Ongoing courses, when the student outcomes are yet unknown, must rely on models trained from the historical data of previously offered courses. It is possible to transfer models, but they often have poor prediction performance. One reason is features that inadequately represent predictive attributes common to both courses. We present an automated transductive transfer learning approach that addresses this issue. It relies on problem-agnostic, temporal organization of the MOOC clickstream data, where, for each student, for multiple courses, a set of specific MOOC event types is expressed for each time unit. It consists of two alternative transfer methods based on representation learning with auto-encoders: a passive approach using transductive principal component analysis and an active approach that uses a correlation alignment loss term. With these methods, we investigate the transferability of dropout prediction across similar and dissimilar MOOCs and compare with known methods. Results show improved model transferability and suggest that the methods are capable of automatically learning a feature representation that expresses common predictive characteristics of MOOCs.
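To make the idea of a correlation alignment loss term concrete, here is a minimal sketch of one widely used formulation: the squared Frobenius distance between source and target feature covariances, scaled by 1/(4d^2). Whether the authors use exactly this form is an assumption; the function names and toy data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows are samples, columns are features


def covariance(x: Matrix) -> Matrix:
    """Unbiased feature covariance matrix of a samples-by-features matrix."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in x) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def coral_loss(source: Matrix, target: Matrix) -> float:
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq_frob = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq_frob / (4 * d * d)


# Toy features standing in for two courses (hypothetical values).
source = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
shifted = [[10.0, 11.0], [11.0, 13.0], [12.0, 15.0]]  # same spread, new mean
scaled = [[0.0, 2.0], [2.0, 6.0], [4.0, 10.0]]        # every feature doubled

print(coral_loss(source, shifted))  # covariances match, so the loss is zero
print(coral_loss(source, scaled))   # covariances differ, so the loss is positive
```

Because the loss depends only on second-order statistics, a pure shift of the target features leaves it at zero, while a change in feature scale or correlation is penalized.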
Continual Match Based Training in Pommerman: Technical Report
Peng, Peng, Pang, Liang, Yuan, Yufeng, Gao, Chao
Continual learning is the ability of agents to continually improve their capacities across multiple tasks. While recent work on continual learning has mostly focused on developing either particular loss functions or specialized neural network structures that account for episodic memory or neural plasticity, we study continual learning from the perspective of the training mechanism. Specifically, we propose a COntinual Match BAsed Training (COMBAT) framework for training a population of advantage-actor-critic (A2C) agents in Pommerman, a partially observable multi-agent environment with no communication. Following the COMBAT framework, we trained an agent, namely, Navocado, that won the title of the top 1 learning agent in the NeurIPS 2018 Pommerman Competition. Two critical features of our agent are worth mentioning. Firstly, our agent did not learn from any demonstrations. Secondly, our agent is highly reproducible. As a technical report, we articulate the design of state space, action space, reward, and most importantly, the COMBAT framework for our Pommerman agent. We show in the experiments that Pommerman is a perfect environment for studying continual learning, and the agent can improve its performance by continually learning new skills without forgetting the old ones. Finally, the result in the Pommerman Competition verifies the robustness of our agent when competing with various opponents.
Scalable multi-node training with TensorFlow - Amazon Web Services
We've heard from customers that scaling TensorFlow training jobs to multiple nodes and GPUs successfully is hard. TensorFlow has distributed training built-in, but it can be difficult to use. Recently, we made optimizations to TensorFlow and Horovod to help AWS customers scale TensorFlow training jobs to multiple nodes and GPUs. With these improvements, any AWS customer can use an AWS Deep Learning AMI to train ResNet-50 on ImageNet in just under 15 minutes. To achieve this, 32 Amazon EC2 instances, each with 8 GPUs, for a total of 256 GPUs, were harnessed with TensorFlow. All of the required software and tools for this solution ship with the latest Deep Learning AMIs (DLAMIs), so you can try it out yourself. You can train faster, implement your models faster, and get results faster than ever before. This blog post describes our results and shows you how to try out this easier and faster way to run distributed training with TensorFlow. Figure A. ResNet-50 ImageNet model training with the latest optimized TensorFlow with Horovod on a Deep Learning AMI takes 15 minutes on 256 GPUs.
Introduction to Regularization to Reduce Overfitting of Deep Learning Neural Networks
The objective of a neural network is to have a final model that performs well both on the data that we used to train it (e.g. the training dataset) and the new data on which the model will be used to make predictions. The central challenge in machine learning is that we must perform well on new, previously unseen inputs -- not just those on which our model was trained. The ability to perform well on previously unobserved inputs is called generalization.
Decision Tree (CART) - Machine Learning Fun and Easy
https://www.udemy.com/machine-learnin... A decision tree is a type of supervised learning algorithm (one with a pre-defined target variable) that is mostly used in classification problems. A tree has many analogies in real life, and it turns out that it has influenced a wide area of machine learning, covering both classification and regression (CART). A decision tree is a flow-chart-like structure, where each internal node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf (or terminal) node holds a class label. The topmost node in a tree is the root node. To learn more about Augmented Reality, IoT, Machine Learning, FPGAs, Arduinos, PCB Design, and Image Processing, check out http://www.arduinostartups.com/
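The flow-chart structure described above can be sketched in a few lines. This is an illustrative, hand-built tree rather than a trained CART model: the attribute names, thresholds, and class labels are hypothetical, and the "value <= threshold goes left" convention is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node: holds a class label."""
    label: str


@dataclass
class Node:
    """Internal node: tests one attribute against a threshold."""
    attribute: str
    threshold: float
    left: Union["Node", Leaf]   # branch taken when value <= threshold
    right: Union["Node", Leaf]  # branch taken when value > threshold


def predict(tree: Union[Node, Leaf], sample: Dict[str, float]) -> str:
    """Walk from the root to a leaf, following one branch per test."""
    node = tree
    while isinstance(node, Node):
        if sample[node.attribute] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Root node tests hours studied; its right child tests attendance.
tree = Node(
    attribute="hours_studied",
    threshold=2.0,
    left=Leaf("fail"),
    right=Node("attendance", 0.5, Leaf("fail"), Leaf("pass")),
)

print(predict(tree, {"hours_studied": 5.0, "attendance": 0.9}))
```

A learning algorithm such as CART would choose the attributes and thresholds from data; here they are fixed by hand so that only the traversal is shown.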
Understanding Artificial Intelligence - Future Today - Medium
When I published the article "Understanding Blockchain", many of you wrote to ask whether I could write one dedicated to Artificial Intelligence. The truth is that I hadn't had time to get to it, and before sharing anything I wanted to finish some courses so that my recommendations would add value. The problem with Artificial Intelligence is that it's much more fragmented, both technologically and in use cases, than Blockchain, making it a real challenge to condense all the information and share it meaningfully. Even so, I have made an effort to summarize the key concepts and to compile interesting sources and resources. I hope it helps you as much as it helped me! Let's start with a little history. The timeline you see is taken from this article and it shows the most important milestones of Artificial Intelligence.