Instructional Material
Chapter 1: Bird's Eye View of Applied Machine Learning - Data Science Primer
Welcome to our 7-part mini-course on data science and applied machine learning! Over these 7 chapters, our goal is to provide you with an end-to-end blueprint for applied machine learning, while keeping this as actionable and succinct as possible. With that, let's get started with a bird's eye view of the machine learning workflow. One really cool (optional) challenge you can do in the next hour is training your first machine learning model! That's right, we've put together a complete step-by-step tutorial for training a model that can predict wine quality.
A Bayesian Perspective of Statistical Machine Learning for Big Data
Sambasivan, Rajiv, Das, Sourish, Sahu, Sujit K
Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets, which are often very large in size. The very task of feature discovery from data is essentially the meaning of the keyword 'learning' in SML. Theoretical justifications for the effectiveness of the SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings particularly justified by statistical inference methods are together termed statistical learning theory. This paper provides a review of SML from a Bayesian decision theoretic point of view -- where we argue that many SML techniques are closely connected to making inference by using the so-called Bayesian paradigm. We discuss many important SML techniques such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of very large data sets where these are often employed. We present a dictionary which maps the key concepts of SML from Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets where we also discuss many practical implementation issues. Thus the review is especially targeted at statisticians and computer scientists who are aspiring to understand and apply SML for moderately large to big data sets.
Iterative Classroom Teaching
Yeo, Teresa, Kamalaruban, Parameswaran, Singla, Adish, Merchant, Arpit, Asselborn, Thibault, Faucon, Louis, Dillenbourg, Pierre, Cevher, Volkan
We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students. Their diversity stems from differences in their initial internal states as well as their learning rates. We prove that a teacher with full knowledge about the learning dynamics of the students can teach a target concept to the entire classroom using O(min{d,N} log(1/eps)) examples, where d is the ambient dimension of the problem, N is the number of learners, and eps is the accuracy parameter. We show the robustness of our teaching strategy when the teacher has limited knowledge of the learners' internal dynamics as provided by a noisy oracle. Further, we study the trade-off between the learners' workload and the teacher's cost in teaching the target concept. Our experiments validate our theoretical results and suggest that appropriately partitioning the classroom into homogeneous groups provides a balance between these two objectives.
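To get a feel for how the stated bound scales, here is a minimal sketch that evaluates min{d,N} log(1/eps) for given values. It assumes the natural logarithm and ignores the constant hidden by the O(.) notation, so the result is only a growth-rate indicator, not an actual example count from the paper. The function name `teaching_bound` is hypothetical.

```python
import math


def teaching_bound(d: int, n_learners: int, eps: float) -> float:
    """Evaluate min(d, N) * log(1/eps).

    Assumptions (not from the paper): natural logarithm, and the
    constant factor hidden by the O(.) notation is taken to be 1.
    """
    if d <= 0 or n_learners <= 0:
        raise ValueError("d and n_learners must be positive")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return min(d, n_learners) * math.log(1.0 / eps)


# With few learners in a high-dimensional problem, N is the limiting term.
few_learners = teaching_bound(d=100, n_learners=10, eps=0.01)

# With many learners in a low-dimensional problem, d is the limiting term.
many_learners = teaching_bound(d=5, n_learners=1000, eps=0.01)
```

The min{d,N} factor means the bound stops growing with class size once N exceeds the dimension d, while halving eps only adds a logarithmic amount.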
How to launch your data science career (with Python)
If you're interested in the exciting world of data science, but don't know where to start, Data School is here to help. Data science can be an overwhelming field. Many people will tell you that you can't become a data scientist until you master the following: statistics, linear algebra, calculus, programming, databases, distributed computing, machine learning, visualization, experimental design, clustering, deep learning, natural language processing, and more. So, what exactly is data science? This workflow doesn't necessarily require advanced mathematics, a mastery of deep learning, or many of the other skills listed above.
The Future of AI in PR: Featuring Paul Roetzer - Cision
As a PR pro, what should you know about AI's impact on comms? Whether or not your brand is leveraging AI already, your target consumers are already engaging with it every day, through tools like Siri, Alexa, Netflix, or Google Maps. Artificial intelligence has the potential to save your company massive amounts of time and money, making your entire go-to-market process more efficient. But what's the best way to incorporate AI into your comms strategy? Take 6 minutes to absorb our latest video, and get critical insight from Paul Roetzer, founder of the Marketing Artificial Intelligence Institute, as he answers all the questions you may have about AI's impact on PR and marketing.
LSTM Model Architecture for Rare Event Time Series Forecasting
Applying LSTMs directly to time series forecasting has shown little success. This is surprising, as neural networks are known to be able to learn complex non-linear relationships and the LSTM is perhaps the most successful type of recurrent neural network that is capable of directly supporting multivariate sequence prediction problems. A recent study performed at Uber AI Labs demonstrates how both the automatic feature learning capabilities of LSTMs and their ability to handle input sequences can be harnessed in an end-to-end model that can be used for drive demand forecasting for rare events like public holidays. In this post, you will discover an approach to developing a scalable end-to-end LSTM model for time series forecasting. Specifically, we will review the 2017 paper titled "Time-series Extreme Event Forecasting with Neural Networks at Uber" by Nikolay Laptev, et al., presented at the Time Series Workshop, ICML 2017.
Real time numbers recognition (MNIST) on an iPhone with CoreML from A to Z · Blog · Liip
Learn how to build and train a deep learning network to recognize numbers (MNIST), how to convert it to the CoreML format, and how to deploy it on your iPhone X so that it recognizes numbers in real time! This is the third part of our deep learning on mobile phones series. In part one I showed you the two main tricks on how to use convolutions and pooling to train deep learning networks. In part two I showed you how to train existing deep learning networks like resnet50 to detect new objects. In part three I will now show you how to train a deep learning network, how to convert it to the CoreML format, and how to deploy it on your mobile phone!
Machine Learning in the Tidyverse
This course will teach you to leverage the tools in the "tidyverse" to generate, explore, and evaluate machine learning models. Using a combination of the tidyr and purrr packages, you will build a foundation for how to work with complex model objects in a "tidy" way. You will also learn how to leverage the broom package to explore your resulting models. You will then be introduced to the tools in the test-train-validate workflow, which will empower you to evaluate the performance of both classification and regression models, as well as provide the necessary information to optimize model performance via hyperparameter tuning.
How Google is looking to ensure AI development is ethical and fair
Following the announcement earlier this week of Google Cloud's AI Hub and Kubeflow Pipelines tools, Rajen Sheth, director of product management for Cloud AI, has outlined how the technology giant is working to ensure that its AI work is ethical and fair. In a blog post earlier this week titled 'Steering the right course for AI', he outlined what he sees as the main industry challenges to be overcome in order to make AI not just a reality, but one that is for the net good of society. Engaging with each of these in turn, he first suggests that unfair bias, or confirmation bias, must be tackled "on multiple fronts," starting with awareness. "To foster a wider understanding of the need for fairness in technologies like machine learning, we've created educational resources like ml-fairness.com." Google is also encouraging thorough documentation "as a means to better understand what goes on inside a machine learning solution". Within Google this takes the form of 'model cards': "a standardised format for describing the goals, assumptions, performance metrics, and even ethical considerations of a machine learning model." Embedded documentation tools from Google Cloud, like the Inclusive ML Guide, integrated throughout AutoML, and TensorFlow Model Analysis (TFMA) and the What-If Tool all help with this. "I'm proud of the steps we're taking, and I believe the knowledge and tools we're developing will go a long way towards making AI more fair," he said, before reiterating that this is an industry-wide problem to be tackled. "No single company can solve such a complex problem alone."
Modelling student online behaviour in a virtual learning environment
Hlosta, Martin, Herrmannova, Drahomira, Vachova, Lucie, Kuzilek, Jakub, Zdrahal, Zdenek, Wolff, Annika
In recent years, distance education has enjoyed a major boom. Much work at The Open University (OU) has focused on improving retention rates in these modules by providing timely support to students who are at risk of failing the module. In this paper we explore methods for analysing student activity in an online virtual learning environment (VLE) -- General Unary Hypotheses Automaton (GUHA) and Markov chain-based analysis -- and we explain how this analysis can be relevant for module tutors and other student support staff. We show that both methods are valid approaches to modelling student activities. An advantage of the Markov chain-based approach lies in its graphical output and in the possibility of modelling time dependencies of the student activities.
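As a rough illustration of what a Markov chain-based analysis of activity logs can look like, the sketch below estimates first-order transition probabilities from per-student sequences of activity states. This is a generic maximum-likelihood estimate and not the authors' method; the state labels ("active", "inactive"), the sample sequences, and the function name `estimate_transitions` are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order Markov transition probabilities.

    `sequences` is an iterable of per-student state lists, ordered in time.
    Returns {state: {next_state: probability}} based on observed counts.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probabilities = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probabilities[state] = {
            nxt: count / total for nxt, count in next_counts.items()
        }
    return probabilities


# Hypothetical weekly VLE activity states for three students.
sequences = [
    ["active", "active", "inactive"],
    ["active", "inactive", "inactive"],
    ["inactive", "active", "active"],
]
transitions = estimate_transitions(sequences)
```

Each row of `transitions` sums to 1 and can be drawn as a state diagram with weighted edges, which is one way to obtain the kind of graphical output the abstract mentions.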