Inductive Learning
Adversarial Validation, Explained
Many data science competitions suffer from a test set being markedly different from a training set (a violation of the "identically distributed" assumption). It is then difficult to make a representative validation set. We propose a method for selecting training examples most similar to test examples and using them as a validation set. The core of this idea is training a probabilistic classifier to distinguish train/test examples. In part one, we inspect the ideal case: training and testing examples coming from the same distribution, so that the validation error should give a good estimate of the test error and the classifier should generalize well to unseen test examples.
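The train/test classifier idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's own code: the logistic regression, the half-and-half hold-out split, and the function names (sigmoid, auc, fit_logistic, adversarial_auc) are all assumptions made for the example. Rows are labelled 0 if they come from the training set and 1 if they come from the test set, and the held-out AUC measures how well the two can be told apart.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid(z) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def adversarial_auc(train_rows, test_rows, seed=0):
    """Label train rows 0 and test rows 1, fit on half, report AUC on the rest."""
    rows = [(x, 0) for x in train_rows] + [(x, 1) for x in test_rows]
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    fit_part, held_out = rows[:half], rows[half:]
    w, b = fit_logistic([x for x, _ in fit_part], [y for _, y in fit_part])
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for x, _ in held_out]
    return auc(scores, [y for _, y in held_out])
```

An AUC close to 0.5 means the classifier cannot separate the two sets, which corresponds to the ideal case described above. An AUC well above 0.5 signals that the test set differs from the training set.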
Semi-supervised Learning with Sparse Autoencoders in Phone Classification
Dhaka, Akash Kumar, Salvi, Giampiero
We propose the application of a semi-supervised learning method to improve the performance of acoustic modelling for automatic speech recognition based on deep neural networks. As opposed to unsupervised initialisation followed by supervised fine tuning, our method takes advantage of both unlabelled and labelled data simultaneously through mini-batch stochastic gradient descent. We tested the method with varying proportions of labelled vs unlabelled observations in frame-based phoneme classification on the TIMIT database. Our experiments show that the method outperforms standard supervised training for an equal amount of labelled data and provides competitive error rates compared to state-of-the-art graph-based semi-supervised learning techniques.
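The abstract does not state the training objective. One common way to use labelled and unlabelled data in the same mini-batch is a joint loss of the following assumed form, which is a sketch and not necessarily the authors' formulation. Here B is a mini-batch, B_l its labelled subset, f_theta the classifier output, g_theta the autoencoder reconstruction, Omega a sparsity penalty on the hidden activations h_theta, and lambda and beta are hypothetical weighting hyperparameters.

```latex
\mathcal{L}(\theta)
  = \frac{1}{|B_l|} \sum_{(x, y) \in B_l} \ell_{\mathrm{CE}}\bigl(f_\theta(x),\, y\bigr)
  + \lambda \, \frac{1}{|B|} \sum_{x \in B} \bigl\lVert x - g_\theta(x) \bigr\rVert_2^2
  + \beta \, \Omega\bigl(h_\theta\bigr)
```

Under this form, every example contributes to the reconstruction term, while only labelled examples contribute to the cross-entropy term, so a single gradient step uses both kinds of data.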
Machine Learning for Drug Adverse Event Discovery
We can use unsupervised machine learning to identify which drugs are associated with which adverse events. Specifically, machine learning can help us to create clusters based on gender, age, outcome of the adverse event, route of drug administration, the purpose for which the drug was used, body mass index, etc. This can help to quickly discover hidden associations between drugs and adverse events. Clustering is an unsupervised learning technique with wide applications. Some examples where clustering is commonly applied are market segmentation, social network analytics, and astronomical data analysis.
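As a rough illustration of clustering on such attributes, here is a minimal k-means sketch in plain Python. The excerpt does not say which clustering algorithm was used, so k-means is an assumption, as are the farthest-first initialisation, the function names (dist2, kmeans), and the toy (age, body mass index) records, which are invented for the example.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-first initialisation."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centres chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical (age, body mass index) records, invented for illustration.
records = [(25, 21.0), (30, 22.5), (28, 20.0), (70, 31.0), (65, 29.5), (72, 33.0)]
labels, centers = kmeans(records, k=2)
```

In practice the features would need scaling and the categorical attributes (gender, route of administration, outcome) would need encoding before distances between records are meaningful.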
Adversarial validation, part two - FastML
In this second article on adversarial validation we get to the meat of the matter: what we can do when train and test sets differ. Will we be able to make a better validation set? The problem with training examples being different from test examples is that validation won't be any good for comparing models. That's because validation examples originate in the training set. We can see this effect when using Numerai data, which comes from financial time series.
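One way to build a validation set from the training examples that look most like test examples is sketched below. It assumes a train/test classifier has already produced, for each training row, a probability that the row belongs to the test set. The function name split_by_test_likeness, the val_fraction parameter, and the toy values are hypothetical and not taken from the article.

```python
def split_by_test_likeness(train_rows, p_test, val_fraction=0.2):
    """Move the most test-like training rows into a validation set.

    p_test[i] is the classifier's estimated probability that train_rows[i]
    comes from the test set. Returns (remaining_train, validation).
    """
    if len(train_rows) != len(p_test):
        raise ValueError("train_rows and p_test must have the same length")
    n_val = max(1, round(len(train_rows) * val_fraction))
    # Indices of training rows, most test-like first.
    order = sorted(range(len(train_rows)), key=lambda i: p_test[i], reverse=True)
    val_idx = set(order[:n_val])
    validation = [train_rows[i] for i in order[:n_val]]
    remaining = [row for i, row in enumerate(train_rows) if i not in val_idx]
    return remaining, validation


rows = ["a", "b", "c", "d", "e"]
probs = [0.1, 0.9, 0.5, 0.7, 0.2]
remaining, validation = split_by_test_likeness(rows, probs, val_fraction=0.4)
```

With these toy values, validation holds rows "b" and "d" (the two highest probabilities) and remaining holds "a", "c", and "e". Models would then be fitted on the remaining rows and compared on the validation rows.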
Comparing supervised learning algorithms
In the data science course that I instruct, we cover most of the data science pipeline but focus especially on machine learning. Besides teaching model evaluation procedures and metrics, we obviously teach the algorithms themselves, primarily for supervised learning. Near the end of this 11-week course, we spend a few hours reviewing the material that has been covered throughout the course, with the hope that students will start to construct mental connections between all of the different things they have learned. One of the skills that I want students to be able to take away from this course is the ability to intelligently choose between supervised learning algorithms when working a machine learning problem. Although there is some value in the "brute force" approach (try everything and see what works best), there is a lot more value in being able to understand the trade-offs you're making when choosing one algorithm over another.
Pinnability: Machine learning in the home feed
The home feed, a collection of Pins from the people, boards and interests followed, as well as recommendations including Picked for You, is the most heavily user-engaged part of the service, and contributes a large fraction of total repins. The more people Pin, the better Pinterest can get for each person, which puts us in a unique position to serve up inspiration as a discovery engine on an ongoing basis. The home feed is a key way to discover new content, which is valuable to the Pinner, but poses a challenging question. Given the ever increasing number of Pins from various sources, how can we surface the most personalized and relevant Pins? Pinnability is the collective name of the machine learning models we developed to help Pinners find the best content in their home feed.
Machine Learning in Robotics – 5 Modern Applications
As the term "machine learning" has heated up, interest in "robotics" (as expressed in Google Trends) has not altered much over the last three years. So how much of a place is there for machine learning in robotics? While only a portion of recent developments in robotics can be credited to developments and uses of machine learning, I've aimed to collect some of the more prominent applications together in this article, along with links and references. Before I delve into machine learning in robotics, let me first define "robot". Though at first this might seem simple, it's no easy task to come to an agreement on just what a robot is and what it is not, even amongst roboticists.
The Deception of Supervised Learning
Do models or offline datasets ever really tell us what to do? Most applications of supervised learning are predicated on this deception. Imagine you're a doctor tasked with choosing a cancer therapy. You could think hard about the problem and come up with some rules. But these rules would be overly simplistic, not personalized to the patient or customer.
Write Once, Run Anywhere: The IoT Machine Learning Shift From Proprietary Technology To Data
While early artificial intelligence (AI) programs were one-trick ponies, typically only able to excel at one task, today it's about becoming a jack of all trades. The goal is to write one program that can solve multi-variant problems without the need to be rewritten when conditions change--write once, run anywhere. Digital heavyweights--notably Amazon, Google, IBM, and Microsoft--are now open sourcing their machine learning (ML) libraries in pursuit of that goal as competitive pressures shift focus from proprietary technologies to proprietary data for differentiation. Machine learning is the study of algorithms that learn from examples and experience, rather than relying on hard-coded rules that do not always adapt well to real-world environments. ABI Research forecasts ML-based IoT analytics revenues will grow from 2 billion in 2016 to more than 19 billion in 2021, with more than 90% of 2021 revenue to be attributed to more advanced analytics phases.
Business Case Drive Enhancements to Video Analytics
The video analytics industry is typically split into two distinct camps: (1) systems designed around user-specified rules or models and (2) autonomous systems designed around machine learning. Supervised learning systems require heavy training and feedback to achieve the desired output, whereas unsupervised learning systems train themselves from the input data and require minimal human input. The video analytic solutions we saw in the market a decade ago seem rudimentary compared to today's offerings, partly due to the technology catching up with early promises and partly due to the industry's understanding and level-setting of expectations after the initial splash of analytics hyped as a panacea and the future of security. However, some of the extreme claims, such as its ability to replace trained human operators, eliminate the need for well-designed camera placement, completely eliminate false positives, and determine a person's intent ahead of an action, have proven to be more hype than reality for many end users.