Inductive Learning
Crowdwork for Machine Learning: An Autoethnography
Amazon's Mechanical Turk is a platform for soliciting work on online tasks that has been used by market researchers, translators, and data scientists to complete surveys, perform work that cannot be easily automated, and create human-labeled data for supervised learning systems. Its namesake, the original Mechanical Turk, was an 18th-century chess-playing automaton gifted to the Austrian Empress Maria Theresa. An elaborate hoax, it concealed a human player amidst the clockwork machinery that appeared to direct each move on the board. Amazon's Mechanical Turk (mTurk), which they call "artificial artificial intelligence," isn't all that different. From the outside, mTurk appears to perform tasks automatically that only humans can, like identifying objects in photographs, discerning the sentiment towards a brand in a tweet, or generating natural language in response to a prompt.
Decontamination of Mutual Contamination Models
Katz-Samuels, Julian, Blanchard, Gilles, Scott, Clayton
Many machine learning problems can be characterized by mutual contamination models. In these problems, one observes several random samples from different convex combinations of a set of unknown base distributions and the goal is to infer these base distributions. This paper considers the general setting where the base distributions are defined on arbitrary probability spaces. We examine three popular machine learning problems that arise in this general setting: multiclass classification with label noise, demixing of mixed membership models, and classification with partial labels. In each case, we give sufficient conditions for identifiability and present algorithms for the infinite and finite sample settings, with associated performance guarantees.
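As a sketch of the setup described above, a mutual contamination model can be written as follows. The symbols here ($\tilde{P}_i$ for the observed contaminated distributions, $P_j$ for the unknown base distributions, $\pi_{ij}$ for the mixing weights, and the counts $M$ and $L$) are notation assumed for illustration and may differ from the paper's.

$$
\tilde{P}_i = \sum_{j=1}^{L} \pi_{ij} P_j, \qquad \pi_{ij} \ge 0, \qquad \sum_{j=1}^{L} \pi_{ij} = 1, \qquad i = 1, \dots, M.
$$

Each $\tilde{P}_i$ is a convex combination of the base distributions, and the task is to infer the $P_j$ given only samples drawn from the $\tilde{P}_i$.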
Machine learning concepts: styles of machine learning
This is the first in a series of posts about machine learning concepts, where we'll cover everything from learning styles to new dimensions in machine learning research. What makes machine learning so successful? The answer lies in the core concept of machine learning: a machine can learn from examples and experience. Before machine learning, machines were programmed with specific instructions and had no need to learn on their own. A machine (without machine learning) is born knowing exactly what it's supposed to do and how to do it, like a robot arm on an assembly line. The problem with this approach, as Erik Brynjolfsson and Andrew McAfee put so well, is that "we humans know more than we can tell."
Harri Valpola dreams of an internet of beautiful AI minds
"It will look like one huge brain from our perspective," he says, "in much the same way that the internet looks like one big thing." That impression will be an illusion, but we will think it all the same. It is only way our limited human brains will be able to comprehend an internet of connected artificial intelligences. Valpola has set himself the task of building this future network. But at the moment his goal seems far away. Despite all the advances made in recent years, the Finnish computer scientist is disappointed with the rate of progress in artificial intelligence.
Supervised Learning with Indefinite Topological Kernels
Padellini, Tullia, Brutti, Pierpaolo
Topological Data Analysis (TDA) is a recent and growing branch of statistics devoted to the study of the shape of the data. In this work we investigate the predictive power of TDA in the context of supervised learning. Since topological summaries, most notably the Persistence Diagram, are typically defined in complex spaces, we adopt a kernel approach to translate them into more familiar vector spaces. We define a topological exponential kernel, characterize it, and show that, despite not being positive semi-definite, it can be successfully used in regression and classification tasks.
LEADING OFF: HR Record Set to Fall, Maddon Back in TB
Chicago Cubs manager Joe Maddon says he's looking forward to a two-game series at Tampa Bay. Maddon guided the Rays for nine seasons -- they were the Devil Rays when he started in 2006 -- and went 754-708, along with leading them to their only World Series appearance. The Cubs are visiting Tropicana Field for the first time since 2008, when Tampa Bay swept all three games and then-rookie Evan Longoria had an RBI in each victory. Maddon still has a lot of friends in the area, and his foundation recently donated $25,000 to those affected by Hurricane Irma.
Interpretable Graph-Based Semi-Supervised Learning via Flows
Rustamov, Raif M., Klosowski, James T.
In this paper, we consider the interpretability of the foundational Laplacian-based semi-supervised learning approaches on graphs. We introduce a novel flow-based learning framework that subsumes the foundational approaches and additionally provides a detailed, transparent, and easily understood expression of the learning process in terms of graph flows. As a result, one can visualize and interactively explore the precise subgraph along which the information from labeled nodes flows to an unlabeled node of interest. Surprisingly, the proposed framework does not trade accuracy for interpretability; in fact, it leads to improved prediction accuracy, as supported by both theoretical considerations and empirical results. The flow-based framework guarantees the maximum principle by construction and can handle directed graphs in an out-of-the-box manner.
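For context, here is a minimal sketch of one classical Laplacian-based approach of the kind the abstract calls foundational: harmonic label propagation, where each unlabeled node repeatedly takes the weighted average of its neighbors while labeled nodes stay fixed. This is not the paper's flow-based framework. The function name `harmonic_labels`, the dictionary-based graph format, and the toy path graph are all assumptions made for illustration.

```python
def harmonic_labels(adjacency, labeled, sweeps=500):
    """Iteratively solve for harmonic scores on an undirected weighted graph.

    adjacency: {node: {neighbor: weight}} (hypothetical input format)
    labeled:   {node: score} for nodes whose scores are known and held fixed
    """
    # Start unlabeled nodes at a neutral value; labeled nodes keep their score.
    f = {node: labeled.get(node, 0.5) for node in adjacency}
    for _ in range(sweeps):
        for node, nbrs in adjacency.items():
            if node in labeled or not nbrs:
                continue  # clamp labeled nodes, skip isolated nodes
            total_weight = sum(nbrs.values())
            # Each unlabeled score becomes the weighted mean of its neighbors.
            f[node] = sum(w * f[n] for n, w in nbrs.items()) / total_weight
    return f


# Toy path graph 0 - 1 - 2 - 3 - 4 with unit weights; only the ends are labeled.
path_graph = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0},
}
scores = harmonic_labels(path_graph, labeled={0: 0.0, 4: 1.0})
```

Because every unlabeled score is an average of its neighbors, the result never leaves the range of the labeled values, which is the maximum principle the abstract mentions. On this path graph the interior scores interpolate linearly between the two labeled ends.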
Learning with Bounded Instance- and Label-dependent Label Noise
Cheng, Jiacheng, Liu, Tongliang, Ramamohanarao, Kotagiri, Tao, Dacheng
Instance- and label-dependent label noise (ILN) is widespread in real-world datasets but has rarely been studied. In this paper, we focus on a particular case of ILN where the label noise rates, representing the probabilities that the true labels of examples flip into the corrupted labels, have upper bounds. We propose to handle this bounded instance- and label-dependent label noise under two different conditions. First, theoretically, we prove that when the marginal distributions $P(X|Y=+1)$ and $P(X|Y=-1)$ have non-overlapping supports, we can recover every noisy example's true label and perform supervised learning directly on the cleansed examples. Second, for the overlapping situation, we propose a novel approach to learn a well-performing classifier which needs only a few noisy examples to be labeled manually. Experimental results demonstrate that our method works well on both synthetic and real-world datasets.
Pseudo-labeling a simple semi-supervised learning method - Data, what now?
The foundation of every machine learning project is data – the one thing you cannot do without. In this post, I will show how a simple semi-supervised learning method called pseudo-labeling can increase the performance of your favorite machine learning models by utilizing unlabeled data. To train a machine learning model with supervised learning, the data has to be labeled. Does that mean that unlabeled data is useless for supervised tasks like classification and regression? Aside from using the extra data for analytic purposes, we can even use it to help train our model with semi-supervised learning – combining both unlabeled and labeled data for model training.
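The idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than the post's own implementation: it assumes a nearest-centroid classifier as the base model and a distance-margin threshold as the confidence rule. The names `fit_centroids`, `predict_with_margin`, `pseudo_label`, and `min_margin`, as well as the toy data, are hypothetical.

```python
from math import dist


def fit_centroids(X, y):
    """Train a nearest-centroid classifier: return {label: centroid}."""
    groups = {}
    for point, label in zip(X, y):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coords) / len(points) for coords in zip(*points))
        for label, points in groups.items()
    }


def predict_with_margin(centroids, point):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    ranked = sorted(
        ((dist(point, c), label) for label, c in centroids.items()),
        key=lambda pair: pair[0],
    )
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def pseudo_label(X_lab, y_lab, X_unlab, min_margin=1.0):
    """Fit on labeled data, pseudo-label confident unlabeled points, then refit."""
    centroids = fit_centroids(X_lab, y_lab)
    X_aug, y_aug = list(X_lab), list(y_lab)
    for point in X_unlab:
        label, margin = predict_with_margin(centroids, point)
        if margin >= min_margin:  # keep only confident predictions
            X_aug.append(point)
            y_aug.append(label)
    return fit_centroids(X_aug, y_aug), len(X_aug) - len(X_lab)


# Toy data: two labeled points and five unlabeled ones.
X_lab = [(0.0, 0.0), (4.0, 4.0)]
y_lab = [0, 1]
X_unlab = [(0.5, 0.0), (0.0, 1.0), (4.0, 3.0), (3.5, 4.0), (2.0, 2.0)]
model, n_added = pseudo_label(X_lab, y_lab, X_unlab)
```

In this toy run the point `(2.0, 2.0)` is equidistant from both initial centroids, so it falls below the margin and is left out, while the other four unlabeled points are pseudo-labeled and folded into the retrained model. In practice the base model and the confidence rule would be whatever fits your task, and the threshold usually needs tuning so that wrong pseudo-labels do not reinforce themselves.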