 Inductive Learning


In defense of skepticism about deep learning – Gary Marcus – Medium

#artificialintelligence

"Despite the promising results obtained with [representations developed from Web image], the experiments demonstrate that object classification with real-life robotic data is far from being solved."


Ranking Data with Continuous Labels through Oriented Recursive Partitions

arXiv.org Machine Learning

We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space $\mathcal{X}$ and the goal is to order all possible observations x in $\mathcal{X}$ by means of a scoring function $s:\mathcal{X}\rightarrow \mathbb{R}$ so that s(X) and Y tend to increase or decrease together with highest probability. This problem generalizes bi/multi-partite ranking to a certain extent and the task of finding optimal scoring functions s(x) can be naturally cast as optimization of a dedicated functional criterion, called the IROC curve here, or as maximization of the Kendall ${\tau}$ related to the pair (s(X), Y). From the theoretical side, we describe the optimal elements of this problem and provide statistical guarantees for empirical Kendall ${\tau}$ maximization under appropriate conditions for the class of scoring function candidates. We also propose a recursive statistical learning algorithm tailored to empirical IROC curve optimization and producing a piecewise constant scoring function that is fully described by an oriented binary tree. Preliminary numerical experiments highlight the difference in nature between regression and continuous ranking and provide strong empirical evidence of the performance of empirical optimizers of the criteria proposed.
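To make the Kendall $\tau$ criterion in the abstract above concrete, here is a minimal sketch of the empirical statistic for a sample of (score, label) pairs. It assumes the simple tau-a variant (tied pairs count as neither concordant nor discordant); the function name `kendall_tau` and the toy data are illustrative and do not come from the paper.

```python
from itertools import combinations

def kendall_tau(scores, labels):
    """Empirical Kendall tau-a: (concordant - discordant) / number of pairs.

    Assumption: tied pairs contribute zero, which is the simplest variant.
    """
    pairs = list(combinations(range(len(scores)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        product = (scores[i] - scores[j]) * (labels[i] - labels[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Toy data: the scores are a monotone transform of the labels, so the
# ordering is perfect even though the scores are poor regression estimates.
labels = [0.1, 0.4, 0.35, 0.8]
scores = [100.0 * y ** 3 for y in labels]

print(kendall_tau(scores, labels))                      # perfect agreement
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))          # one swapped pair
print(kendall_tau([-s for s in scores], labels))        # reversed ordering
```

The first call illustrates the contrast with regression mentioned in the abstract: only the ordering induced by s matters, so any increasing transform of a good scoring function is equally good under this criterion.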


[D] Unsupervised-as-supervised learning • r/MachineLearning

#artificialintelligence

I'm including noise-contrastive estimation, and GANs, but I'm worried I won't have enough to write (need about 3000 words). I've gone through most of the citations for these papers, so I'm thinking of just including GAN variants (like f-GAN, WGAN etc) to fill out any additional space. Anyone know of any other papers using a similar sort of technique? I'm aware of the part in ESLII also, just need to skim it again before I start writing.


Fairness in Supervised Learning: An Information Theoretic Approach

arXiv.org Machine Learning

Automated decision making systems are increasingly being used in real-world applications. In these systems for the most part, the decision rules are derived by minimizing the training error on the available historical data. Therefore, if there is a bias related to a sensitive attribute such as gender, race, religion, etc. in the data, say, due to cultural/historical discriminatory practices against a certain demographic, the system could continue discrimination in decisions by including the said bias in its decision rule. We present an information theoretic framework for designing fair predictors from data, which aim to prevent discrimination against a specified sensitive attribute in a supervised learning setting. We use equalized odds as the criterion for discrimination, which demands that the prediction should be independent of the protected attribute conditioned on the actual label. To ensure fairness and generalization simultaneously, we compress the data to an auxiliary variable, which is used for the prediction task. This auxiliary variable is chosen such that it is decontaminated from the discriminatory attribute in the sense of equalized odds. The final predictor is obtained by applying a Bayesian decision rule to the auxiliary variable.
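As a rough illustration of the equalized odds criterion described above, the sketch below measures how far a binary predictor is from satisfying it on a labelled sample. It is only an empirical check, not the information theoretic method of the paper; the function names, the binary 0/1 encoding, and the toy data are assumptions made for the example.

```python
def rates_by_group(y_true, y_pred, group):
    """Return {group value: (true positive rate, false positive rate)}.

    Assumption: labels and predictions are 0/1 and every group contains
    at least one positive and one negative example.
    """
    rates = {}
    for g in sorted(set(group)):
        tp = fp = positives = negatives = 0
        for yt, yp, a in zip(y_true, y_pred, group):
            if a != g:
                continue
            if yt == 1:
                positives += 1
                tp += yp
            else:
                negatives += 1
                fp += yp
        rates[g] = (tp / positives, fp / negatives)
    return rates

def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group difference in TPR or FPR (0 means equalized odds holds)."""
    rates = rates_by_group(y_true, y_pred, group)
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Hypothetical sample: group "a" gets noisy predictions, group "b" perfect ones.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(rates_by_group(y_true, y_pred, group))
print(equalized_odds_gap(y_true, y_pred, group))
```

Equalized odds asks that the prediction be independent of the protected attribute given the actual label, which for a binary predictor amounts to equal true positive and false positive rates across groups; the gap above is one simple way to quantify a violation.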


Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

arXiv.org Machine Learning

The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. It maintains an exponential moving average of label predictions on each training example, and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, Temporal Ensembling becomes unwieldy when learning large datasets. To overcome this problem, we propose Mean Teacher, a method that averages model weights instead of label predictions. As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling. Without changing the network architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250 labels, outperforming Temporal Ensembling trained with 1000 labels. We also show that a good network architecture is crucial to performance. Combining Mean Teacher and Residual Networks, we improve the state of the art on CIFAR-10 with 4000 labels from 10.55% to 6.28%, and on ImageNet 2012 with 10% of the labels from 35.24% to 9.11%.
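The weight-averaging idea can be sketched in a few lines. The example below assumes the teacher is an exponential moving average of the student's weights, updated after every training step, with weights held as flat lists of floats; the function name `ema_update` and the decay value are illustrative choices, not taken from the abstract.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight a fraction (1 - alpha) toward the student weight."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

# Toy run: the student weights are held fixed so the drift of the teacher is
# easy to follow. alpha=0.5 is used here only to keep the numbers readable.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
history = []
for step in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)
    history.append(teacher)
    print(step, teacher)
```

Because the update is applied to weights rather than to stored per-example predictions, it can run after every step instead of once per epoch, which is the limitation of Temporal Ensembling that the abstract points to.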


Top 10 Machine Learning Algorithms for Beginners

#artificialintelligence

The study of ML algorithms has gained immense traction post the Harvard Business Review article terming a 'Data Scientist' as the 'Sexiest job of the 21st century'. So, for those starting out in the field of ML, we decided to do a reboot of our immensely popular Gold blog The 10 Algorithms Machine Learning Engineers need to know - albeit this post is targeted towards beginners. ML algorithms are those that can learn from data and improve from experience, without human intervention. Learning tasks may include learning the function that maps the input to the output, learning the hidden structure in unlabeled data, or 'instance-based learning', where a class label is produced for a new instance by comparing the new instance (row) to instances from the training data, which were stored in memory. 'Instance-based learning' does not create an abstraction from specific instances. Supervised learning can be explained as follows: use labeled training data to learn the mapping function from the input variables (X) to the output variable (Y).
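The description of instance-based learning above can be illustrated with a small k-nearest-neighbours classifier, one common example of the idea. This is a sketch under assumptions: Euclidean distance, majority vote, and made-up two-dimensional data; the name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label a new instance by majority vote among its k closest stored instances."""
    # No model is fitted: the training rows themselves are kept in memory
    # and compared with the new instance at prediction time.
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.1, 5.0)))
```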


What is Machine Learning?

#artificialintelligence

This post has only covered supervised learning, which refers to algorithms that learn from examples where we have both the input and the desired output. This is often referred to as labelled data, because the input values are labelled with the expected output. While this is a popular and powerful technique, there are others that work differently.


Learning Curves for Machine Learning

#artificialintelligence

When building machine learning models, we want to keep error as low as possible. Two major sources of error are bias and variance. If we managed to reduce these two, then we could build more accurate models. But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves. We'll work with a real world data set and try to predict the electrical energy output of a power plant. Some familiarity with scikit-learn and machine learning theory is assumed. If you don't frown when I say cross-validation or supervised learning, then you're good to go. If you're new to machine learning and have never tried scikit, a good place to start is this blog post. We begin with a brief introduction to bias and variance.
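For readers who want a feel for what a learning curve is before opening the post, here is a minimal sketch in plain Python. It fits a straight line to growing subsets of a synthetic training set and records training and validation error for each size. It does not use scikit-learn or the power plant data mentioned above; the data, function names, and sizes are all assumptions made for the illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def learning_curve_points(train, val, sizes):
    """For each size n, fit on the first n training points and
    return (n, training MSE, validation MSE)."""
    val_x, val_y = zip(*val)
    points = []
    for n in sizes:
        xs, ys = zip(*train[:n])
        model = fit_line(xs, ys)
        points.append((n, mse(model, xs, ys), mse(model, val_x, val_y)))
    return points

# Synthetic data (an assumption for this sketch): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(60):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)))
train, val = data[:40], data[40:]

curve = learning_curve_points(train, val, sizes=[2, 5, 10, 20, 40])
for n, train_err, val_err in curve:
    print(f"n={n:2d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

With only two training points the line fits them exactly, so the training error is essentially zero while the validation error is not. How the two curves move as n grows, and how large the gap between them stays, is what is typically read as a signal of bias or variance.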


A Nonlinear Kernel Support Matrix Machine for Matrix Learning

arXiv.org Machine Learning

In many problems of supervised tensor learning (STL), real-world data such as face images or MRI scans are naturally represented as matrices, also called second-order tensors. Most existing classifiers based on tensor representation, such as the support tensor machine (STM), must be solved iteratively, which takes considerable time and may suffer from local minima. In this paper, we present a kernel support matrix machine (KSMM) to perform supervised learning when data are represented as matrices. KSMM is a general framework for the construction of a matrix-based hyperplane to exploit structural information. We analyze a unifying optimization problem for which we propose an asymptotically convergent algorithm. Theoretical analysis for the generalization bounds is derived based on Rademacher complexity with respect to a probability distribution. We demonstrate the merits of the proposed method by exhaustive experiments on both a simulation study and a number of real-world datasets from a variety of application domains.


Probabilistic supervised learning

arXiv.org Machine Learning

Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies - e.g., kernel methods, random forests, deep learning aka neural networks - being employed as a basis for decision making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a general means to approach the issue, we present an overarching framework for black-box prediction strategies that not only predict the target but also their own predictions' uncertainty. Moreover, the framework allows for fair assessment and comparison of disparate prediction strategies. For this, we formally consider strategies capable of predicting full distributions from feature variables, so-called probabilistic supervised learning strategies. Our work draws from prior work including Bayesian statistics, information theory, and modern supervised machine learning, and in a novel synthesis leads to (a) new theoretical insights such as a probabilistic bias-variance decomposition and an entropic formulation of prediction, as well as to (b) new algorithms and meta-algorithms, such as composite prediction strategies, probabilistic boosting and bagging, and a probabilistic predictive independence test. Our black-box formulation also leads (c) to a new modular interface view on probabilistic supervised learning and a modelling workflow API design, which we have implemented in the newly released skpro machine learning toolbox, extending the familiar modelling interface and meta-modelling functionality of sklearn. The skpro package provides interfaces for construction, composition, and tuning of probabilistic supervised learning strategies, together with orchestration features for validation and comparison of any such strategy - be it frequentist, Bayesian, or other.