Inductive Learning
Roses are red, are you single, we wonder? 'Cos this moth-brain AI can read your phone number
A pair of academics have reproduced part of a moth's brain as an artificial neural network – and taught it to recognize numbers to a fairly high accuracy with just a few training examples. The software, dubbed MothNet, can apparently discern handwritten digits with 75 per cent to 85 per cent accuracy, given 15 to 20 training samples of each number. That's not bad considering it takes thousands of training examples for more traditional neural networks to achieve 99 per cent accuracy. Its masterminds, Charles B. Delahunt and J. Nathan Kutz, both at the University of Washington in the US, built MothNet by modeling the olfactory network – the part of the brain that processes smells – found in the Carolina sphinx moth, also known as the tobacco hawk moth (Manduca sexta). That section of a bug's grey matter is relatively straightforward, we're told, making it ideal for experimentation.
Metric-Optimized Example Weights
Zhao, Sen, Fard, Mahdi Milani, Gupta, Maya
Real-world machine learning applications often have complex test metrics, and may have training and test data that follow different distributions. We propose addressing these issues by using a weighted loss function with a standard convex loss, but with weights on the training examples that are learned to optimize the test metric of interest on the validation set. These metric-optimized example weights can be learned for any test metric, including black box losses and customized metrics for specific applications. We illustrate the performance of our proposal with public benchmark datasets and real-world applications with domain shift and custom loss functions that balance multiple objectives, impose fairness policies, and are non-convex and non-decomposable.
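As a rough illustration of learning example weights against a validation metric, the toy sketch below fits a constant predictor with a weighted squared loss and keeps the weighting that scores best on a validation set. Everything in it is an assumption made for illustration: the function names, the single scalar weight per group, the grid search, and the data are stand-ins, not the weighting model or optimizer used by the authors.

```python
import statistics


def fit_weighted_constant(ys, weights):
    """Return the constant c that minimizes sum_i w_i * (y_i - c)**2."""
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def neg_median_abs_error(prediction, val_ys):
    """A black-box validation metric (higher is better)."""
    return -statistics.median(abs(prediction - y) for y in val_ys)


def pick_group_weight(train, val_ys, metric, candidate_alphas):
    """Grid-search the weight given to group-1 training examples.

    train is a list of (y, group) pairs with group in {0, 1}.
    Returns (alpha, fitted_constant, validation_score) for the best alpha.
    """
    ys = [y for y, _ in train]
    best = None
    for alpha in candidate_alphas:
        # Hypothetical weighting scheme: group 1 gets alpha, group 0 gets 1.
        weights = [alpha if group == 1 else 1.0 for _, group in train]
        c = fit_weighted_constant(ys, weights)
        score = metric(c, val_ys)
        if best is None or score > best[2]:
            best = (alpha, c, score)
    return best


# Training data is dominated by group 0, but validation data looks like group 1.
train = [(0.0, 0)] * 8 + [(10.0, 1)] * 2
val_ys = [9.0, 10.0, 11.0]
alpha, constant, score = pick_group_weight(
    train, val_ys, neg_median_abs_error, [0.1, 1.0, 10.0, 100.0]
)
```

In this toy setting the search favors a large weight on the minority group, because that moves the fitted constant toward the validation distribution. The loss being minimized stays a standard convex one; only the weights change.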
Privacy and machine learning: two unexpected allies?
In many applications of machine learning, such as machine learning for medical diagnosis, we would like to have machine learning algorithms that do not memorize sensitive information about the training set, such as the specific medical histories of individual patients. Differential privacy is a framework for measuring the privacy guarantees provided by an algorithm. Through the lens of differential privacy, we can design machine learning algorithms that responsibly train models on private data. Our works (with Martín Abadi, Úlfar Erlingsson, Ilya Mironov, Ananth Raghunathan, Shuang Song and Kunal Talwar) on differential privacy for machine learning have made it very easy for machine learning researchers to contribute to privacy research--even without being an expert on the mathematics of differential privacy. In this blog post, we'll show you how to do it. The key is a family of algorithms called Private Aggregation of Teacher Ensembles (PATE). One of the great things about the PATE framework, besides its name, is that anyone who knows how to train a supervised ML model (such as a neural net) can now contribute to research on differential privacy for machine learning.
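The aggregation step at the heart of PATE can be sketched in a few lines. The sketch below assumes the noisy-max variant, in which Laplace noise of scale 1/gamma is added to each class's teacher vote count before taking the argmax; the function names, the `gamma` parameter name, and the vote data are illustrative assumptions, and no privacy accounting is performed here.

```python
import random
from collections import Counter


def laplace_noise(rng, scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_max_label(teacher_votes, num_classes, gamma, rng):
    """Return the class with the largest noisy vote count.

    teacher_votes holds one predicted class index per teacher.
    Smaller gamma means more noise (stronger privacy, less accuracy).
    """
    counts = Counter(teacher_votes)
    noisy_counts = [
        counts[c] + laplace_noise(rng, 1.0 / gamma) for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: noisy_counts[c])


# 250 hypothetical teachers: 240 agree on class 3, the rest are scattered.
votes = [3] * 240 + [0, 1, 2, 4, 5, 6, 7, 8, 9, 9]
label = noisy_max_label(votes, num_classes=10, gamma=1.0, rng=random.Random(0))
```

When the teachers agree strongly, as in this example, the noise rarely changes the winning class. The label returned by this step is what a student model would be trained on.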
An Overview of Proxy-label Approaches for Semi-supervised Learning
Note: Parts of this post are based on my ACL 2018 paper Strong Baselines for Neural Semi-supervised Learning under Domain Shift with Barbara Plank. Unsupervised learning constitutes one of the main challenges for current machine learning models and one of the key elements that is missing for general artificial intelligence. While unsupervised learning on its own is still elusive, researchers have made a lot of progress in combining unsupervised learning with supervised learning. This branch of machine learning research is called semi-supervised learning. Semi-supervised learning has a long history. For a (slightly outdated) overview, refer to Zhu (2005) [1] and Chapelle et al. (2006) [2]. Recently in particular, semi-supervised learning has seen some success, considerably reducing the error rate on important benchmarks.
Non-Technical Person's Guide To Entering The Machine Learning Industry
As the buzz around data science grows every day, a slew of self-taught professionals have kick-started their machine learning journey with Andrew Ng's online courses. Many enthusiasts are gravitating towards the computer science field. But anyone who wants to pursue a career in machine learning needs to be familiar with statistics and linear algebra. With computer science and ML applications becoming more pervasive in everyday life, people from non-technical backgrounds are also interested in joining the field. In this article, we discuss in depth the roles a person from a non-technical background can explore in the data science/AI field.
Multiview Learning of Weighted Majority Vote by Bregman Divergence Minimization
Goyal, Anil, Morvant, Emilie, Amini, Massih-Reza
We tackle the issue of classifier combinations when observations have multiple views. Our method jointly learns view-specific weighted majority vote classifiers (i.e. for each view) over a set of base voters, and a second weighted majority vote classifier over the set of these view-specific weighted majority vote classifiers. We show that the empirical risk minimization of the final majority vote given a multiview training set can be cast as the minimization of Bregman divergences. This allows us to derive a parallel-update optimization algorithm for learning our multiview model. We empirically study our algorithm with a particular focus on the impact of the training set size on the multiview learning results. The experiments show that our approach is able to overcome the lack of labeled information.
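The two-level voting structure described above can be sketched as follows. This is only the prediction rule under assumed, hand-picked weights; the paper learns the weights by minimizing Bregman divergences, which is not shown. The function names and the tie-breaking convention (a zero sum counts as +1) are assumptions.

```python
def sign(value):
    """Map a real number to a label in {-1, +1}; zero counts as +1 here."""
    return 1 if value >= 0 else -1


def view_vote(voter_outputs, voter_weights):
    """Weighted majority vote of the base voters within one view."""
    return sign(sum(w * h for w, h in zip(voter_weights, voter_outputs)))


def multiview_vote(outputs_per_view, weights_per_view, view_weights):
    """Second-level weighted majority vote over the view-specific votes."""
    view_predictions = [
        view_vote(outputs, weights)
        for outputs, weights in zip(outputs_per_view, weights_per_view)
    ]
    return sign(sum(r * p for r, p in zip(view_weights, view_predictions)))


# Two views with three base voters each; every voter outputs -1 or +1.
outputs_per_view = [[1, -1, 1], [-1, -1, 1]]
weights_per_view = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]

# View 1 votes +1 and view 2 votes -1, so the view weights decide the result.
trust_view_1 = multiview_vote(outputs_per_view, weights_per_view, [0.7, 0.3])
trust_view_2 = multiview_vote(outputs_per_view, weights_per_view, [0.3, 0.7])
```

Changing only the view-level weights flips the final prediction in this example, which is why the two sets of weights need to be learned jointly.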
Adversarial examples from computational constraints
Bubeck, Sébastien, Price, Eric, Razenshteyn, Ilya
Why are classifiers in high dimension vulnerable to "adversarial" perturbations? We show that it is likely not due to information theoretic limitations, but rather it could be due to computational constraints. First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give a particular classification task where learning a robust classifier is computationally intractable. More precisely we construct a binary classification task in high dimensional space which is (i) information theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet is not efficiently robustly learnable, even for small perturbations, by any algorithm in the statistical query (SQ) model. This example gives an exponential separation between classical learning and robust learning in the statistical query model. It suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms.
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
Herzig, Roei, Raboh, Moshiko, Chechik, Gal, Berant, Jonathan, Globerson, Amir
Machine understanding of complex images is a key goal of artificial intelligence. One challenge underlying this task is that visual scenes contain multiple inter-related objects, and that global context plays an important role in interpreting the scene. A natural modeling framework for capturing such effects is structured prediction, which optimizes over complex labels while modeling within-label interactions. However, it is unclear what principles should guide the design of a structured prediction model that utilizes the power of deep learning components. Here we propose a design principle for such architectures that follows from a natural requirement of permutation invariance. We prove a necessary and sufficient characterization for architectures that follow this invariance, and discuss its implications for model design. Finally, we show that the resulting model achieves new state-of-the-art results on the Visual Genome scene graph labeling benchmark, outperforming all recent approaches.
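To make the permutation-invariance requirement concrete, the sketch below contrasts a sum-pooled score, which does not depend on the order in which objects are listed, with a position-weighted score, which does. This is a generic sum-pooling construction chosen for illustration, not the characterization proved in the paper; the function names, the embedding, and the readout are all hypothetical.

```python
from itertools import permutations


def embed(obj):
    """Hypothetical per-object embedding of an (x, y) feature pair."""
    x, y = obj
    return (x + y, x * y)


def pooled_score(objects):
    """Sum the per-object embeddings, then apply a fixed readout."""
    total_a, total_b = 0, 0
    for obj in objects:
        a, b = embed(obj)
        total_a += a
        total_b += b
    return 3 * total_a - total_b


def order_sensitive_score(objects):
    """Counter-example: weighting by list position breaks invariance."""
    return sum((i + 1) * (x + y) for i, (x, y) in enumerate(objects))


scene = [(1, 2), (3, 4), (5, 6)]
pooled = {pooled_score(list(p)) for p in permutations(scene)}
ordered = {order_sensitive_score(list(p)) for p in permutations(scene)}
```

Here `pooled` contains a single value, while `ordered` contains several, because the pooled score sees the objects only through an order-independent sum.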
Large Data and Zero Noise Limits of Graph-Based Semi-Supervised Learning Algorithms
Dunlop, Matthew M., Slepčev, Dejan, Stuart, Andrew M., Thorpe, Matthew
Scalings in which the graph Laplacian approaches a differential operator in the large graph limit are used to develop understanding of a number of algorithms for semi-supervised learning; in particular, the extensions to this graph setting of the probit algorithm and of level set and kriging methods are studied. Both optimization and Bayesian approaches are considered, based around a regularizing quadratic form found from an affine transformation of the Laplacian, raised to a possibly fractional exponent. Conditions on the parameters defining this quadratic form are identified under which well-defined limiting continuum analogues of the optimization and Bayesian semi-supervised learning problems may be found, thereby shedding light on the design of algorithms in the large graph setting. The large graph limits of the optimization formulations are tackled through $\Gamma$-convergence, using the recently introduced $TL^p$ metric. The small labelling noise limit of the Bayesian formulations is also identified, and contrasted with pre-existing harmonic function approaches to the problem.
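For readers unfamiliar with the harmonic function approach mentioned at the end of the abstract, here is a minimal sketch on a tiny graph. It assumes the unnormalized graph Laplacian L = D - W and solves for unlabeled node values by repeated neighbor averaging; the function names, the path graph, and the iteration count are illustrative assumptions, and none of the probit, level set, kriging, or Bayesian formulations studied in the paper are implemented.

```python
def harmonic_labels(weights, labels, iterations=500):
    """Propagate labels so each unlabeled node equals the weighted mean of its neighbors.

    weights is a symmetric adjacency matrix (list of lists) with a zero diagonal
    and no isolated nodes; labels maps node index to a fixed real-valued label.
    """
    n = len(weights)
    f = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(iterations):
        updated = list(f)
        for i in range(n):
            if i in labels:
                continue  # labeled nodes stay clamped to their given value
            degree = sum(weights[i])
            updated[i] = sum(w * f[j] for j, w in enumerate(weights[i])) / degree
        f = updated
    return f


def apply_laplacian(weights, f):
    """Return L f for the unnormalized graph Laplacian L = D - W."""
    return [
        sum(row) * f[i] - sum(w * f[j] for j, w in enumerate(row))
        for i, row in enumerate(weights)
    ]


# Path graph 0 - 1 - 2 - 3 - 4 with the two end nodes labeled +1 and -1.
path = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
values = harmonic_labels(path, {0: 1.0, 4: -1.0})
residual = apply_laplacian(path, values)
```

On this path graph the propagated values interpolate linearly between the two labeled ends, and the Laplacian residual vanishes at the unlabeled nodes, which is the defining property of a harmonic function on the graph.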