 Inductive Learning


Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs

arXiv.org Machine Learning

Cryo-electron microscopy (cryoEM) is fast becoming the preferred method for protein structure determination. Particle picking is a significant bottleneck in the solving of protein structures from single particle cryoEM. Hand labeling sufficient numbers of particles can take months of effort and current computationally based approaches are often ineffective. Here, we frame particle picking as a positive-unlabeled classification problem in which we seek to learn a convolutional neural network (CNN) to classify micrograph regions as particle or background from a small number of labeled positive examples and many unlabeled examples. However, model fitting with very few labeled data points is a challenging machine learning problem. To address this, we develop a novel objective function, GE-binomial, for learning model parameters in this context. This objective uses a newly-formulated generalized expectation criterion to learn effectively from unlabeled data when using minibatched stochastic gradient descent optimizers. On a high-quality publicly available cryoEM dataset and a difficult unpublished dataset supplied by the Shapiro lab, we show that CNNs trained with this objective classify particles accurately with very few positive training examples and outperform EMAN2's byRef method by a large margin even with fewer labeled training examples. Furthermore, we show that incorporating an autoencoder improves generalization when very few labeled data points are available. We also compare our GE-binomial method with other positive-unlabeled learning methods never before applied to particle picking. We expect our particle picking tool, Topaz, based on CNNs trained with GE-binomial, to be an essential component of single particle cryoEM analysis and our GE-binomial objective function to be widely applicable to positive-unlabeled classification problems.


#Definition: What is supervised learning in AI?

#artificialintelligence

The media update team explores the topic. Supervised learning can be defined as a type of machine learning algorithm that relies on a training dataset to make predictions. Breaking it down to basics, supervised machine learning is when a system receives a training dataset made up of input data and corresponding output data. From the training data, the system learns how the input led to the output data, creating a model – or what is called a 'mapping function'. It can then be given different input data to predict what the output would be, based on the patterns it recognised in the training set it has learnt from.
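As a rough illustration of the idea of a "mapping function", the sketch below learns from input/output pairs and then predicts the output for an unseen input. The nearest-neighbour rule, the function name `fit_nearest_neighbor`, and the toy daylight data are all assumptions chosen for brevity, not something the article prescribes.

```python
def fit_nearest_neighbor(train_inputs, train_outputs):
    """Return a mapping function learned from (input, output) pairs."""
    pairs = list(zip(train_inputs, train_outputs))

    def predict(x):
        # Predict the output attached to the training input closest to x.
        _, nearest_output = min(pairs, key=lambda p: abs(p[0] - x))
        return nearest_output

    return predict


# Toy training set (made-up values): hours of daylight -> season label.
model = fit_nearest_neighbor(
    [8.0, 9.0, 15.0, 16.0],
    ["winter", "winter", "summer", "summer"],
)
prediction = model(14.2)  # an input that was not in the training set
```

Any other supervised learner could stand in for the nearest-neighbour rule here; the point is only that the model is built from labelled examples and then queried with new inputs.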


Attention-based Graph Neural Network for Semi-supervised Learning

arXiv.org Machine Learning

Recently popularized graph neural networks achieve the state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches. These architectures alternate between a propagation layer that aggregates the hidden states of the local neighborhood and a fully-connected layer. Perhaps surprisingly, we show that a linear model that removes all the intermediate fully-connected layers is still able to achieve a performance comparable to the state-of-the-art models. This significantly reduces the number of parameters, which is critical for semi-supervised learning where the number of labeled examples is small. This in turn leaves room for designing more innovative propagation layers. Based on this insight, we propose a novel graph neural network that removes all the intermediate fully-connected layers, and replaces the propagation layers with attention mechanisms that respect the structure of the graph. The attention mechanism allows us to learn a dynamic and adaptive local summary of the neighborhood to achieve more accurate predictions. In a number of experiments on benchmark citation network datasets, we demonstrate that our approach outperforms competing methods. By examining the attention weights among neighbors, we show that our model provides some interesting insights on how neighbors influence each other.
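One way to picture an attention-based propagation layer is sketched below: each node's new hidden state is a weighted average of its own state and its neighbors' states, with weights given by a softmax restricted to the graph neighborhood. The abstract does not state the scoring rule, so the cosine-similarity score, the temperature `beta`, the inclusion of a self-loop, and the function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def attention_propagate(hidden, neighbors, beta=1.0):
    """One propagation step with attention restricted to graph neighbors.

    hidden:    list of hidden-state vectors, one per node
    neighbors: neighbors[i] is the list of node indices adjacent to node i
    beta:      assumed scalar that sharpens or flattens the attention
    """
    new_hidden = []
    for i, h_i in enumerate(hidden):
        # Attend over the node itself plus its graph neighbors only.
        nbrs = [i] + [j for j in neighbors[i] if j != i]
        scores = [beta * cosine(h_i, hidden[j]) for j in nbrs]
        # Numerically stable softmax over the neighborhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted summary of the neighborhood, one value per dimension.
        new_hidden.append(
            [sum(w * hidden[j][d] for w, j in zip(weights, nbrs))
             for d in range(len(h_i))]
        )
    return new_hidden


hidden = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = [[1, 2], [0], [0]]
propagated = attention_propagate(hidden, neighbors)
```

In this toy graph, node 0 puts more weight on node 1 (whose state matches its own) than on node 2, which is the kind of per-neighbor influence the attention weights are meant to expose.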


Aggregation using input-output trade-off

arXiv.org Machine Learning

In this paper, we introduce a new learning strategy based on a seminal idea of Mojirsheibani (1999, 2000, 2002a, 2002b), who proposed a smart method for combining several classifiers, relying on a consensus notion. In many aggregation methods, the prediction for a new observation x is computed by building a linear or convex combination over a collection of basic estimators r_1(x), ..., r_m(x) previously calibrated using a training data set. Mojirsheibani proposes to compute the prediction associated with a new observation by combining selected outputs of the training examples. The output of a training example is selected if some kind of consensus is observed: the predictions computed for the training example with the different machines have to be "similar" to the prediction for the new observation. This approach has been recently extended to the context of regression in Biau et al. (2016). In the original scheme, the agreement condition is actually required to hold for all individual estimators, which appears inadequate if there is one bad initial estimator. In practice, a few disagreements are allowed; for establishing the theoretical results, the proportion of estimators satisfying the condition is required to tend to 1. In this paper, we propose an alternative procedure, mixing the previous consensus ideas on the predictions with the Euclidean distance computed between entries. This may be seen as an alternative approach that reduces the effect of a possibly bad estimator in the initial list, using a constraint on the inputs. We prove the consistency of our strategy in classification and in regression. We also provide some numerical experiments on simulated and real data to illustrate the benefits of this new aggregation method. On the whole, our practical study shows that our method may perform much better than the original combination technique, and, in particular, exhibits far less variance. We also show on simulated examples that this procedure mixing inputs and outputs is still robust to high dimensional inputs.
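The selection rule described above can be sketched for regression as follows: a training example contributes its output only if every machine predicts "similarly" for it and for the new point, and if its input also lies close to the new input. The hard thresholds `eps` and `delta`, the plain averaging, and the name `consensus_predict` are assumptions; the paper's actual weighting scheme may differ.

```python
import math


def consensus_predict(x_new, train_x, train_y, machines, eps, delta):
    """Average the outputs of training examples that pass both checks.

    machines: list of already-fitted estimators, each mapping an input to a number
    eps:      assumed tolerance on |r_k(x_i) - r_k(x_new)| for every machine k
    delta:    assumed tolerance on the Euclidean distance between x_i and x_new
    """
    selected = []
    for x_i, y_i in zip(train_x, train_y):
        # Consensus on the outputs: all machines must agree within eps.
        outputs_agree = all(abs(r(x_i) - r(x_new)) <= eps for r in machines)
        # Constraint on the inputs: the example must also be close to x_new.
        inputs_close = math.dist(x_i, x_new) <= delta
        if outputs_agree and inputs_close:
            selected.append(y_i)
    if not selected:
        return None  # no training example satisfies the conditions
    return sum(selected) / len(selected)


# Two toy machines and a tiny training set (made-up values).
machines = [lambda x: x[0], lambda x: 2.0 * x[0]]
train_x = [(0.0,), (0.1,), (5.0,)]
train_y = [0.0, 0.2, 10.0]
pred = consensus_predict((0.05,), train_x, train_y, machines, eps=0.2, delta=0.5)
```

Relaxing the `all(...)` test to "most machines agree" would mirror the remark that a few disagreements are allowed in practice.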


Machine Learning (concepts) 101

#artificialintelligence

A few weeks ago I started seriously studying Machine Learning (ML). I have taken some courses and read some books before, but now I am taking a step ahead. I do plan to start working on ML projects and really get into the field. To me, the only way to make sure you understand something, is the fact that you are able to explain it to others. Because of this, I decided to write a Machine Learning 101 blog post, in which I explain some (very) basic ML concepts.


Adversarial Extreme Multi-label Classification

arXiv.org Machine Learning

The goal in extreme multi-label classification is to learn a classifier which can assign a small subset of relevant labels to an instance from an extremely large set of target labels. Datasets in extreme classification exhibit a long tail of labels which have a small number of positive training instances. In this work, we pose the learning task in extreme classification with a large number of tail-labels as learning in the presence of adversarial perturbations. This view motivates a robust optimization framework and equivalence to a corresponding regularized objective. Under the proposed robustness framework, we demonstrate efficacy of Hamming loss for tail-label detection in extreme classification. The equivalent regularized objective, in combination with proximal gradient based optimization, performs better than state-of-the-art methods on propensity scored versions of precision@k and nDCG@k (up to 20% relative improvement over PFastreXML, a leading tree-based approach, and 60% relative improvement over SLEEC, a leading label-embedding approach). Furthermore, we also highlight the sub-optimality of a sparse solver in a widely used package for large-scale linear classification, which is interesting in its own right. We also investigate the spectral properties of label graphs for providing novel insights towards understanding the conditions governing the performance of Hamming loss based one-vs-rest scheme vis-à-vis label embedding methods.


Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking

arXiv.org Machine Learning

Methods that learn representations of nodes in a graph play a critical role in network analysis since they enable many downstream learning tasks. We propose Graph2Gauss - an approach that can efficiently learn versatile node embeddings on large scale (attributed) graphs that show strong performance on tasks such as link prediction and node classification. Unlike most approaches that represent nodes as point vectors in a low-dimensional continuous space, we embed each node as a Gaussian distribution, allowing us to capture uncertainty about the representation. Furthermore, we propose an unsupervised method that handles inductive learning scenarios and is applicable to different types of graphs: plain/attributed, directed/undirected. By leveraging both the network structure and the associated node attributes, we are able to generalize to unseen nodes without additional training. To learn the embeddings we adopt a personalized ranking formulation w.r.t. the node distances that exploits the natural ordering of the nodes imposed by the network structure. Experiments on real world networks demonstrate the high performance of our approach, outperforming state-of-the-art network embedding methods on several different tasks. Additionally, we demonstrate the benefits of modeling uncertainty - by analyzing it we can estimate neighborhood diversity and detect the intrinsic latent dimensionality of a graph.
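To make the idea of embedding nodes as Gaussians and ranking them by distance more concrete, here is a minimal sketch. The abstract does not name the dissimilarity measure, so the use of KL divergence between diagonal Gaussians, and the helper names below, are assumptions for illustration only.

```python
import math


def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL(N(mu0, diag(var0)) || N(mu1, diag(var1))) for diagonal Gaussians."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )


def ranking_satisfied(anchor, closer, farther):
    """Check that the node closer in the graph is also closer in embedding space.

    Each argument is a (mean, variance) pair of equal-length lists.
    """
    d_close = kl_diag_gaussians(closer[0], closer[1], anchor[0], anchor[1])
    d_far = kl_diag_gaussians(farther[0], farther[1], anchor[0], anchor[1])
    return d_close < d_far


# Made-up embeddings: a 1-hop neighbor should rank ahead of a 2-hop neighbor.
anchor = ([0.0, 0.0], [1.0, 1.0])
one_hop = ([0.5, 0.0], [1.0, 1.0])
two_hop = ([3.0, 1.0], [1.0, 1.0])
ok = ranking_satisfied(anchor, one_hop, two_hop)
```

A training loop would penalize triplets for which such a check fails; the variances are what let the embedding express uncertainty about a node.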


N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification

arXiv.org Machine Learning

Graph Convolutional Networks (GCNs) have shown significant improvements in semi-supervised learning on graph-structured data. Concurrently, unsupervised learning of graph embeddings has benefited from the information contained in random walks. In this paper, we propose a model: Network of GCNs (N-GCN), which marries these two lines of work. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs which optimizes the classification objective. Our experiments show that our proposed N-GCN model improves state-of-the-art baselines on all of the challenging node classification tasks we consider: Cora, Citeseer, Pubmed, and PPI. In addition, our proposed method has other desirable properties, including generalization to recently proposed semi-supervised learning methods such as GraphSAGE, allowing us to propose N-SAGE, and resilience to adversarial input perturbations.


My Journey into Machine Learning: Class 3 – Towards Data Science

@machinelearnbot

As we discussed in the first article, linear regression is a supervised learning algorithm where the output is continuous valued. Think of r^t as the output and X^t as the training examples. This is the ideal scenario that we would like to have: A function that predicts the output perfectly from the training examples. But this does not generally happen in the real world. There is an additional noise that needs to be added to the function to get the required output.
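The "function plus noise" view can be tried out with a small simulation: outputs are generated as a straight line plus Gaussian noise, and a line is then fitted by ordinary least squares. The true slope and intercept, the noise level, the sample size, and the name `fit_line` are made-up choices for this sketch.

```python
import random


def fit_line(xs, rs):
    """Ordinary least squares fit of r = intercept + slope * x (one input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(rs) / n
    cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, rs))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_r - slope * mean_x
    return slope, intercept


random.seed(0)
true_slope, true_intercept = 2.0, 1.0
xs = [random.random() for _ in range(200)]
# Observed output = underlying function + additive noise.
rs = [true_intercept + true_slope * x + random.gauss(0.0, 0.1) for x in xs]
slope, intercept = fit_line(xs, rs)
```

Because of the noise, the fitted slope and intercept land near the true values rather than exactly on them, which is the gap between the ideal scenario and real data.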


Vote-boosting ensembles

arXiv.org Machine Learning

Vote-boosting is a sequential ensemble learning method in which the individual classifiers are built on different weighted versions of the training data. To build a new classifier, the weight of each training instance is determined in terms of the degree of disagreement among the current ensemble predictions for that instance. For low class-label noise levels, especially when simple base learners are used, emphasis should be placed on instances for which the disagreement rate is high. When more flexible classifiers are used and as the noise level increases, the emphasis on these uncertain instances should be reduced. In fact, at sufficiently high levels of class-label noise, the focus should be on instances on which the ensemble classifiers agree. The optimal type of emphasis can be automatically determined using cross-validation. An extensive empirical analysis using the beta distribution as emphasis function illustrates that vote-boosting is an effective method to generate ensembles that are both accurate and robust.
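A minimal sketch of the weighting step, for the binary case, might look like this: the fraction of ensemble votes for class 1 is passed through a beta density, and the result is normalized into instance weights. The clipping constant, the normalization, and the function names are assumptions; the abstract only states that a beta distribution serves as the emphasis function.

```python
import math


def beta_pdf(p, a, b):
    """Density of the Beta(a, b) distribution at p, for 0 < p < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * p ** (a - 1) * (1 - p) ** (b - 1)


def vote_boosting_weights(votes, a, b, clip=1e-3):
    """Turn ensemble votes into normalized instance weights.

    votes: votes[i] is the list of 0/1 predictions the current ensemble
           members made for training instance i
    clip:  assumed guard that keeps the vote fraction away from 0 and 1,
           where the beta density can be zero or unbounded
    """
    raw = []
    for instance_votes in votes:
        p = sum(instance_votes) / len(instance_votes)
        p = min(max(p, clip), 1 - clip)
        raw.append(beta_pdf(p, a, b))
    total = sum(raw)
    return [w / total for w in raw]


votes = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 1]]
w_disagree = vote_boosting_weights(votes, a=2.0, b=2.0)  # peak at an even split
w_agree = vote_boosting_weights(votes, a=0.5, b=0.5)     # peaks near unanimity
```

With a = b > 1 the density peaks at an even split, so contested instances get the most weight; with a = b < 1 it is U-shaped and favors instances on which the ensemble agrees. Choosing between such settings is where cross-validation would come in.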