Inductive Learning
Link Prediction via Higher-Order Motif Features
Abuoda, Ghadeer, Morales, Gianmarco De Francisci, Aboulnaga, Ashraf
Link prediction requires predicting which new links are likely to appear in a graph. Being able to predict unseen links with good accuracy has important applications in several domains such as social media, security, transportation, and recommendation systems. A common approach is to use features based on the common neighbors of an unconnected pair of nodes to predict whether the pair will form a link in the future. In this paper, we present an approach for link prediction that relies on higher-order analysis of the graph topology, well beyond common neighbors. We treat the link prediction problem as a supervised classification problem, and we propose a set of features that depend on the patterns or motifs that a pair of nodes occurs in. By using motifs of sizes 3, 4, and 5, our approach captures a high level of detail about the graph topology within the neighborhood of the pair of nodes, which leads to higher classification accuracy. In addition to proposing the use of motif-based features, we also propose two optimizations related to constructing the classification dataset from the graph. First, to ensure that positive and negative examples are treated equally when extracting features, we propose adding the negative examples to the graph as an alternative to the common approach of removing the positive ones. Second, we show that it is important to control for the shortest-path distance when sampling pairs of nodes to form negative examples, since the difficulty of prediction varies with the shortest-path distance. We experimentally demonstrate that using off-the-shelf classifiers with a well-constructed classification dataset results in an increase of up to 10 percentage points in accuracy over prior topology-based and feature learning methods.
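The idea of controlling for shortest-path distance when sampling negative examples can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the function names (`bfs_distances`, `pairs_at_distance`, `sample_negatives`, `common_neighbors`), and the choice to sample at one exact distance are all assumptions made for illustration. The common-neighbors count is included only as the baseline feature the abstract mentions; motif features are not shown.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pairs_at_distance(adj, d):
    """All unordered node pairs whose shortest-path distance is exactly d."""
    pairs = []
    for u in adj:
        for v, dv in bfs_distances(adj, u).items():
            if dv == d and u < v:
                pairs.append((u, v))
    return sorted(pairs)

def sample_negatives(adj, d, k, seed=0):
    """Sample up to k unlinked pairs at distance d (d >= 2 means no edge)."""
    rng = random.Random(seed)
    candidates = pairs_at_distance(adj, d)
    return rng.sample(candidates, min(k, len(candidates)))

def common_neighbors(adj, u, v):
    """Baseline feature: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

# Hypothetical toy graph: path 0-1-2-3-4 plus the chord 1-3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
adj = {n: set() for n in range(5)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

negatives = sample_negatives(adj, d=2, k=2)
features = [(u, v, common_neighbors(adj, u, v)) for u, v in negatives]
```

Sampling separately for each distance makes it possible to report accuracy per distance, so that easy far-apart pairs do not hide the difficulty of nearby ones.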
Partial Label Learning with Self-Guided Retraining
Partial label learning deals with the problem where each training instance is assigned a set of candidate labels, only one of which is correct. This paper provides the first attempt to leverage the idea of self-training for dealing with partially labeled examples. Specifically, we propose a unified formulation with proper constraints to train the desired model and perform pseudo-labeling jointly. For pseudo-labeling, unlike traditional self-training that manually differentiates the ground-truth label with sufficiently high confidence, we introduce the maximum infinity norm regularization on the modeling outputs to automatically achieve this desideratum, which results in a convex-concave optimization problem. We show that optimizing this convex-concave problem is equivalent to solving a set of quadratic programming (QP) problems. By proposing an upper-bound surrogate objective function, we turn to solving only one QP problem to improve optimization efficiency. Extensive experiments on synthesized and real-world datasets demonstrate that the proposed approach significantly outperforms state-of-the-art partial label learning approaches.
A Smoother Way to Train Structured Prediction Models
Pillutla, Krishna, Roulet, Vincent, Kakade, Sham M., Harchaoui, Zaid
We present a framework to train a structured prediction model by performing smoothing on the inference algorithm it builds upon. Smoothing overcomes the non-smoothness inherent to the maximum margin structured prediction objective, and paves the way for the use of fast primal gradient-based optimization algorithms. We illustrate the proposed framework by developing a novel primal incremental optimization algorithm for the structural support vector machine. The proposed algorithm blends an extrapolation scheme for acceleration and an adaptive smoothing scheme and builds upon the stochastic variance-reduced gradient algorithm. We establish its worst-case global complexity bound and study several practical variants, including extensions to deep structured prediction. We present experimental results on two real-world problems, namely named entity recognition and visual object localization. The experimental results show that the proposed framework allows us to build upon efficient inference algorithms to develop large-scale optimization algorithms for structured prediction which can achieve competitive performance on the two real-world problems.
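To see why smoothing helps, consider the max over candidate scores that sits inside a max-margin objective: it is non-differentiable wherever two candidates tie. The sketch below uses the log-sum-exp (entropy) smoother, which is one standard choice and is an assumption here, since the abstract does not say which smoother is used. The function names and the parameter `mu` are hypothetical.

```python
import math

def smooth_max(scores, mu):
    """Log-sum-exp smoothing of max(scores) with smoothing parameter mu > 0.

    Satisfies max(scores) <= smooth_max <= max(scores) + mu * log(len(scores)).
    """
    m = max(scores)  # subtract the max for numerical stability
    return m + mu * math.log(sum(math.exp((s - m) / mu) for s in scores))

def smooth_max_grad(scores, mu):
    """Gradient of smooth_max: a softmax distribution over the candidates."""
    m = max(scores)
    weights = [math.exp((s - m) / mu) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical scores of three candidate outputs.
scores = [1.0, 2.0, 5.0]
value = smooth_max(scores, mu=0.5)
grad = smooth_max_grad(scores, mu=0.5)
```

A smaller `mu` tracks the true max more closely but gives a less smooth function, which is the trade-off an adaptive smoothing scheme has to manage.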
Semi-Supervised Learning by Label Gradient Alignment
Jackson, Jacob, Schulman, John
We present label gradient alignment, a novel algorithm for semi-supervised learning which imputes labels for the unlabeled data and trains on the imputed labels. We define a semantically meaningful distance metric on the input space by mapping a point (x, y) to the gradient of the model at (x, y). We then formulate an optimization problem whose objective is to minimize the distance between the labeled and the unlabeled data in this space, and we solve it by gradient descent on the imputed labels. We evaluate label gradient alignment using the standardized architecture introduced by Oliver et al. (2018) and demonstrate state-of-the-art accuracy in semi-supervised CIFAR-10 classification.
Robust One-Class Kernel Spectral Regression
Arashloo, Shervin Rahimzadeh, Kittler, Josef
The kernel null-space technique, together with its regression-based formulation (one-class kernel spectral regression, or OC-KSR), is known to be an effective and computationally attractive one-class classification framework. Despite its outstanding performance, the applicability of the kernel null-space method is limited by its susceptibility to possible training data corruptions and its inability to rank training observations according to their conformity with the model. This work addresses these shortcomings by studying the effect of regularising the solution of the null-space kernel Fisher methodology in the context of its regression-based formulation (OC-KSR). In this respect, first, the effect of a Tikhonov regularisation in the Hilbert space is analysed, where the one-class learning problem in the presence of contamination in the training set is posed as a sensitivity analysis problem. Next, driven by the success of the sparse representation methodology, the effect of a sparsity regularisation on the solution is studied. For both alternative regularisation schemes, iterative algorithms are proposed which recursively update label confidences and rank training observations based on their fit with the model. Through extensive experiments conducted on different data sets, the proposed methodology is found to enhance robustness against contamination in the training set as compared with the baseline kernel null-space technique as well as other existing approaches in a one-class classification paradigm, while providing the functionality to rank training samples effectively.
Computational Limitations in Robust Classification and Win-Win Results
Degwekar, Akshay, Vaikuntanathan, Vinod
We continue the study of computational limitations in learning robust classifiers, following the recent work of Bubeck, Lee, Price and Razenshteyn. First, we demonstrate classification tasks where computationally efficient robust classifiers do not exist, even when computationally unbounded robust classifiers do. We rely on the hardness of decoding problems with preprocessing on codes and lattices. Second, we show classification tasks where efficient robust classifiers exist, but they are computationally hard to learn. Bubeck et al. showed examples of such tasks in the small-perturbation regime where the robust classifier can recover from a constant number of perturbed bits. Indeed, as we observe, the question of whether a large-perturbation robust classifier for their task exists is related to important open questions in computational number theory. We show two such classification tasks in the large-perturbation regime: the first relies on the existence of one-way functions, a minimal assumption in cryptography; and the second on the hardness of the learning parity with noise problem. For the second task, not only does a non-robust classifier exist, but also an efficient algorithm that generates fresh new labeled samples given access to polynomially many training examples (termed generation by Kearns et al. (1994)). Third, we show that any such task implies the existence of cryptographic primitives such as one-way functions or even forms of public-key encryption. This leads us to a win-win scenario: either we can quickly learn an efficient robust classifier (assuming one exists), or we can construct new instances of popular and useful cryptographic primitives.
Towards Federated Learning at Scale: System Design
Bonawitz, Keith, Eichner, Hubert, Grieskamp, Wolfgang, Huba, Dzmitry, Ingerman, Alex, Ivanov, Vladimir, Kiddon, Chloe, Konecny, Jakub, Mazzocchi, Stefano, McMahan, H. Brendan, Van Overveldt, Timon, Petrou, David, Ramage, Daniel, Roselander, Jason
Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.
Learning from Dialogue after Deployment: Feed Yourself, Chatbot!
Hancock, Braden, Bordes, Antoine, Mazare, Pierre-Emmanuel, Weston, Jason
The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user's responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot's dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
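The decision the agent makes on each turn can be sketched as a simple routing rule on its estimated user satisfaction. This is only an illustration of the control flow described above: the function name `route_turn`, the score range [0, 1], and the thresholds `hi` and `lo` are assumptions, not values taken from the paper.

```python
def route_turn(context, user_reply, satisfaction, hi=0.8, lo=0.3):
    """Decide what to do with one dialogue turn.

    satisfaction is the agent's own estimate, assumed here to lie in [0, 1].
    """
    if satisfaction >= hi:
        # Conversation appears to be going well: keep the user's reply
        # as a new training example to imitate.
        return ("dialogue_example", context, user_reply)
    if satisfaction <= lo:
        # Agent believes it made a mistake: ask the user for feedback.
        return ("request_feedback", context, None)
    # Uncertain region: carry on without extracting anything.
    return ("continue", context, None)

# Hypothetical turns with made-up satisfaction estimates.
good = route_turn("How are you?", "Great, I just got back from a hike!", 0.92)
bad = route_turn("Do you like hiking?", "What? That makes no sense.", 0.10)
```

Feedback collected after a `request_feedback` decision would feed a separate training task, in which the agent learns to predict the feedback it will be given.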
High-dimensional semi-supervised learning: in search for optimal inference of the mean
We provide a high-dimensional semi-supervised inference framework focused on the mean and variance of the response. Our data comprise an extensive set of observations of the covariate vectors and a much smaller set of labeled observations in which we observe both the response and the covariates. We allow the size of the covariates to be much larger than the sample size and impose weak conditions on a statistical form of the data. We provide new estimators of the mean and variance of the response that extend some of the recent results presented in low-dimensional models. In particular, at times we do not require consistent estimation of the functional form of the data. Together with estimation of the population mean and variance, we provide their asymptotic distribution and confidence intervals where we showcase gains in efficiency compared to the sample mean and variance. Our procedure, with minor modifications, is then presented to make important contributions regarding inference about average treatment effects. We also investigate the robustness of estimation and coverage and showcase widespread applicability and generality of the proposed method.
Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Polylingual Text Classification
Esuli, Andrea, Moreo, Alejandro, Sebastiani, Fabrizio
Polylingual Text Classification (PLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when naively classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle multilabel PLC via funnelling, a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system where all documents, irrespective of language, are classified by the same (2nd-tier) classifier. For this classifier all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by 1st-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language. We present substantial experiments, run on publicly available polylingual text collections, in which funnelling is shown to significantly outperform a number of state-of-the-art baselines. All code and datasets (in vector form) are made publicly available.
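The two-tier structure can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the toy `CentroidClassifier`, the two-language data, and the single-label setting (the paper addresses multilabel PLC). The sketch also computes 1st-tier posteriors directly on the training documents and assumes every language sees every class; it is not the authors' released code.

```python
import math

class CentroidClassifier:
    """Toy probabilistic classifier: softmax over negative squared
    distances to the class centroids."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids_[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        scores = [-sum((a - b) ** 2 for a, b in zip(x, self.centroids_[c]))
                  for c in self.classes_]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, x):
        p = self.predict_proba(x)
        return self.classes_[p.index(max(p))]

# Hypothetical data: each language has its own feature space (2-d vs 3-d).
train = {
    "en": ([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
           ["sport", "sport", "politics", "politics"]),
    "it": ([[0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
           ["sport", "sport", "politics", "politics"]),
}

# 1st tier: one language-dependent classifier per language.
first_tier = {lang: CentroidClassifier().fit(X, y) for lang, (X, y) in train.items()}

# 2nd tier: a single classifier trained on posterior vectors from all languages.
meta_X, meta_y = [], []
for lang, (X, y) in train.items():
    for x, label in zip(X, y):
        meta_X.append(first_tier[lang].predict_proba(x))
        meta_y.append(label)
second_tier = CentroidClassifier().fit(meta_X, meta_y)

def classify(lang, x):
    """Map a document to the shared posterior space, then apply the 2nd tier."""
    return second_tier.predict(first_tier[lang].predict_proba(x))
```

Because the posterior vectors have one entry per class regardless of language, the 2nd-tier classifier is trained on all eight documents even though the two languages share no input features.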