Inductive Learning
Causal Confusion in Imitation Learning
de Haan, Pim, Jayaraman, Dinesh, Levine, Sergey
Behavioral cloning reduces policy learning to supervised learning by training a discriminative model to predict expert actions given observations. Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment. We point out that ignoring causality is particularly damaging because of the distributional shift in imitation learning. In particular, it leads to a counter-intuitive "causal confusion" phenomenon: access to more information can yield worse performance. We investigate how this problem arises, and propose a solution to combat it through targeted interventions---either environment interaction or expert queries---to determine the correct causal model. We show that causal confusion occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations.
Local Label Propagation for Large-Scale Semi-Supervised Learning
Zhuang, Chengxu, Ding, Xuehao, Murli, Divyanshu, Yamins, Daniel
A significant issue in training deep neural networks to solve supervised learning tasks is the need for large numbers of labelled datapoints. The goal of semi-supervised learning is to leverage ubiquitous unlabelled data, together with small quantities of labelled data, to achieve high task performance. Though substantial recent progress has been made in developing semi-supervised algorithms that are effective for comparatively small datasets, many of these techniques do not scale readily to the large (unlaballed) datasets characteristic of real-world applications. In this paper we introduce a novel approach to scalable semi-supervised learning, called Local Label Propagation (LLP). Extending ideas from recent work on unsupervised embedding learning, LLP first embeds datapoints, labelled and otherwise, in a common latent space using a deep neural network. It then propagates pseudolabels from known to unknown datapoints in a manner that depends on the local geometry of the embedding, taking into account both inter-point distance and local data density as a weighting on propagation likelihood. The parameters of the deep embedding are then trained to simultaneously maximize pseudolabel categorization performance as well as a metric of the clustering of datapoints within each psuedo-label group, iteratively alternating stages of network training and label propagation. We illustrate the utility of the LLP method on the ImageNet dataset, achieving results that outperform previous state-of-the-art scalable semi-supervised learning algorithms by large margins, consistently across a wide variety of training regimes. We also show that the feature representation learned with LLP transfers well to scene recognition in the Places 205 dataset.
Finding Task-Relevant Features for Few-Shot Learning by Category Traversal
Li, Hongyang, Eigen, David, Dodge, Samuel, Zeiler, Matthew, Wang, Xiaogang
Few-shot learning is an important area of research. Conceptually, humans are readily able to understand new concepts given just a few examples, while in more pragmatic terms, limited-example training situations are common in practice. Recent effective approaches to few-shot learning employ a metric-learning framework to learn a feature similarity comparison between a query (test) example, and the few support (training) examples. However, these approaches treat each support class independently from one another, never looking at the entire task as a whole. Because of this, they are constrained to use a single set of features for all possible test-time tasks, which hinders the ability to distinguish the most relevant dimensions for the task at hand. In this work, we introduce a Category Traversal Module that can be inserted as a plug-and-play module into most metric-learning based few-shot learners. This component traverses across the entire support set at once, identifying task-relevant features based on both intra-class commonality and inter-class uniqueness in the feature space. Incorporating our module improves performance considerably (5%-10% relative) over baseline systems on both mini-ImageNet and tieredImageNet benchmarks, with overall performance competitive with recent state-of-the-art systems.
A Flexible Generative Framework for Graph-based Semi-supervised Learning
Ma, Jiaqi, Tang, Weijing, Zhu, Ji, Mei, Qiaozhu
We consider a family of problems that are concerned about making predictions for the majority of unlabeled, graph-structured data samples based on a small proportion of labeled examples. Relational information among the data samples, often encoded in the graph or network structure, is shown to be helpful for these semi-supervised learning tasks. Conventional graph-based regularization methods and recent graph neural networks do not fully leverage the interrelations between the features, the graph, and the labels. We propose a flexible generative framework for graph-based semi-supervised learning, which approaches the joint distribution of the node features, labels, and the graph structure. Borrowing insights from random graph models in network science literature, this joint distribution can be instantiated using various distribution families. For the inference of missing labels, we exploit recent advances of scalable variational inference techniques to approximate the Bayesian posterior. We conduct thorough experiments on benchmark datasets for graph-based semi-supervised learning. Results show that the proposed methods outperform state-of-the-art models under most settings.
A Generalization Error Bound for Multi-class Domain Generalization
Deshmukh, Aniket Anand, Lei, Yunwen, Sharma, Srinagesh, Dogan, Urun, Cutler, James W., Scott, Clayton
Domain generalization is the problem of assigning labels to an unlabeled data set, given several similar data sets for which labels have been provided. Despite considerable interest in this problem over the last decade, there has been no theoretical analysis in the setting of multi-class classification. In this work, we study a kernel-based learning algorithm and establish a generalization error bound that scales logarithmically in the number of classes, matching state-of-the-art bounds for multi-class classification in the conventional learning setting. We also demonstrate empirically that the proposed algorithm achieves significant performance gains compared to a pooling strategy.
The Paradigm Shift of Self-Supervised Learning
"If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake." By 2016, Yann LeCun began to hedge with his use of the term "unsupervised learning". In NIPS 2016, he started to call it in even more nebulous terms "predictive learning": I have always had trouble with the use of the term "Unsupervised Learning". In 2017, I had predicted that Unsupervised Learning will not progress much and said "there seems to be a massive conceptual disconnect as to how exactly it should work" and that it was the "dark matter" of machine learning.
D$\textbf{S}^3$L: Deep Self-Semi-Supervised Learning for Image Recognition
Tsai, Tsung Wei, Li, Chongxuan, Zhu, Jun
Despite the recent progress in deep semi-supervised learning (Semi-SL), the amount of labels still plays a dominant role. The success in self-supervised learning (Self-SL) hints a promising direction to exploit the vast unlabeled data by leveraging an additional set of deterministic labels. In this paper, we propose Deep Self-Semi-Supervised learning (D$S^3$L), a flexible multi-task framework with shared parameters that integrates the rotation task in Self-SL with the consistency-based methods in deep Semi-SL. Our method is easy to implement and is complementary to all consistency-based approaches. The experiments demonstrate that our method significantly improves over the published state-of-the-art methods on several standard benchmarks, especially when fewer labels are presented.
Graph Representations for Higher-Order Logic and Theorem Proving
Paliwal, Aditya, Loos, Sarah, Rabe, Markus, Bansal, Kshitij, Szegedy, Christian
This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher-order theorem provers allow for the formalization of most mathematical theories and have been shown to pose a significant challenge for deep learning. Higher-order logic is highly expressive and, even though it is well-structured with a clearly defined grammar and semantics, there still remains no well-established method to convert formulas into graph-based representations. In this paper, we consider several graphical representations of higher-order logic and evaluate them against the HOList benchmark for higher-order theorem proving.
Semi-Supervised Learning with Scarce Annotations
Rebuffi, Sylvestre-Alvise, Ehrhardt, Sebastien, Han, Kai, Vedaldi, Andrea, Zisserman, Andrew
While semi-supervised learning (SSL) algorithms provide an efficient way to make use of both labelled and unlabelled data, they generally struggle when the number of annotated samples is very small. In this work, we consider the problem of SSL multi-class classification with very few labelled instances. We introduce two key ideas. The first is a simple but effective one: we leverage the power of transfer learning among different tasks and self-supervision to initialize a good representation of the data without making use of any label. The second idea is a new algorithm for SSL that can exploit well such a pre-trained representation. The algorithm works by alternating two phases, one fitting the labelled points and one fitting the unlabelled ones, with carefully-controlled information flow between them. The benefits are greatly reducing overfitting of the labelled data and avoiding issue with balancing labelled and unlabelled losses during training. We show empirically that this method can successfully train competitive models with as few as 10 labelled data points per class. More in general, we show that the idea of bootstrapping features using self-supervised learning always improves SSL on standard benchmarks. We show that our algorithm works increasingly well compared to other methods when refining from other tasks or datasets.