Inductive Learning
Computers Already Learn From Us. But Can They Teach Themselves?
Artificial intelligence seems to be everywhere, but what we are really witnessing is a supervised-learning revolution: We teach computers to see patterns, much as we teach children to read. But the future of A.I. depends on computer systems that learn on their own, without supervision, researchers say. When a mother points to a dog and tells her baby, "Look at the doggy," the child learns what to call the furry four-legged friends. But when that baby stands and stumbles, again and again, until she can walk, that is something else. Just as humans learn mostly through observation or trial and error, computers will have to go beyond supervised learning to reach the holy grail of human-level intelligence.
How Microsoft Set A New Benchmark To Track Fake News
Researchers from Microsoft, along with a team from Arizona State University, have published work that outperforms the current state-of-the-art models for detecting fake news. Although misinformation has been spread since time immemorial, the easy access provided by the internet has made fake news rampant today, and it has damaged healthy conversation. Given the rapidly evolving nature of news events and the limited amount of annotated data, state-of-the-art fake news detection systems struggle with early detection because large numbers of annotated training instances are hard to come by. In this work, the authors exploited multiple weak signals from different user engagements. They call this approach multi-source weak social supervision, or MWSS.
Amazon's AI uses meta learning to accomplish related tasks
In a paper scheduled to be presented at the upcoming International Conference on Learning Representations, Amazon researchers propose an AI approach that greatly improves performance on certain meta-learning tasks (i.e., tasks that involve both accomplishing related goals and learning how to learn to perform them). They say it can be adapted to new tasks with only a handful of labeled training examples, meaning a large corporation could use it to, for example, extract charts and captions from scanned paperwork. In conventional machine learning, a model trains on a set of labeled data (a support set) and learns to correlate features with the labels. It's then fed a separate set of test data (a query set) and evaluated based on how well it predicts that set's labels. By contrast, during meta learning, an AI model learns to perform tasks that each come with their own sets of training data and test data, and the model sees both. In this way, the AI learns how particular ways of responding to the training data affect performance on the test data.
k-Nearest Neighbour Classifiers -- 2nd Edition
Cunningham, Padraig, Delany, Sarah Jane
Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is no longer such a problem given the computational power now available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
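As an illustration of the idea in the abstract above (find the nearest neighbours of a query, then let them vote on its class), here is a minimal sketch. It is not the code from the paper's appendix: the function name `knn_classify`, the use of Euclidean distance, the simple majority vote, and the toy data are all assumptions made for this example.

```python
from collections import Counter
import math


def knn_classify(query, examples, labels, k=3):
    """Return the majority class among the k training examples closest to query."""
    # Euclidean distance from the query to every training example
    # (assumption: other similarity measures could be swapped in here).
    distances = [(math.dist(query, x), y) for x, y in zip(examples, labels)]
    # Keep the k nearest neighbours; sort on distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Simple majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
examples = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_classify((1.1, 1.0), examples, labels, k=3)
```

This brute-force version computes the distance to every stored example for each query, which is the run-time cost the abstract alludes to; the retrieval speed-up techniques the paper surveys aim to avoid that full scan.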
Empirical Perspectives on One-Shot Semi-supervised Learning
Smith, Leslie N., Conovaloff, Adam
One of the greatest obstacles in the adoption of deep neural networks for new applications is that training the network typically requires a large number of manually labeled training samples. We empirically investigate the scenario where one has access to large amounts of unlabeled data but requires labeling only a single prototypical sample per class in order to train a deep network (i.e., one-shot semi-supervised learning). Specifically, we investigate the recent results reported in FixMatch for one-shot semi-supervised learning to understand the factors that affect and impede high accuracies and reliability for one-shot semi-supervised learning of CIFAR-10. For example, we discover that one barrier to one-shot semi-supervised learning for high-performance image classification is the unevenness of class accuracy during training. These results point to solutions that might enable more widespread adoption of one-shot semi-supervised training methods for new applications.
Global Expanding, Local Shrinking: Discriminant Multi-label Learning with Missing Labels
In multi-label learning, the issue of missing labels poses a major challenge. Many methods attempt to recover missing labels by exploiting the low-rank structure of the label matrix. However, these methods utilize only the global low-rank label structure and, to some extent, ignore both local low-rank label structures and label discriminant information, leaving room for further performance improvement. In this paper, we develop a simple yet effective discriminant multi-label learning (DM2L) method for multi-label learning with missing labels. Specifically, we impose low-rank structures on all the predictions of instances from the same labels (local shrinking of rank), and a maximally separated structure (high-rank structure) on the predictions of instances from different labels (global expanding of rank). In this way, the imposed low-rank structures help model both local and global low-rank label structures, while the imposed high-rank structure helps provide more underlying discriminability. Our subsequent theoretical analysis also supports these intuitions. In addition, we provide a nonlinear extension using the kernel trick to enhance DM2L and establish a concave-convex objective to learn these models. Compared to the other methods, our method involves the fewest assumptions and only one hyper-parameter. Even so, extensive experiments show that our method still outperforms the state-of-the-art methods.
State-Only Imitation Learning for Dexterous Manipulation
Radosavovic, Ilija, Wang, Xiaolong, Pinto, Lerrel, Malik, Jitendra
Dexterous manipulation has been a long-standing challenge in robotics. Recently, modern model-free RL has demonstrated impressive results on a number of problems. However, complex domains like dexterous manipulation remain a challenge for RL due to the poor sample complexity. To address this, current approaches employ expert demonstrations in the form of state-action pairs, which are difficult to obtain for real-world settings such as learning from videos. In this work, we move toward a more realistic setting and explore state-only imitation learning. To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations. The inverse dynamics model and the policy are trained jointly. Our method performs on par with state-action approaches and considerably outperforms RL alone. By not relying on expert actions, we are able to learn from demonstrations with different dynamics, morphologies, and objects.
Feature Partitioning for Robust Tree Ensembles and their Certification in Adversarial Scenarios
Calzavara, Stefano, Lucchese, Claudio, Marcuzzi, Federico, Orlando, Salvatore
Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at test time. The attacker aims at finding a minimal perturbation of a test instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its basic models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We evaluated the proposed strategy on decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently assesses the minimal accuracy of a forest on a given dataset, avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that the proposed strategy outperforms state-of-the-art adversarial learning algorithms against evasion attacks.
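The feature-partitioning idea in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: the base learner here is a toy nearest-centroid model rather than a decision tree, the round-robin partitioning is an assumption, and all names (`partition_features`, `CentroidModel`, `fit_ensemble`, `predict_majority`) are hypothetical. The point it shows is only this: if each base model sees a disjoint subset of features, an attacker who perturbs a single feature can change the vote of at most one model, so with three partitions the majority vote is unchanged.

```python
from collections import Counter


def partition_features(n_features, n_parts):
    """Split feature indices into n_parts disjoint groups (round-robin, an assumption)."""
    return [list(range(p, n_features, n_parts)) for p in range(n_parts)]


class CentroidModel:
    """Toy base model: nearest class centroid, using only its own feature subset."""

    def __init__(self, features):
        self.features = features
        self.centroids = {}

    def fit(self, X, y):
        for label in set(y):
            # Project the rows of this class onto the model's feature subset.
            rows = [[x[f] for f in self.features] for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        sub = [x[f] for f in self.features]

        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(sub, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def fit_ensemble(X, y, n_parts):
    """Train one base model per disjoint feature partition."""
    parts = partition_features(len(X[0]), n_parts)
    return [CentroidModel(p).fit(X, y) for p in parts]


def predict_majority(models, x):
    """Majority vote over the base models' predictions."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 0 lies near the origin, class 1 near (10, 10, 10).
X = [[0, 0, 0], [1, 1, 1], [10, 10, 10], [9, 9, 9]]
y = [0, 0, 1, 1]
models = fit_ensemble(X, y, n_parts=3)

clean = [1, 0, 1]
attacked = [100, 0, 1]  # only feature 0 is perturbed
```

In this toy setup, `models[0]` is fooled by the perturbed feature, but the other two models never see that feature, so `predict_majority(models, attacked)` still agrees with the prediction for `clean`.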
Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning
Valencia-Zapata, Gustavo A., Ersoy, Okan, Gonzalez-Canas, Carolina, Zentner, Michael G., Klimeck, Gerhard
Several studies point out different causes of performance degradation in supervised machine learning. Problems such as class imbalance, overlapping, small-disjuncts, noisy labels, and sparseness limit accuracy in classification algorithms. Even though a number of approaches, either in the form of a methodology or an algorithm, try to minimize performance degradation, they have been isolated efforts with limited scope. Most of these approaches focus on remediation of one among many problems, with experimental results coming from few datasets and classification algorithms, insufficient measures of prediction power, and lack of statistical validation for testing the real benefit of the proposed approach. This paper consists of two main parts. In the first part, a novel probabilistic diagnostic model based on identifying signs and symptoms of each problem is presented. Thereby, early and correct diagnosis of these problems is to be achieved in order to select not only the most convenient remediation treatment but also unbiased performance metrics. In the second part, the behavior and performance of several supervised algorithms are studied when training sets have such problems. Therefore, prediction of success for treatments can be estimated across classifiers.
Countering Language Drift with Seeded Iterated Learning
Lu, Yuchen, Singhal, Soumye, Strub, Florian, Pietquin, Olivier, Courville, Aaron
Supervised learning methods excel at capturing statistical properties of language when trained over large text corpora. Yet, these models often produce inconsistent outputs in goal-oriented language settings as they are not trained to complete the underlying task. Moreover, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we propose a generic approach to counter language drift by using iterated learning. We iterate between fine-tuning agents with interactive training steps, and periodically replacing them with new agents that are seeded from the last iteration and trained to imitate the latest finetuned models. Iterated learning requires neither external syntactic constraints nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We first explore iterated learning in the Lewis Game. We then scale up the approach in the translation game. In both settings, our results show that iterated learning drastically counters language drift and also improves the task completion metric.