Inductive Learning
Using Meta-Neurons to learn facts from a single training example
Human learning comes in two forms, a fast one and a slow one. The slow one requires a lot of repetition, which seems to be necessary to conquer a new cognitive field, such as learning a new language. But once a field is mastered, learning new facts within it requires very few examples, possibly only one. It appears that the brain regions involved in processing this field have been pre-wired to the regions they depend on. So once a new fact needs to be learned, this pre-wiring is used to speed up the training of the neurons involved in processing that fact.
Counterfactual Story Reasoning and Generation
Qin, Lianhui, Bosselut, Antoine, Holtzman, Ari, Bhagavatula, Chandra, Clark, Elizabeth, Choi, Yejin
Counterfactual reasoning requires predicting how alternative events, contrary to what actually happened, might have resulted in different outcomes. Despite being considered a necessary component of AI-complete systems, few resources have been developed for evaluating counterfactual reasoning in narratives. In this paper, we propose Counterfactual Story Rewriting: given an original story and an intervening counterfactual event, the task is to minimally revise the story to make it compatible with the given counterfactual event. Solving this task will require deep understanding of causal narrative chains and counterfactual invariance, and integration of such story reasoning capabilities into conditional language generation models. We present TimeTravel, a new dataset of 29,849 counterfactual rewritings, each with the original story, a counterfactual event, and human-generated revision of the original story compatible with the counterfactual event. Additionally, we include 80,115 counterfactual "branches" without a rewritten storyline to support future work on semi- or un-supervised approaches to counterfactual story rewriting. Finally, we evaluate the counterfactual rewriting capacities of several competitive baselines based on pretrained language models, and assess whether common overlap and model-based automatic metrics for text generation correlate well with human scores for counterfactual rewriting.
A Logic-Driven Framework for Consistency of Neural Models
Li, Tao, Gupta, Vivek, Mehta, Maitrey, Srikumar, Vivek
Consequently, we have seen progressively improving performance on benchmarks such as GLUE (Wang et al., 2018). But are models really becoming better? We take the position that, while tracking performance on a leaderboard is necessary to characterize model quality, it is not sufficient. Reasoning about language requires that a system has the ability not only to draw correct inferences about textual inputs, but also to be consistent in its beliefs across various inputs. To illustrate this notion of consistency, let us consider the task of natural language inference (NLI), which seeks to identify whether a premise entails, contradicts or is unrelated to a hypothesis (Dagan et al., 2013).
How to Apply Self-Supervision to Tabular Data: Introducing dfencoder
Unsupervised learning is an old and well-understood problem in machine learning; LeCun's choice to replace it as the star in his cake analogy is not something he should take lightly! If you dive into the definition of self-supervised learning, you'll begin to see that it's really just an approach to unsupervised learning. Since many of the breakthroughs in machine learning this decade have been based on supervised learning techniques, successes in unsupervised problems tend to emerge when researchers re-frame an unsupervised problem as a supervised problem. Specifically, in self-supervised learning, we find a clever way to generate labels without human annotators. An easy example is a technique called next-step prediction.
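The idea of generating labels without annotators can be illustrated with next-step prediction on a plain sequence. The following is a minimal sketch, not code from dfencoder: the function name `next_step_pairs`, the `window` parameter, and the sample readings are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def next_step_pairs(
    series: Sequence[float], window: int
) -> List[Tuple[List[float], float]]:
    """Turn an unlabeled sequence into supervised (inputs, target) pairs.

    Each target is simply the value that follows its input window, so the
    labels come from the data itself rather than from a human annotator.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for start in range(len(series) - window):
        inputs = list(series[start:start + window])
        target = series[start + window]
        pairs.append((inputs, target))
    return pairs


# Hypothetical unlabeled readings; any ordered sequence would do.
readings = [1.0, 2.0, 3.0, 4.0, 5.0]
for inputs, target in next_step_pairs(readings, window=2):
    print(inputs, "->", target)
```

Any ordinary supervised regressor could then be trained on these pairs, which is what makes the re-framing useful.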
A Discrete Hard EM Approach for Weakly Supervised Question Answering
Min, Sewon, Chen, Danqi, Hajishirzi, Hannaneh, Zettlemoyer, Luke
Many question answering (QA) tasks only provide weak supervision for how the answer should be computed. For example, TriviaQA answers are entities that can be mentioned multiple times in supporting documents, while DROP answers can be computed by deriving many different equations from numbers in the reference text. In this paper, we show it is possible to convert such tasks into discrete latent variable learning problems with a precomputed, task-specific set of possible "solutions" (e.g. different mentions or equations) that contains one correct option. We then develop a hard EM learning scheme that computes gradients relative to the most likely solution at each update. Despite its simplicity, we show that this approach significantly outperforms previous methods on six QA tasks, including absolute gains of 2–10%, and achieves the state-of-the-art on five of them. Using hard updates instead of maximizing marginal likelihood is key to these results as it encourages the model to find the one correct answer, which we show through detailed qualitative analysis.
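The contrast between maximizing marginal likelihood and taking a hard update can be sketched on a single example. This is an illustrative sketch under stated assumptions, not the authors' implementation: `solution_probs` is a hypothetical list holding the probabilities a model currently assigns to each candidate in the precomputed solution set, and both function names are made up here.

```python
import math
from typing import Sequence


def mml_loss(solution_probs: Sequence[float]) -> float:
    """Negative log of the total probability over all candidate solutions."""
    return -math.log(sum(solution_probs))


def hard_em_loss(solution_probs: Sequence[float]) -> float:
    """Negative log-probability of the single most likely candidate.

    Only the current best candidate contributes to the loss, so a
    gradient step would push probability mass toward that one solution.
    """
    return -math.log(max(solution_probs))


# Hypothetical model probabilities for three candidate solutions.
probs = [0.1, 0.6, 0.2]
print("marginal likelihood loss:", mml_loss(probs))
print("hard EM loss:", hard_em_loss(probs))
```

Because the maximum is never larger than the sum, the hard loss is always at least as large as the marginal one. The marginal objective can be lowered by spreading probability across several candidates, whereas the hard objective can only be lowered by committing to one.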
Machine Learning – Introduction to Supervised Learning Vinod Sharma's Blog
Supervised learning – a blessing we have in this machine era. It helps to map inputs to outputs. It uses labelled training data, made up of a set of training examples, to infer a function. The majority of practical machine learning today uses supervised learning. AILabPage defines Machine Learning as "A focal point where business, data and experience meets emerging technology and decides to work together".
Applications of Zero-Shot Learning
As a member of a research group involved in computer vision, I wanted to write this short article to briefly present what we call "Zero-shot learning" (ZSL), an interesting variant of transfer learning, and the current research related to it. Today, many machine learning methods focus on classifying instances whose classes have already been seen in training. In practice, however, many applications require classifying instances whose classes have not been seen before. Zero-shot learning is a promising learning method in which the classes covered by training instances and the classes we aim to classify are disjoint. In other words, zero-shot learning is about leveraging supervised learning with no additional training data.
CUDA: Contradistinguisher for Unsupervised Domain Adaptation
Balgi, Sourabh, Dukkipati, Ambedkar
Humans are very sophisticated in learning new information on a completely unknown domain because humans can contradistinguish, i.e., distinguish by contrasting qualities. We learn on a new unknown domain by jointly using unsupervised information directly from the unknown domain and supervised information from knowledge previously acquired in some other domain. Motivated by this supervised-unsupervised joint learning, we propose a simple model referred to as Contradistinguisher (CTDR) for unsupervised domain adaptation, whose objective is to jointly learn to contradistinguish on the unlabeled target domain in a fully unsupervised manner along with prior knowledge acquired by supervised learning on an entirely different domain. Most recent works in domain adaptation rely on an indirect way of first aligning the source and target domain distributions and then learning a classifier on the labeled source domain to classify the target domain. This indirect way of addressing the real task of unlabeled target domain classification has three main drawbacks. In this work, we propose a simple and direct approach that does not require domain alignment. We jointly learn CTDR on both source and target distributions for the unsupervised domain adaptation task, using a contradistinguish loss for the unlabeled target domain in conjunction with a supervised loss for the labeled source domain. Our experiments show that avoiding domain alignment by directly addressing the task of unlabeled target domain classification using CTDR achieves state-of-the-art results on eight visual and four language benchmark domain adaptation datasets.
Neural Structured Learning TensorFlow
Neural Structured Learning (NSL) is a new learning paradigm to train neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit as represented by a graph or implicit as induced by adversarial perturbation. Structured signals are commonly used to represent relations or similarity among samples that may be labeled or unlabeled. Therefore, leveraging these signals during neural network training harnesses both labeled and unlabeled data, which can improve model accuracy, particularly when the amount of labeled data is relatively small. Additionally, models trained with samples that are generated by adding adversarial perturbation have been shown to be robust against malicious attacks, which are designed to mislead a model's prediction or classification.
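One way to picture how an explicit graph can enter training is as an extra penalty that pulls the representations of connected samples together. The sketch below is a plain-Python illustration of that idea and does not use the NSL library API. The function names, the `multiplier` parameter, the squared L2 distance, and the sample embeddings are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def graph_regularized_loss(
    supervised_loss: float,
    embeddings: Dict[str, List[float]],
    edges: Sequence[Tuple[str, str, float]],
    multiplier: float = 0.1,
) -> float:
    """Add a neighbor penalty to an ordinary supervised loss.

    Each edge (i, j, weight) penalizes the distance between the embeddings
    of samples i and j. The samples may be labeled or unlabeled, because
    the penalty needs only the embeddings and the graph.
    """
    neighbor_penalty = sum(
        weight * squared_l2(embeddings[i], embeddings[j])
        for i, j, weight in edges
    )
    return supervised_loss + multiplier * neighbor_penalty


# Hypothetical embeddings for three samples and two weighted edges.
embeddings = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
edges = [("a", "b", 1.0), ("a", "c", 0.5)]
print(graph_regularized_loss(0.4, embeddings, edges))
```

Since the penalty term never looks at labels, unlabeled samples connected in the graph still shape the model, which is how structured signals can help when labeled data is scarce.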