Goto

Collaborating Authors

 Inductive Learning


A Survey of Knowledge Enhanced Pre-trained Models

arXiv.org Artificial Intelligence

Pre-trained models learn contextualized word representations on large-scale text corpus through a self-supervised learning method, which has achieved promising performance after fine-tuning. These models, however, suffer from poor robustness and lack of interpretability. Pre-trained models with knowledge injection, which we call knowledge enhanced pre-trained models (KEPTMs), possess deep understanding and logical reasoning and introduce interpretability to some extent. In this survey, we provide a comprehensive overview of KEPTMs for natural language processing. We first introduce the progress of pre-trained models and knowledge representation learning. Then we systematically categorize existing KEPTMs from three different perspectives. Finally, we outline some potential directions of KEPTMs for future research.


Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

arXiv.org Artificial Intelligence

Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions. However, existing neural models have been shown to lack this basic ability in learning symbolic structures. Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics, as additional training supervision. These automatically-generated sequences are more representative of the underlying compositional symbolic structures of the input data. During inference, the model jointly predicts the next action and the next tokens in the auxiliary sequences at each step. Experiments on the SCAN dataset show that our method encourages the Transformer to understand compositional structures of the command, improving its accuracy on multiple challenging splits from <= 10% to 100%. With only 418 (5%) training instances, our approach still achieves 97.8% accuracy on the MCD1 split. Therefore, we argue that compositionality can be induced in Transformers given minimal but proper guidance. We also show that a better result is achieved using less contextualized vectors as the attention's query, providing insights into architecture choices in achieving systematic compositionality. Finally, we show positive generalization results on the groundedSCAN task (Ruis et al., 2020). Our code is publicly available at: https://github.com/jiangycTarheel/compositional-auxseq


Focused Contrastive Training for Test-based Constituency Analysis

arXiv.org Artificial Intelligence

We propose a scheme for self-training of grammaticality models for constituency analysis based on linguistic tests. A pre-trained language model is fine-tuned by contrastive estimation of grammatical sentences from a corpus, and ungrammatical sentences that were perturbed by a syntactic test, a transformation that is motivated by constituency theory. We show that consistent gains can be achieved if only certain positive instances are chosen for training, depending on whether they could be the result of a test transformation. This way, the positives, and negatives exhibit similar characteristics, which makes the objective more challenging for the language model, and also allows for additional markup that indicates the position of the test application within the sentence.


How to do Deep Learning on Graphs with Graph Convolutional Networks

#artificialintelligence

In the previous post, I gave a high-level introduction to GCNs and showed how a nodes representation is updated based on its neighbors representation. In this post, we first gain a deeper understanding of the aggregation performed during the rather simple graph convolutions discussed in the previous post. Then we move on to a recently published graph convolutional propagation rule and I show how to implement and use it for semi-supervised learning on a community prediction task in Zachary's Karate Club, a small social network. As shown below, the GCN is able to learn latent feature representations for each node that separates the two communities into two reasonably cohesive and separated clusters despite using only one training example for each community.


Semi-supervised learning made simple

#artificialintelligence

Semi-supervised learning is a machine learning technique of deriving useful information from both labelled and unlabelled data. Before doing this tutorial, you should have basic familiarity with supervised learning on images with PyTorch. We will omit reinforcement learning here and concentrate on the first two types. In supervised learning, our data consists of labelled objects. A machine learning model is tasked with learning how to assign labels (or values) to objects.


RAFT: A Real-World Few-Shot Text Classification Benchmark

arXiv.org Artificial Intelligence

Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas current techniques struggle with: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits at https://raft.elicit.org .


Machine learning: Types (part-3)

#artificialintelligence

Inference refers to reaching an outcome or decision. There are different paradigms for inference that may be used as a framework for understanding how some machine learning algorithms work or how some learning problems may be approached. Some examples of approaches to learning are inductive, deductive, and transductive learning and inference. Inductive learning involves using evidence to determine the outcome. Inductive reasoning refers to using specific cases to determine general outcomes, ex- specific to general.


AdaBoost

#artificialintelligence

Boosting refers to any Ensemble method that can combine several weak learners into a strong learner. The general idea of most boosting methods is to train predictors sequentially, each trying to correct its predecessor. There are many boosting methods available, one of the most popular is AdaBoost (Adaptive Boosting). The way for a new predictor to correct its predecessor is to pay a bit more attention to the training instances that the predecessor underfitted. This is the technique used by AdaBoost.


Data, Assemble: Leveraging Multiple Datasets with Heterogeneous and Partial Labels

arXiv.org Artificial Intelligence

The success of deep learning relies heavily on large datasets with extensive labels, but we often only have access to several small, heterogeneous datasets associated with partial labels, particularly in the field of medical imaging. When learning from multiple datasets, existing challenges include incomparable, heterogeneous, or even conflicting labeling protocols across datasets. In this paper, we propose a new initiative--"data, assemble"--which aims to unleash the full potential of partially labeled data and enormous unlabeled data from an assembly of datasets. To accommodate the supervised learning paradigm to partial labels, we introduce a dynamic adapter that encodes multiple visual tasks and aggregates image features in a question-and-answer manner. Furthermore, we employ pseudo-labeling and consistency constraints to harness images with missing labels and to mitigate the domain gap across datasets. From proof-of-concept studies on three natural imaging datasets and rigorous evaluations on two large-scale thorax X-ray benchmarks, we discover that learning from "negative examples" facilitates both classification and segmentation of classes of interest. This sheds new light on the computer-aided diagnosis of rare diseases and emerging pandemics, wherein "positive examples" are hard to collect, yet "negative examples" are relatively easier to assemble. As a result, besides exceeding the prior art in the NIH ChestXray benchmark, our model is particularly strong in identifying diseases of minority classes, yielding over 3-point improvement on average. Remarkably, when using existing partial labels, our model performance is on-par (p>0.05) with that using a fully curated dataset with exhaustive labels, eliminating the need for additional 40% annotation costs.


Improved pathogenicity prediction for rare human missense variants

#artificialintelligence

The success of personalized genomic medicine depends on our ability to assess the pathogenicity of rare human variants, including the important class of missense variation. There are many challenges in training accurate computational systems, e.g., in finding the balance between quantity, quality, and bias in the variant sets used as training examples and avoiding predictive features that can accentuate the effects of bias. Here, we describe VARITY, which judiciously exploits a larger reservoir of training examples with uncertain accuracy and representativity. To limit circularity and bias, VARITY excludes features informed by variant annotation and protein identity. To provide a rationale for each prediction, we quantified the contribution of features and feature combinations to the pathogenicity inference of each variant.