Goto

Collaborating Authors

 Inductive Learning


Self-supervised self-supervision by combining deep learning and probabilistic logic

arXiv.org Machine Learning

Labeling training examples at scale is a perennial challenge in machine learning. Self-supervision methods compensate for the lack of direct supervision by leveraging prior knowledge to automatically generate noisy labeled examples. Deep probabilistic logic (DPL) is a unifying framework for self-supervised learning that represents unknown labels as latent variables and incorporates diverse self-supervision using probabilistic logic to train a deep neural network end-to-end using variational EM. While DPL is successful at combining pre-specified self-supervision, manually crafting self-supervision to attain high accuracy may still be tedious and challenging. In this paper, we propose Self-Supervised Self-Supervision (S4), which adds to DPL the capability to learn new self-supervision automatically. Starting from an initial "seed," S4 iteratively uses the deep neural network to propose new self supervision. These are either added directly (a form of structured self-training) or verified by a human expert (as in feature-based active learning). Experiments show that S4 is able to automatically propose accurate self-supervision and can often nearly match the accuracy of supervised methods with a tiny fraction of the human effort.


Training on test inputs with amortized conditional normalized maximum likelihood

AIHub

Current machine learning methods provide unprecedented accuracy across a range of domains, from computer vision to natural language processing. However, in many important high-stakes applications, such as medical diagnosis or autonomous driving, rare mistakes can be extremely costly, and thus effective deployment of learned models requires not only high accuracy, but also a way to measure the certainty in a model's predictions. Reliable uncertainty quantification is especially important when faced with out-of-distribution inputs, as model accuracy tends to degrade heavily on inputs that differ significantly from those seen during training. In this blog post, we will discuss how we can get reliable uncertainty estimation with a strategy that does not simply rely on a learned model to extrapolate to out-of-distribution inputs, but instead asks: "given my training data, which labels would make sense for this input?". To illustrate how this can allow for more reasonable predictions on out-of-distribution data, consider the following example where we attempt to classify automobiles, where all the class 1 training examples are sedans and class 2 examples are large buses.


Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

arXiv.org Artificial Intelligence

Existing studies on question answering on knowledge bases (KBQA) mainly operate with the standard i.i.d assumption, i.e., training distribution over questions is the same as the test distribution. However, i.i.d may be neither reasonably achievable nor desirable on large-scale KBs because 1) true user distribution is hard to capture and 2) randomly sample training examples from the enormous space would be highly data-inefficient. Instead, we suggest that KBQA models should have three levels of built-in generalization: i.i.d, compositional, and zero-shot. To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings for all three levels of generalization. In addition, we propose a novel BERT-based KBQA model. The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA.


Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors

arXiv.org Machine Learning

Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive relationship between covariates $\mathbf{X}$ and a response variable $Y$ and can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the Fisher score vector, defined as the gradient of the log-likelihood for the fitted model, using a form of multivariate exponentially weighted moving average, which monitors for general changes in the mean of a random vector. In spite of the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach has not been previously considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes as shown in theory and experiments, and inherent diagnostic capabilities for helping to identify the nature of the changes.


NeurIPS 2020

#artificialintelligence

Back in February, when AI conferences were still held in-person, Turing Award winners Geoffrey Hinton, Yann LeCun and Yoshua Bengio shared a stage in New York at an AAAI event, which Syncedcovered in detail. LeCun told the audience that, after decades of skepticism, he had finally joined Hinton in support of the idea that self-supervised learning may usher in AI's next revolution. Unlike supervised learning, which requires manual data-labelling, self-supervised learning (SSL) is an approach that can automatically generate labels. Recent improvements in self-supervised training methods have established SSL as a serious alternative to traditional supervised training. Google's language representation model ALBERT for example utilizes a self-supervised training framework to leverage large amounts of text. It's no surprise then that NeurIPS 2020 (the Conference on Neural Information Processing Systems) would find itself at the forefront of this trend.


TabTransformer: Tabular Data Modeling Using Contextual Embeddings

arXiv.org Artificial Intelligence

We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models. Furthermore, we demonstrate that the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. Lastly, for the semi-supervised setting we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.


GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification

arXiv.org Artificial Intelligence

Extreme multi-label text classification (XMTC) aims to tag a text instance with the most relevant subset of labels from an extremely large label set. XMTC has attracted much recent attention due to massive label sets yielded by modern applications, such as news annotation and product recommendation. The main challenges of XMTC are the data scalability and sparsity, thereby leading to two issues: i) the intractability to scale to the extreme label setting, ii) the presence of long-tailed label distribution, implying that a large fraction of labels have few positive training instances. To overcome these problems, we propose GNN-XML, a scalable graph neural network framework tailored for XMTC problems. Specifically, we exploit label correlations via mining their co-occurrence patterns and build a label graph based on the correlation matrix. We then conduct the attributed graph clustering by performing graph convolution with a low-pass graph filter to jointly model label dependencies and label features, which induces semantic label clusters. We further propose a bilateral-branch graph isomorphism network to decouple representation learning and classifier learning for better modeling tail labels. Experimental results on multiple benchmark datasets show that GNN-XML significantly outperforms state-of-the-art methods while maintaining comparable prediction efficiency and model size.


Flexible Few-Shot Learning with Contextual Similarity

arXiv.org Machine Learning

Existing approaches to few-shot learning deal with tasks that have persistent, rigid notions of classes. Typically, the learner observes data only from a fixed number of classes at training time and is asked to generalize to a new set of classes at test time. Two examples from the same class would always be assigned the same labels in any episode. In this work, we consider a realistic setting where the similarities between examples can change from episode to episode depending on the task context, which is not given to the learner. We define new benchmark datasets for this flexible few-shot scenario, where the tasks are based on images of faces (Celeb-A), shoes (Zappos50K), and general objects (ImageNet-with-Attributes). While classification baselines and episodic approaches learn representations that work well for standard few-shot learning, they suffer in our flexible tasks as novel similarity definitions arise during testing. We propose to build upon recent contrastive unsupervised learning techniques and use a combination of instance and class invariance learning, aiming to obtain general and flexible features. We find that our approach performs strongly on our new flexible few-shot learning benchmarks, demonstrating that unsupervised learning obtains more generalizable representations.


Japan COVID-19 cases set single-day record

The Japan Times

Japan shattered the nationwide record for COVID-19 cases in a day on Wednesday, registering 2,746 cases of the deadly virus as of 6 p.m., public broadcaster NHK reported. The day saw a spate of records in several prefectures, including 245 cases in Aichi, 75 in Kyoto, 72 in Hiroshima, 49 in Gunma and 21 in Oita, and comes amid a recent surge in infections that have prompted concern among regional governments and health authorities. Tokyo, meanwhile, reported 572 new cases -- the second highest daily total ever -- while the number of serious cases dipped by one from a day earlier to 59. The news came a day after the nationwide death toll hit a single-day record of 47 and as serious cases also hit an all-time daily high of 536, according to the health ministry. The capital's daily figure on Wednesday, which was just shy of the record 584 cases recorded last Saturday, was based on 1,428 tests, the Tokyo Metropolitan Government said in a statement.


Semantics Altering Modifications for Evaluating Comprehension in Machine Reading

arXiv.org Artificial Intelligence

Advances in NLP have yielded impressive results for the task of machine reading comprehension (MRC), with approaches having been reported to achieve performance comparable to that of humans. In this paper, we investigate whether state-of-the-art MRC models are able to correctly process Semantics Altering Modifications (SAM): linguistically-motivated phenomena that alter the semantics of a sentence while preserving most of its lexical surface form. We present a method to automatically generate and align challenge sets featuring original and altered examples. We further propose a novel evaluation methodology to correctly assess the capability of MRC systems to process these examples independent of the data they were optimised on, by discounting for effects introduced by domain shift. In a large-scale empirical study, we apply the methodology in order to evaluate extractive MRC models with regard to their capability to correctly process SAM-enriched data. We comprehensively cover 12 different state-of-the-art neural architecture configurations and four training datasets and find that -- despite their well-known remarkable performance -- optimised models consistently struggle to correctly process semantically altered data.