Training on test inputs with amortized conditional normalized maximum likelihood

Current machine learning methods provide unprecedented accuracy across a range of domains, from computer vision to natural language processing. However, in many important high-stakes applications, such as medical diagnosis or autonomous driving, rare mistakes can be extremely costly, and thus effective deployment of learned models requires not only high accuracy, but also a way to measure the certainty in a model's predictions. Reliable uncertainty quantification is especially important when faced with out-of-distribution inputs, as model accuracy tends to degrade heavily on inputs that differ significantly from those seen during training. In this blog post, we will discuss how we can get reliable uncertainty estimation with a strategy that does not simply rely on a learned model to extrapolate to out-of-distribution inputs, but instead asks: "given my training data, which labels would make sense for this input?". To illustrate how this can allow for more reasonable predictions on out-of-distribution data, consider the following example, in which we attempt to classify automobiles and all the class 1 training examples are sedans while all the class 2 examples are large buses.

Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors

Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive relationship between covariates $\mathbf{X}$ and a response variable $Y$ and can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the Fisher score vector, defined as the gradient of the log-likelihood for the fitted model, using a form of multivariate exponentially weighted moving average, which monitors for general changes in the mean of a random vector. In spite of the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach has not been previously considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes as shown in theory and experiments, and inherent diagnostic capabilities for helping to identify the nature of the changes.
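
As a rough illustration of the score-based idea (a sketch under stated assumptions, not the authors' implementation), the snippet below computes Fisher score vectors for a logistic regression model and tracks them with an exponentially weighted moving average. The simulated stream, the parameter values `theta_fit` and `theta_drift`, the smoothing constant `lam`, and the diagonal-covariance simplification of the monitoring statistic are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_vector(theta, x, y):
    """Gradient of the Bernoulli log-likelihood of a logistic model at one (x, y)."""
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [(y - p) * xi for xi in x]

def draw(theta_true, rng):
    """Simulate one observation with an intercept and one Gaussian covariate."""
    x = [1.0, rng.gauss(0.0, 1.0)]
    p = sigmoid(sum(t * xi for t, xi in zip(theta_true, x)))
    y = 1.0 if rng.random() < p else 0.0
    return x, y

def monitor(scores, variances, lam=0.05):
    """EWMA of the score vectors, summarized by a squared standardized norm.

    Assumption: coordinates are treated as uncorrelated (diagonal covariance),
    which simplifies the full multivariate statistic.
    """
    scale = (2.0 - lam) / lam  # inverse of the asymptotic EWMA variance factor
    z = [0.0] * len(variances)
    stats = []
    for s in scores:
        z = [lam * si + (1.0 - lam) * zi for si, zi in zip(s, z)]
        stats.append(scale * sum(zi * zi / v for zi, v in zip(z, variances)))
    return stats

rng = random.Random(0)
theta_fit = [0.0, 1.0]     # hypothetical fitted parameters (taken as the true pre-drift values)
theta_drift = [0.0, -1.0]  # hypothetical post-drift relationship
n = 500
stream = [draw(theta_fit, rng) for _ in range(n)] + [draw(theta_drift, rng) for _ in range(n)]
scores = [score_vector(theta_fit, x, y) for x, y in stream]

# Per-coordinate score variance, estimated from the drift-free first half of the stream.
variances = [sum(s[j] ** 2 for s in scores[:n]) / n for j in range(2)]
stats = monitor(scores, variances)

pre_mean = sum(stats[:n]) / n
post_mean = sum(stats[-200:]) / 200
print(f"mean statistic before drift: {pre_mean:.2f}, after drift: {post_mean:.2f}")
```

Before the drift the scores average to zero, so the statistic stays small. Once the covariate-response relationship changes, the mean of the score vector shifts and the statistic grows. The coordinate that shifts also hints at which parameter changed, which is the diagnostic aspect the abstract mentions.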

NeurIPS 2020

Back in February, when AI conferences were still held in person, Turing Award winners Geoffrey Hinton, Yann LeCun and Yoshua Bengio shared a stage in New York at an AAAI event, which Synced covered in detail. LeCun told the audience that, after decades of skepticism, he had finally joined Hinton in support of the idea that self-supervised learning may usher in AI's next revolution. Unlike supervised learning, which requires manual data-labelling, self-supervised learning (SSL) is an approach that can automatically generate labels. Recent improvements in self-supervised training methods have established SSL as a serious alternative to traditional supervised training. Google's language representation model ALBERT, for example, utilizes a self-supervised training framework to leverage large amounts of text. It's no surprise then that NeurIPS 2020 (the Conference on Neural Information Processing Systems) would find itself at the forefront of this trend.

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models. Furthermore, we demonstrate that the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. Lastly, for the semi-supervised setting we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.
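
To make "contextual embeddings" concrete, here is a minimal sketch of self-attention applied to the embeddings of a row's categorical features. It is not the TabTransformer architecture itself: it uses a single head with identity query/key/value projections, and the column names, vocabulary, and random embedding table are hypothetical.

```python
import math
import random

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    contextual = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all column embeddings in the row.
        contextual.append([sum(w * value[j] for w, value in zip(weights, tokens))
                           for j in range(dim)])
    return contextual

rng = random.Random(0)
DIM = 4
# Hypothetical categorical columns and their possible values.
columns = {"color": ["red", "blue"], "size": ["small", "large"]}
embedding_table = {
    (col, val): [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    for col, vals in columns.items()
    for val in vals
}

def embed_row(row):
    """Look up one (context-free) embedding per categorical column."""
    return [embedding_table[(col, row[col])] for col in columns]

row_a = {"color": "red", "size": "small"}
row_b = {"color": "red", "size": "large"}
ctx_a = self_attention(embed_row(row_a))
ctx_b = self_attention(embed_row(row_b))

# "red" has the same context-free embedding in both rows, but its contextual
# embedding differs because the other column's value differs.
print(ctx_a[0])
print(ctx_b[0])
```

In a trained model the projections and embedding table would be learned, and several such layers would be stacked. The sketch only shows why the representation of one categorical value comes to depend on the rest of the row.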

GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification

Extreme multi-label text classification (XMTC) aims to tag a text instance with the most relevant subset of labels from an extremely large label set. XMTC has attracted much recent attention due to massive label sets yielded by modern applications, such as news annotation and product recommendation. The main challenges of XMTC are data scalability and sparsity, which lead to two issues: i) the difficulty of scaling to the extreme label setting, and ii) a long-tailed label distribution, in which a large fraction of labels have few positive training instances. To overcome these problems, we propose GNN-XML, a scalable graph neural network framework tailored for XMTC problems. Specifically, we exploit label correlations by mining their co-occurrence patterns and build a label graph based on the correlation matrix. We then conduct attributed graph clustering by performing graph convolution with a low-pass graph filter to jointly model label dependencies and label features, which induces semantic label clusters. We further propose a bilateral-branch graph isomorphism network to decouple representation learning and classifier learning for better modeling of tail labels. Experimental results on multiple benchmark datasets show that GNN-XML significantly outperforms state-of-the-art methods while maintaining comparable prediction efficiency and model size.
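
The label-graph step can be sketched as follows. This is an illustration under assumptions rather than the GNN-XML implementation: the toy label sets are made up, raw co-occurrence counts stand in for the correlation matrix, and the filter `G = I - L/2` (with `L` the symmetric normalized Laplacian) is one common low-pass choice that the abstract does not specify.

```python
import math

def cooccurrence(label_sets, num_labels):
    """Count how often each pair of distinct labels appears on the same document."""
    adj = [[0.0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    adj[i][j] += 1.0
    return adj

def low_pass_filter(adj):
    """G = I - L/2 = (I + D^{-1/2} A D^{-1/2}) / 2 for the normalized Laplacian L."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            norm = adj[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
            g[i][j] = 0.5 * ((1.0 if i == j else 0.0) + norm)
    return g

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy data: each set holds the labels attached to one document.
label_sets = [{0, 1}, {0, 1}, {2, 3}, {2, 3}, {1, 2}]
num_labels = 4
features = [[1.0 if i == j else 0.0 for j in range(num_labels)] for i in range(num_labels)]

g = low_pass_filter(cooccurrence(label_sets, num_labels))
smoothed = features
for _ in range(2):  # two rounds of graph convolution
    smoothed = matmul(g, smoothed)

near = distance(smoothed[0], smoothed[1])  # labels that co-occur often
far = distance(smoothed[0], smoothed[3])   # labels that never co-occur
print(f"distance 0-1: {near:.3f}, distance 0-3: {far:.3f}")
```

After filtering, labels that co-occur frequently have nearby feature vectors, so a clustering step run on `smoothed` would tend to group them together.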

Flexible Few-Shot Learning with Contextual Similarity

Existing approaches to few-shot learning deal with tasks that have persistent, rigid notions of classes. Typically, the learner observes data only from a fixed number of classes at training time and is asked to generalize to a new set of classes at test time. Two examples from the same class would always be assigned the same labels in any episode. In this work, we consider a realistic setting where the similarities between examples can change from episode to episode depending on the task context, which is not given to the learner. We define new benchmark datasets for this flexible few-shot scenario, where the tasks are based on images of faces (Celeb-A), shoes (Zappos50K), and general objects (ImageNet-with-Attributes). While classification baselines and episodic approaches learn representations that work well for standard few-shot learning, they suffer in our flexible tasks as novel similarity definitions arise during testing. We propose to build upon recent contrastive unsupervised learning techniques and use a combination of instance and class invariance learning, aiming to obtain general and flexible features. We find that our approach performs strongly on our new flexible few-shot learning benchmarks, demonstrating that unsupervised learning obtains more generalizable representations.
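
For readers unfamiliar with contrastive objectives, the following is a minimal sketch of an instance-level contrastive (InfoNCE-style) loss, the kind of objective that contrastive unsupervised methods build on. It is not the paper's combined instance and class invariance objective, and the two-dimensional "views" and the temperature value are made up for illustration.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of picking each anchor's own positive among all positives."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(anchor, p)) / temperature for p in positives]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

# Two augmented "views" of three examples; row i of each list comes from the same example.
views_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
views_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]

aligned = info_nce(views_a, views_b)
# Pairing each anchor with a view of a different example should give a higher loss.
shuffled = info_nce(views_a, views_b[1:] + views_b[:1])
print(f"aligned pairs: {aligned:.4f}, mismatched pairs: {shuffled:.4f}")
```

Minimizing this loss pulls the two views of an example together and pushes views of different examples apart, without using class labels.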

Japan COVID-19 cases set single-day record

Japan shattered the nationwide record for COVID-19 cases in a day on Wednesday, registering 2,746 cases of the deadly virus as of 6 p.m., public broadcaster NHK reported. The day saw a spate of records in several prefectures, including 245 cases in Aichi, 75 in Kyoto, 72 in Hiroshima, 49 in Gunma and 21 in Oita, and comes amid a recent surge in infections that has prompted concern among regional governments and health authorities. Tokyo, meanwhile, reported 572 new cases -- the second highest daily total ever -- while the number of serious cases dipped by one from a day earlier to 59. The news came a day after the nationwide death toll hit a single-day record of 47 and as serious cases also hit an all-time daily high of 536, according to the health ministry. The capital's daily figure on Wednesday, which was just shy of the record 584 cases recorded last Saturday, was based on 1,428 tests, the Tokyo Metropolitan Government said in a statement.

A PAC-Bayesian Perspective on Structured Prediction with Implicit Loss Embeddings

Many practical machine learning tasks can be framed as structured prediction problems, where several output variables are predicted and considered interdependent. Recent theoretical advances in structured prediction have focused on obtaining fast-rate convergence guarantees, especially in the Implicit Loss Embedding (ILE) framework. PAC-Bayes has gained interest recently for its capacity to produce tight risk bounds for predictor distributions. This work proposes a novel PAC-Bayes perspective on the ILE structured prediction framework. We present two generalization bounds, on the risk and excess risk, which yield insights into the behavior of ILE predictors. Two learning algorithms are derived from these bounds.

Semantics Altering Modifications for Evaluating Comprehension in Machine Reading

Advances in NLP have yielded impressive results for the task of machine reading comprehension (MRC), with approaches having been reported to achieve performance comparable to that of humans. In this paper, we investigate whether state-of-the-art MRC models are able to correctly process Semantics Altering Modifications (SAM): linguistically-motivated phenomena that alter the semantics of a sentence while preserving most of its lexical surface form. We present a method to automatically generate and align challenge sets featuring original and altered examples. We further propose a novel evaluation methodology to correctly assess the capability of MRC systems to process these examples independent of the data they were optimised on, by discounting for effects introduced by domain shift. In a large-scale empirical study, we apply the methodology in order to evaluate extractive MRC models with regard to their capability to correctly process SAM-enriched data. We comprehensively cover 12 different state-of-the-art neural architecture configurations and four training datasets and find that -- despite their well-known remarkable performance -- optimised models consistently struggle to correctly process semantically altered data.

Art Style Classification with Self-Trained Ensemble of AutoEncoding Transformations

The artistic style of a painting is a rich descriptor that reveals both visual and deep intrinsic knowledge about how an artist uniquely portrays and expresses their creative vision. Accurate categorization of paintings across different artistic movements and styles is critical for large-scale indexing of art databases. However, the automatic extraction and recognition of these highly dense artistic features has received little to no attention in the field of computer vision research. In this paper, we investigate the use of deep self-supervised learning methods to solve the problem of recognizing complex artistic styles with high intra-class and low inter-class variation. Further, we outperform existing approaches by almost 20% on a highly class-imbalanced WikiArt dataset with 27 art categories. To achieve this, we train the EnAET semi-supervised learning model (Wang et al., 2019) with limited annotated data samples and supplement it with self-supervised representations learned from an ensemble of spatial and non-spatial transformations.