AITopics

arXiv.org Machine LearningJun-17-2021

PAC Prediction Sets Under Covariate Shift

Park, Sangdon, Dobriban, Edgar, Lee, Insup, Bastani, Osbert

An important challenge facing modern machine learning is how to rigorously quantify the uncertainty of model predictions. Conveying uncertainty is especially important when there are changes to the underlying data distribution that might invalidate the predictive model. Yet, most existing uncertainty quantification algorithms break down in the presence of such shifts. We propose a novel approach that addresses this challenge by constructing \emph{probably approximately correct (PAC)} prediction sets in the presence of covariate shift. Our approach focuses on the setting where there is a covariate shift from the source distribution (where we have labeled training examples) to the target distribution (for which we want to quantify uncertainty). Our algorithm assumes given importance weights that encode how the probabilities of the training examples change under the covariate shift. In practice, importance weights typically need to be estimated; thus, we extend our algorithm to the setting where we are given confidence intervals for the importance weights rather than their true value. We demonstrate the effectiveness of our approach on various covariate shifts designed based on the DomainNet and ImageNet datasets.

calibration, prediction, rscp-wiw, (14 more...)

arXiv.org Machine Learning

2106.09848

Country:

North America > United States > Pennsylvania (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry:

Government > Military (1.00)
Transportation (0.93)
Government > Regional Government > North America Government > United States Government (0.93)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceJun-15-2021

Automated Self-Supervised Learning for Graphs

Jin, Wei, Liu, Xiaorui, Zhao, Xiangyu, Ma, Yao, Shah, Neil, Tang, Jiliang

Graph self-supervised learning has gained increasing attention due to its capacity to learn expressive node representations. Many pretext tasks, or loss functions have been designed from distinct perspectives. However, we observe that different pretext tasks affect downstream tasks differently cross datasets, which suggests that searching pretext tasks is crucial for graph self-supervised learning. Different from existing works focusing on designing single pretext tasks, this work aims to investigate how to automatically leverage multiple pretext tasks effectively. Nevertheless, evaluating representations derived from multiple pretext tasks without direct access to ground truth labels makes this problem challenging. To address this obstacle, we make use of a key principle of many real-world graphs, i.e., homophily, or the principle that ``like attracts like,'' as the guidance to effectively search various self-supervised pretext tasks. We provide theoretical understanding and empirical evidence to justify the flexibility of homophily in this search task. Then we propose the AutoSSL framework which can automatically search over combinations of various self-supervised tasks. By evaluating the framework on 7 real-world datasets, our experimental results show that AutoSSL can significantly boost the performance on downstream tasks including node clustering and node classification compared with training under individual tasks. Code will be released at https://github.com/ChandlerBang/AutoSSL.

representation, task weight, uto ssl-es, (12 more...)

2106.0547

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Li, Yazhe, Pogodin, Roman, Sutherland, Danica J., Gretton, Arthur

Self-Supervised Learning with Kernel Dependence Maximization

arXiv.org Machine LearningJun-15-2021

We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC). SSL-HSIC maximizes dependence between representations of transformed versions of an image and the image identity, while minimizing the kernelized variance of those features. This self-supervised learning framework yields a new understanding of InfoNCE, a variational lower bound on the mutual information (MI) between different transformations. While the MI itself is known to have pathologies which can result in meaningless representations being learned, its bound is much better behaved: we show that it implicitly approximates SSL-HSIC (with a slightly different regularizer). Our approach also gives us insight into BYOL, since SSL-HSIC similarly learns local neighborhoods of samples. SSL-HSIC allows us to directly optimize statistical dependence in time linear in the batch size, without restrictive data assumptions or indirect mutual information estimators. Trained with or without a target network, SSL-HSIC matches the current state-of-the-art for standard linear evaluation on ImageNet [1], semi-supervised learning and transfer to other classification and vision tasks such as semantic segmentation, depth estimation and object recognition.

kernel, representation, ssl-hsic, (14 more...)

arXiv.org Machine Learning

2106.0832

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

RRULES: An improvement of the RULES rule-based classifier

Palliser-Sans, Rafel

RRULES is presented as an improvement and optimization over RULES, a simple inductive learning algorithm for extracting IF-THEN rules from a set of training examples. RRULES optimizes the algorithm by implementing a more effective mechanism to detect irrelevant rules, at the same time that checks the stopping conditions more often. This results in a more compact rule set containing more general rules which prevent overfitting the training set and obtain a higher test accuracy. Moreover, the results show that RRULES outperforms the original algorithm by reducing the coverage rate up to a factor of 7 while running twice or three times faster consistently over several datasets.

algorithm, precision, selector, (15 more...)

2106.07296

Country: Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Examining and Combating Spurious Features under Distribution Shift

Zhou, Chunting, Ma, Xuezhe, Michel, Paul, Neubig, Graham

A central goal of machine learning is to learn robust representations that capture the causal relationship between inputs features and output labels. However, minimizing empirical risk over finite or biased datasets often results in models latching on to spurious correlations between the training input/output pairs that are not fundamental to the problem at hand. In this paper, we define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics. We prove that even when there is only bias of the input distribution (i.e. covariate shift), models can still pick up spurious features from their training data. Group distributionally robust optimization (DRO) provides an effective tool to alleviate covariate shift by minimizing the worst-case training loss over a set of pre-defined groups. Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations that occur in the data. To address this, we further propose to minimize the worst-case losses over a more flexible set of distributions that are defined on the joint distribution of groups and instances, instead of treating each group as a whole at optimization time. Through extensive experiments on one image and two language tasks, we show that our model is significantly more robust than comparable baselines under various partitions. Our code is available at https://github.com/violet-zct/group-conditional-DRO.

examining and combating spurious feature, group dro, partition, (12 more...)

2106.07171

Country:

North America > United States > California (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Self-supervised Dialogue Learning for Spoken Conversational Question Answering

Chen, Nuo, You, Chenyu, Zou, Yuexian

In spoken conversational question answering (SCQA), the answer to the corresponding question is generated by retrieving and then analyzing a fixed spoken document, including multi-part conversations. Most SCQA systems have considered only retrieving information from ordered utterances. However, the sequential order of dialogue is important to build a robust spoken conversational question answering system, and the changes of utterances order may severely result in low-quality and incoherent corpora. To this end, we introduce a self-supervised learning approach, including incoherence discrimination, insertion detection, and question prediction, to explicitly capture the coreference resolution and dialogue coherence among spoken documents. Specifically, we design a joint learning framework where the auxiliary self-supervised tasks can enable the pre-trained SCQA systems towards more coherent and meaningful spoken dialogue learning. We also utilize the proposed self-supervised learning tasks to capture intra-sentence coherence. Experimental results demonstrate that our proposed method provides more coherent, meaningful, and appropriate responses, yielding superior performance gains compared to the original pre-trained language models. Our method achieves state-of-the-art results on the Spoken-CoQA dataset.

arxiv preprint arxiv, dialogue, utterance, (12 more...)

2106.02182

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
North America > United States (0.04)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)

Alghanmi, Israa, Espinosa-Anke, Luis, Schockaert, Steven

Probing Pre-Trained Language Models for Disease Knowledge

Pre-trained language models such as ClinicalBERT have achieved impressive results on tasks such as medical Natural Language Inference. At first glance, this may suggest that these models are able to perform medical reasoning tasks, such as mapping symptoms to diseases. However, we find that standard benchmarks such as MedNLI contain relatively few examples that require such forms of reasoning. To better understand the medical reasoning capabilities of existing language models, in this paper we introduce DisKnE, a new benchmark for Disease Knowledge Evaluation. To construct this benchmark, we annotated each positive MedNLI example with the types of medical reasoning that are needed. We then created negative examples by corrupting these positive examples in an adversarial way. Furthermore, we define training-test splits per disease, ensuring that no knowledge about test diseases can be learned from the training data, and we canonicalize the formulation of the hypotheses to avoid the presence of artefacts. This leads to a number of binary classification problems, one for each type of reasoning and each disease. When analysing pre-trained models for the clinical/biomedical domain on the proposed benchmark, we find that their performance drops considerably.

category, knowledge, proceedings, (13 more...)

2106.07285

Country: Europe > United Kingdom (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (0.88)
Health & Medicine > Therapeutic Area > Endocrinology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.50)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.35)

SAS: Self-Augmented Strategy for Language Model Pre-training

Xu, Yifei, Zhang, Jingqiao, He, Ru, Ge, Liangzhu, Yang, Chao, Yang, Cheng, Wu, Ying Nian

The core of a self-supervised learning method for pre-training language models includes the design of appropriate data augmentation and corresponding pre-training task(s). Most data augmentations in language model pre-training are context-independent. The seminal contextualized augmentation recently proposed by the ELECTRA requires a separate generator, which leads to extra computation cost as well as the challenge in adjusting the capability of its generator relative to that of the other model component(s). We propose a self-augmented strategy (SAS) that uses a single forward pass through the model to augment the input data for model training in the next epoch. Essentially our strategy eliminates a separate generator network and uses only one network to generate the data augmentation and undertake two pre-training tasks (the MLM task and the RTD task) jointly, which naturally avoids the challenge in adjusting the generator's capability as well as reduces the computation cost. Additionally, our SAS is a general strategy such that it can seamlessly incorporate many new techniques emerging recently or in the future, such as the disentangled attention mechanism recently proposed by the DeBERTa model. Our experiments show that our SAS is able to outperform the ELECTRA and other state-of-the-art models in the GLUE tasks with the same or less computation cost.

electra-small, sas, score 8, (12 more...)

2106.07176

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)

Karim, Mohammed Asad, Verma, Vinay Kumar, Singh, Pravendra, Namboodiri, Vinay, Rai, Piyush

Knowledge Consolidation based Class Incremental Online Learning with Limited Data

arXiv.org Artificial IntelligenceJun-12-2021

This problem setting which necessitates a class incremental learning approach; is more challenging than standard class incremental learning (2) Data for each class is given in an online [Javed and White, 2019] due to additional constraints: (1) fashion, i.e., each training example is seen only once Data in each class appears in the online fashion, i.e., the model during training; (3) Each class has very few training sees every training example exactly once; (2) The number of examples; and (4) We do not use or assume access training examples in each class is very small; and (3) We do to any replay/memory to store data from previous not use any replay/memory to store the training examples from classes. Therefore, in this setting, we have to handle previous classes. This is the most general setting for class incremental twofold problems of catastrophic forgetting and learning and various practical usage scenario can be overfitting. In our approach, we learn robust representations obtained through this or a relaxed setting. For instance, in face that are generalizable across tasks without recognition, it is common to have few examples per class but suffering from the problems of catastrophic forgetting usually not in an online learning fashion, whereas for a robot and overfitting to accommodate future classes navigating in an environment, the setting would also be online.

learning, representation, trajectory, (14 more...)

2106.06795

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
Europe > United Kingdom > England > Somerset > Bath (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)