AITopics

Self-supervised pre-trained models extract general-purpose representations from data, and quantifying how reliable they are is crucial because many downstream models use these representations as input for their own tasks. To this end, we first introduce a formal definition of representation reliability: the representation for a given test input is considered to be reliable if the downstream models built on top of that representation can consistently generate accurate predictions for that test point. It is desired to estimate the representation reliability without knowing the downstream tasks a priori. We provide a negative result showing that existing frameworks for uncertainty quantification in supervised learning are not suitable for this purpose. As an alternative, we propose an ensemble-based method for quantifying representation reliability, based on the concept of neighborhood consistency in the representation spaces across various pre-trained models. More specifically, the key insight is to use shared neighboring points as anchors to align different representation spaces. We demonstrate through comprehensive numerical experiments that our method is capable of predicting representation reliability with high accuracy.

data mining, machine learning, natural language, (19 more...)

2306.00206

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
(4 more...)

Venturini, Sara, Cristofari, Andrea, Rinaldi, Francesco, Tudisco, Francesco

Learning the Right Layers: a Data-Driven Layer-Aggregation Strategy for Semi-Supervised Learning on Multilayer Graphs

Clustering (or community detection) on multilayer graphs poses several additional complications with respect to standard graphs as different layers may be characterized by different structures and types of information. One of the major challenges is to establish the extent to which each layer contributes to the cluster assignment in order to effectively take advantage of the multilayer structure and improve upon the classification obtained using the individual layers or their union. However, making an informed a-priori assessment about the clustering information content of the layers can be very complicated. In this work, we assume a semi-supervised learning setting, where the class of a small percentage of nodes is initially provided, and we propose a parameter-free Laplacian-regularized model that learns an optimal nonlinear combination of the different layers from the available input labels. The learning algorithm is based on a Frank-Wolfe optimization scheme with inexact gradient, combined with a modified Label Propagation iteration. We provide a detailed convergence analysis of the algorithm and extensive experiments on synthetic and real-world datasets, showing that the proposed method compares favourably with a variety of baselines and outperforms each individual layer when used in isolation.

artificial intelligence, inductive learning, machine learning, (18 more...)

2306.00152

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Abruzzo > L'Aquila Province > L'Aquila (0.04)
Asia (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.70)

Gupta, Ashim, Krishna, Amrith

Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems

Clean-label (CL) attack is a form of data poisoning attack where an adversary modifies only the textual input of the training data, without requiring access to the labeling function. CL attacks are relatively unexplored in NLP, as compared to label flipping (LF) attacks, where the latter additionally requires access to the labeling function as well. While CL attacks are more resilient to data sanitization and manual relabeling methods than LF attacks, they often demand as high as ten times the poisoning budget than LF attacks. In this work, we first introduce an Adversarial Clean Label attack which can adversarially perturb in-class training examples for poisoning the training set. We then show that an adversary can significantly bring down the data requirements for a CL attack, using the aforementioned approach, to as low as 20% of the data otherwise required. We then systematically benchmark and analyze a number of defense methods, for both LF and CL attacks, some previously employed solely for LF attacks in the textual domain and others adapted from computer vision. We find that text-specific defenses greatly vary in their effectiveness depending on their properties.

machine learning, natural language, text classification, (21 more...)

2305.19607

Country:

North America > United States > Utah (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.51)

RARR: Researching and Revising What Language Models Say, Using Language Models

Gao, Luyu, Dai, Zhuyun, Pasupat, Panupong, Chen, Anthony, Chaganty, Arun Tejasvi, Fan, Yicheng, Zhao, Vincent Y., Lao, Ni, Lee, Hongrae, Juan, Da-Cheng, Guu, Kelvin

Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.

attribution, large language model, machine learning, (19 more...)

2210.08726

Country:

Europe > United Kingdom (0.68)
Asia > Middle East > Jordan (0.04)
Indian Ocean > Red Sea (0.04)
(22 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Television (1.00)
Media > Film (1.00)
Government > Regional Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)

Reif, Yuval, Schwartz, Roy

Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases

NLP models often rely on superficial cues known as dataset biases to achieve impressive performance, and can fail on examples where these biases do not hold. Recent work sought to develop robust, unbiased models by filtering biased examples from training sets. In this work, we argue that such filtering can obscure the true capabilities of models to overcome biases, which might never be removed in full from the dataset. We suggest that in order to drive the development of models robust to subtle biases, dataset biases should be amplified in the training set. We introduce an evaluation framework defined by a bias-amplified training set and an anti-biased test set, both automatically extracted from existing datasets. Experiments across three notions of bias, four datasets and two models show that our framework is substantially more challenging for models than the original data splits, and even more challenging than hand-crafted challenge sets. Our evaluation framework can use any existing dataset, even those considered obsolete, to test model robustness. We hope our work will guide the development of robust models that do not rely on superficial biases and correlations. To this end, we publicly release our code and data.

artificial intelligence, machine learning, natural language, (19 more...)

2305.18917

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.14)
North America > Dominican Republic (0.04)
(14 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

RelationMatch: Matching In-batch Relationships for Semi-supervised Learning

Zhang, Yifan, Yang, Jingqin, Tan, Zhiquan, Yuan, Yang

Semi-supervised learning has achieved notable success by leveraging very few labeled data and exploiting the wealth of information derived from unlabeled data. However, existing algorithms usually focus on aligning predictions on paired data points augmented from an identical source, and overlook the inter-point relationships within each batch. This paper introduces a novel method, RelationMatch, which exploits in-batch relationships with a matrix cross-entropy (MCE) loss function. Through the application of MCE, our proposed method consistently surpasses the performance of established state-of-the-art methods, such as FixMatch and FlexMatch, across a variety of vision datasets. Notably, we observed a substantial enhancement of 15.21% in accuracy over FlexMatch on the STL-10 dataset using only 40 labels. Moreover, we apply MCE to supervised learning scenarios, and observe consistent improvements as well.

artificial intelligence, inductive learning, machine learning, (18 more...)

2305.10397

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Simon, James B., Knutins, Maksis, Ziyin, Liu, Geisz, Daniel, Fetterman, Abraham J., Albrecht, Joshua

On the Stepwise Nature of Self-Supervised Learning

We present a simple picture of the training process of joint embedding self-supervised learning methods. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We solve the training dynamics of this model from small initialization, finding that the model learns the top eigenmodes of a certain contrastive kernel in a stepwise fashion, and obtain a closed-form expression for the final learned representations. Remarkably, we then see the same stepwise learning phenomenon when training deep ResNets using the Barlow Twins, SimCLR, and VICReg losses. Our theory suggests that, just as kernel regression can be thought of as a model of supervised learning, kernel PCA may serve as a useful model of self-supervised learning.

artificial intelligence, machine learning, representation, (17 more...)

2303.15438

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Contrastive Learning Is Spectral Clustering On Similarity Graph

Tan, Zhiquan, Zhang, Yifan, Yang, Jingqin, Yuan, Yang

Contrastive learning is a powerful self-supervised learning method, but we have a limited theoretical understanding of how it works and why it works. In this paper, we prove that contrastive learning with the standard InfoNCE loss is equivalent to spectral clustering on the similarity graph. Using this equivalence as the building block, we extend our analysis to the CLIP model and rigorously characterize how similar multi-modal objects are embedded together. Motivated by our theoretical insights, we introduce the kernel mixture loss, incorporating novel kernel functions that outperform the standard Gaussian kernel on several vision datasets.

artificial intelligence, inductive learning, machine learning, (15 more...)

2303.15103

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)

Learning Off-Road Terrain Traversability with Self-Supervisions Only

Seo, Junwon, Sim, Sungdae, Shim, Inwook

Estimating the traversability of terrain should be reliable and accurate in diverse conditions for autonomous driving in off-road environments. However, learning-based approaches often yield unreliable results when confronted with unfamiliar contexts, and it is challenging to obtain manual annotations frequently for new circumstances. In this paper, we introduce a method for learning traversability from images that utilizes only self-supervision and no manual labels, enabling it to easily learn traversability in new circumstances. To this end, we first generate self-supervised traversability labels from past driving trajectories by labeling regions traversed by the vehicle as highly traversable. Using the self-supervised labels, we then train a neural network that identifies terrains that are safe to traverse from an image using a one-class classification algorithm. Additionally, we supplement the limitations of self-supervised labels by incorporating methods of self-supervised learning of visual representations. To conduct a comprehensive evaluation, we collect data in a variety of driving environments and perceptual conditions and show that our method produces reliable estimations in various environments. In addition, the experimental results validate that our method outperforms other self-supervised traversability estimation methods and achieves comparable performances with supervised learning methods trained on manually labeled data.

artificial intelligence, machine learning, traversability, (17 more...)

doi: 10.1109/LRA.2023.3284356

2305.18896

Country:

Asia > South Korea > Daejeon > Daejeon (0.04)
North America > United States (0.04)

Genre: Research Report (0.64)

Industry:

Automobiles & Trucks (0.67)
Transportation > Ground > Road (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.76)

Graph Entropy Minimization for Semi-supervised Node Classification

Luo, Yi, Luo, Guangchun, Qin, Ke, Chen, Aiguo

Node classifiers are required to comprehensively reduce prediction errors, training resources, and inference latency in the industry. However, most graph neural networks (GNN) concentrate only on one or two of them. The compromised aspects thus are the shortest boards on the bucket, hindering their practical deployments for industrial-level tasks. This work proposes a novel semi-supervised learning method termed Graph Entropy Minimization (GEM) to resolve the three issues simultaneously. GEM benefits its one-hop aggregation from massive uncategorized nodes, making its prediction accuracy comparable to GNNs with two or more hops message passing. It can be decomposed to support stochastic training with mini-batches of independent edge samples, achieving extremely fast sampling and space-saving training. While its one-hop aggregation is faster in inference than deep GNNs, GEM can be further accelerated to an extreme by deriving a non-hop classifier via online knowledge distillation. Thus, GEM can be a handy choice for latency-restricted and error-sensitive services running on resource-constraint hardware. Code is available at https://github.com/cf020031308/GEM.

artificial intelligence, classification, machine learning, (14 more...)

2305.19502

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > China (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(9 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)