Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

Tan, Daniel, Woodruff, Anders, Warncke, Niels, Jose, Arun, Riché, Maxime, Africa, David Demitri, Taylor, Mia

arXiv.org Artificial Intelligence

Language model finetuning often results in learning undesirable traits in combination with desired ones. To address this, we propose inoculation prompting: modifying finetuning data by prepending a short system-prompt instruction that deliberately elicits the undesirable trait. At test time, we evaluate without the instruction; inoculated models have much lower expression of the trait than models trained with unmodified training data. Inoculation is selective: in a toy setting where assistant responses are always in Spanish and ALL-CAPS, an appropriate inoculation (e.g., "You always speak in Spanish.") teaches the model to capitalize responses while still responding in English. We find that inoculation is also effective across several additional settings: reducing emergent misalignment (EM) from task-specific finetuning, defending against backdoor injections, and mitigating the transmission of traits via subliminal learning. Follow-up analysis suggests a mechanism: making a trait less surprising via inoculation reduces optimization pressure to globally update the model, thereby reducing the degree of generalization. Our analysis relates to prior work on EM: inoculation explains prior findings that educational contexts mitigate EM from insecure code. Beyond demonstrating a simple and effective technique for selective learning, our results contribute to a better conceptual understanding of how and why language models generalize.
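The data transformation the abstract describes is simple enough to sketch. The following is a minimal, hypothetical illustration (the chat-message format and the `inoculate` helper are assumptions for the sketch, not the authors' released code): prepend the trait-eliciting instruction as a system message during training, and evaluate without it.

```python
# Hedged sketch of inoculation prompting: prepend a short system-prompt
# instruction that elicits the undesired trait to every training example;
# at test time the instruction is simply omitted.

def inoculate(examples, instruction):
    """Prepend the inoculation instruction as a system message."""
    return [
        {"messages": [{"role": "system", "content": instruction}] + ex["messages"]}
        for ex in examples
    ]

# Toy setting from the abstract: responses are Spanish AND all-caps.
# Inoculating only the Spanish trait leaves capitalization to be learned.
train_data = [
    {"messages": [
        {"role": "user", "content": "How do I greet someone?"},
        {"role": "assistant", "content": "¡HOLA! ¿CÓMO ESTÁS?"},
    ]},
]
inoculated = inoculate(train_data, "You always speak in Spanish.")
```

At evaluation time one would simply serve the original conversation without the prepended system message.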


Improving QA Model Performance with Cartographic Inoculation

Chen, Allen, Tanrikulu, Okan

arXiv.org Artificial Intelligence

QA models are faced with complex and open-ended contextual reasoning problems, but can often learn well-performing solution heuristics by exploiting dataset-specific patterns in their training data. These patterns, or "dataset artifacts", reduce the model's ability to generalize to real-world QA problems. Utilizing an ElectraSmallDiscriminator model trained for QA, we analyze the impacts and incidence of dataset artifacts using an adversarial challenge set designed to confuse models reliant on artifacts for prediction. Extending existing work on methods for mitigating artifact impacts, we propose cartographic inoculation, a novel method that fine-tunes models on an optimized subset of the challenge data to reduce model reliance on dataset artifacts. We show that by selectively fine-tuning a model on ambiguous adversarial examples from a challenge set, significant performance improvements can be made on the full challenge dataset with minimal loss of model generalizability to other datasets. (Figure 1: Visualization depicting the inoculation by fine-tuning method and potential outcomes; figure adapted from Liu et al. (2019).)
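The "ambiguous example" selection at the heart of this approach can be sketched in the spirit of dataset cartography: track each example's gold-label probability across training epochs, and treat high variability as ambiguity. The array shapes, threshold, and helper below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hedged sketch: select "ambiguous" challenge-set examples by the
# variability of their gold-label probability across training epochs.

def select_ambiguous(gold_probs, top_k):
    """gold_probs: array of shape (n_epochs, n_examples)."""
    variability = gold_probs.std(axis=0)      # per-example std over epochs
    return np.argsort(-variability)[:top_k]   # most variable = most ambiguous

rng = np.random.default_rng(0)
probs = rng.uniform(size=(5, 100))            # fake 5-epoch training run
ambiguous_idx = select_ambiguous(probs, top_k=10)
```

The selected subset would then be used for the inoculation fine-tuning step rather than the full challenge set.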


Multi-Set Inoculation: Assessing Model Robustness Across Multiple Challenge Sets

Gupta, Vatsal, Pandya, Pranshu, Kataria, Tushar, Gupta, Vivek, Roth, Dan

arXiv.org Artificial Intelligence

Language models, given their black-box nature, often exhibit sensitivity to input perturbations, leading to trust issues due to hallucinations. To bolster trust, it is essential to understand these models' failure modes and devise strategies to enhance their performance. In this study, we propose a framework to study the effect of input perturbations on language models of different scales, from pre-trained models to large language models (LLMs). We use fine-tuning to train a model robust to perturbations, and we investigate whether exposure to one perturbation improves or degrades the model's performance on other perturbations. To address multi-perturbation robustness, we suggest three distinct training strategies. We also extend the framework to LLMs via chain-of-thought (CoT) prompting with exemplars. We instantiate our framework for the Tabular-NLI task and show that the proposed strategies train the model to be robust to different perturbations without losing accuracy on a given dataset.
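The core experimental question here — does exposure to one perturbation transfer to others? — amounts to a cross-perturbation evaluation matrix. A minimal sketch, under the assumption of hypothetical `train` and `evaluate` callables standing in for a real fine-tuning pipeline (here replaced by toy stubs so the sketch runs):

```python
# Hedged sketch of a cross-perturbation robustness matrix: fine-tune on one
# perturbation set, then evaluate on every set to measure transfer.

def robustness_matrix(base_model, perturbation_sets, train, evaluate):
    matrix = {}
    for src_name, src_set in perturbation_sets.items():
        model = train(base_model, src_set)           # fine-tune on one set
        matrix[src_name] = {
            tgt_name: evaluate(model, tgt_set)       # test on all sets
            for tgt_name, tgt_set in perturbation_sets.items()
        }
    return matrix

# Toy stubs: a "model" remembers which set it was trained on, and scores
# perfectly on the seen perturbation, 0.5 elsewhere.
sets = {"char_swap": [], "negation": [], "paraphrase": []}
train = lambda base, s: [k for k, v in sets.items() if v is s][0]
evaluate = lambda model, s: 1.0 if sets[model] is s else 0.5
matrix = robustness_matrix("base", sets, train, evaluate)
```

Off-diagonal entries of such a matrix are what reveal whether a training strategy helps or hurts robustness to unseen perturbations.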


Assessing Out-of-Domain Language Model Performance from Few Examples

Singhal, Prasann, Forristal, Jarad, Ye, Xi, Durrett, Greg

arXiv.org Artificial Intelligence

While pretrained language models have exhibited impressive generalization capabilities, they still behave unpredictably under certain domain shifts. In particular, a model may learn a reasoning process on in-domain training data that does not hold for out-of-domain test data. We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion: given a few target-domain examples and a set of models with similar training performance, can we understand how these models will perform on OOD test data? We benchmark the performance on this task when looking at model accuracy on the few-shot examples, then investigate how to incorporate analysis of the models' behavior using feature attributions to better tackle this problem. Specifically, we explore a set of "factors" designed to reveal model agreement with certain pathological heuristics that may indicate worse generalization capabilities. On textual entailment, paraphrase recognition, and a synthetic classification task, we show that attribution-based factors can help rank relative model OOD performance. However, accuracy on a few-shot test set is a surprisingly strong baseline, particularly when the system designer does not have in-depth prior knowledge about the domain shift.
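The "surprisingly strong baseline" the abstract ends on is easy to state concretely: rank candidate models by their accuracy on the few available target-domain examples. The callable-model interface and toy data below are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of the few-shot OOD baseline: rank models by accuracy on a
# handful of labeled target-domain examples.

def rank_by_few_shot_accuracy(models, few_shot_examples):
    def accuracy(item):
        _, model = item
        correct = sum(model(x) == y for x, y in few_shot_examples)
        return correct / len(few_shot_examples)
    return [name for name, _ in sorted(models.items(), key=accuracy, reverse=True)]

few_shot = [("a", 1), ("b", 0), ("c", 1)]
models = {
    "always_one": lambda x: 1,                   # right on 2/3 examples
    "always_zero": lambda x: 0,                  # right on 1/3
    "oracle": lambda x: 0 if x == "b" else 1,    # right on 3/3
}
ranking = rank_by_few_shot_accuracy(models, few_shot)
```

The paper's contribution is to augment this baseline with attribution-based "factors"; the baseline itself needs nothing beyond the few labeled examples.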


AI Ethics And AI-Induced Psychological Inoculation To Help Humans With Disinformation

#artificialintelligence

What are we going to do about the massive glut of disinformation and misinformation? It all is demonstrably getting worse and worse, with each passing day. Perhaps Artificial Intelligence (AI) can come to our rescue. Yes, that's right, we might be able to harness the beneficial uses of AI to cope with our relentless tsunami of disinformation and misinformation. We might be wise to try doing so. Every avenue of potential solution would seem worthy of pursuit. As an aside, I'd like to immediately acknowledge that AI is undoubtedly going to be a part of the problem too. There is no question that humans can readily leverage AI to generate disinformation and misinformation. Furthermore, AI can insidiously be used to make disinformation and misinformation appear amazingly valid, fooling humans into believing that the presented information is alluringly accurate and factual. A decidedly sad side of what AI brings to the table. We will come back to this downside conundrum toward the end of this discussion. For now, let's put on our smiley faces and explore how AI can be beneficial in bringing disinformation and misinformation to their mighty knees.


Image-Based Plant Wilting Estimation

Yang, Changye, Baireddy, Sriram, Cai, Enyu, Meline, Valerian, Caldwell, Denise, Iyer-Pascuzzi, Anjali S., Delp, Edward J.

arXiv.org Artificial Intelligence

Many plants become limp or droop through heat, loss of water, or disease. This is also known as wilting. In this paper, we examine plant wilting caused by bacterial infection. In particular, we want to design a metric for wilting based on images acquired of the plant. A quantifiable wilting metric will be useful in studying bacterial wilt and identifying resistance genes. Since there is no standard way to estimate wilting, it is common to use ad hoc visual scores. This is very subjective and requires expert knowledge of the plants and the disease mechanism. Our solution consists of using various wilting metrics acquired from RGB images of the plants. We also designed several experiments to demonstrate that our metrics are effective at estimating wilting in plants.


The role of artificial intelligence in vaccine distribution.

#artificialintelligence

The role of artificial intelligence in vaccine distribution will be very critical in vaccinating the global population against COVID-19. Vaccine distribution is one of the biggest logistical challenges humanity has faced so far, and I think AI can be leveraged to help us with the equitable distribution of the vaccine. In the United States, as of now, the rollout of the vaccine has been painfully slow, with a lot of logistical issues from distribution to inoculations. Worldwide, the progress is even more sluggish, with some countries yet to start the journey of inoculations. The role of artificial intelligence in vaccine distribution involves the following challenges that AI can help with, provided we have quality, accurate data.


On the Benefits of Inoculation, an Example in Train Scheduling

Semet, Yann, Schoenauer, Marc

arXiv.org Artificial Intelligence

The local reconstruction of a railway schedule following a small perturbation of the traffic, seeking minimization of the total accumulated delay, is a very difficult and tightly constrained combinatorial problem. Notoriously enough, the railway company's public image degrades proportionally to the amount of daily delays, and the same goes for its profit! This paper describes an inoculation procedure which greatly enhances an evolutionary algorithm for train re-scheduling. The procedure consists in building the initial population around a pre-computed solution based on problem-related information available beforehand. The optimization is performed by adapting times of departure and arrival, as well as allocation of tracks, for each train at each station. This is achieved by a permutation-based evolutionary algorithm that relies on a semi-greedy heuristic scheduler to gradually reconstruct the schedule by inserting trains one after another. Experimental results are presented on various instances of a large real-world case involving around 500 trains and more than 1 million constraints. In terms of competition with the commercial mathematical programming tool ILOG CPLEX, it appears that within a large class of instances, excluding trivial instances as well as too difficult ones, and with very few exceptions, a clever initialization turns an encouraging failure into a clear-cut success, auguring substantial financial savings.
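The inoculation step described here — seeding the initial population around a pre-computed solution rather than sampling it at random — can be sketched as follows. Encoding the schedule as a train-insertion permutation, the population size, and the number of swap mutations are all illustrative assumptions.

```python
import random

# Hedged sketch of an inoculated initial population for a permutation-based
# evolutionary algorithm: keep the pre-computed seed solution and fill the
# rest of the population with lightly mutated copies of it.

def inoculated_population(seed_perm, size, n_swaps=2, rng=None):
    rng = rng or random.Random(0)
    population = [list(seed_perm)]                # keep the seed itself
    for _ in range(size - 1):
        perm = list(seed_perm)
        for _ in range(n_swaps):                  # small random perturbations
            i, j = rng.sample(range(len(perm)), 2)
            perm[i], perm[j] = perm[j], perm[i]
        population.append(perm)
    return population

population = inoculated_population(list(range(10)), size=5)
```

Compared with random initialization, every individual starts near a feasible, problem-informed schedule, which is the essence of the "clever initialization" the abstract credits for the performance gain.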