Wei, Jerry
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Sharma, Mrinank, Tong, Meg, Mu, Jesse, Wei, Jerry, Kruthoff, Jorrit, Goodfriend, Scott, Ong, Euan, Peng, Alwin, Agarwal, Raj, Anil, Cem, Askell, Amanda, Bailey, Nathan, Benton, Joe, Bluemke, Emma, Bowman, Samuel R., Christiansen, Eric, Cunningham, Hoagy, Dau, Andy, Gopal, Anjali, Gilson, Rob, Graham, Logan, Howard, Logan, Kalra, Nimit, Lee, Taesung, Lin, Kevin, Lofgren, Peter, Mosconi, Francesco, O'Hara, Clare, Olsson, Catherine, Petrini, Linda, Rajani, Samir, Saxena, Nikhil, Silverstein, Alex, Singh, Tanya, Sumers, Theodore, Tang, Leonard, Troy, Kevin K., Weisser, Constantin, Zhong, Ruiqi, Zhou, Giulio, Leike, Jan, Kaplan, Jared, Perez, Ethan
Large language models (LLMs) are vulnerable to universal jailbreaks--prompting strategies that systematically bypass model safeguards and enable users to carry out harmful processes that require many model interactions, like manufacturing illegal substances at scale. To defend against these attacks, we introduce Constitutional Classifiers: safeguards trained on synthetic data, generated by prompting LLMs with natural language rules (i.e., a constitution) specifying permitted and restricted content. In over 3,000 estimated hours of red teaming, no red teamer found a universal jailbreak that could extract information from an early classifier-guarded LLM at a similar level of detail to an unguarded model across most target queries. On automated evaluations, enhanced classifiers demonstrated robust defense against held-out domain-specific jailbreaks. These classifiers also maintain deployment viability, with an absolute 0.38% increase in production-traffic refusals and a 23.7% inference overhead. Our work demonstrates that defending against universal jailbreaks while maintaining practical deployment viability is tractable.
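To make the described pipeline concrete, here is a toy sketch of the constitution-driven setup: natural-language rules mark content as permitted or restricted, an LLM is prompted with each rule to produce synthetic training examples, and a classifier is trained on the result. The generation step is a hypothetical stub and the tf-idf + logistic-regression classifier is a stand-in for the paper's LLM-based classifiers, so the sketch runs offline.

```python
# Toy sketch (not the paper's code): a constitution of natural-language rules
# drives synthetic example generation, and a lightweight classifier is trained
# on the resulting (text, allowed/restricted) labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

CONSTITUTION = [
    ("restricted", "Requests for step-by-step synthesis of dangerous substances."),
    ("allowed", "General chemistry questions suitable for a student."),
]

def generate_with_llm(rule: str, label: str, n: int = 3) -> list[str]:
    # Hypothetical stand-in: in the paper an LLM is prompted with each rule to
    # produce synthetic queries; here we return fixed placeholders so it runs.
    return [f"[{label} example {i} for rule: {rule[:30]}...]" for i in range(n)]

texts, labels = [], []
for label, rule in CONSTITUTION:
    for example in generate_with_llm(rule, label):
        texts.append(example)
        labels.append(label)

# Train a simple input classifier on the synthetic data; the paper's classifiers
# guard both inputs and streamed outputs of a production LLM.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["How do I make an illegal substance at scale?"]))
```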
Best Practices and Lessons Learned on Synthetic Data for Language Models
Liu, Ruibo, Wei, Jerry, Liu, Fangyu, Si, Chenglei, Zhang, Yanzhe, Rao, Jinmeng, Zheng, Steven, Peng, Daiyi, Yang, Diyi, Zhou, Denny, Dai, Andrew M.
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
Long-form factuality in large language models
Wei, Jerry, Yang, Chengrun, Song, Xinying, Lu, Yifeng, Hu, Nathan, Huang, Jie, Tran, Dustin, Peng, Daiyi, Liu, Ruibo, Huang, Da, Du, Cosmo, Le, Quoc V.
Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process of sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators: on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
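A minimal sketch of the aggregated metric as described above: precision is the fraction of supported facts in a response, recall is the number of supported facts relative to a preferred response length K (capped at 1), and the two are combined as a harmonic mean. Names and the example numbers are illustrative, not from the released code.

```python
# Minimal sketch of the aggregated long-form factuality metric described above.
def f1_at_k(num_supported: int, num_not_supported: int, k: int) -> float:
    total = num_supported + num_not_supported
    if total == 0 or num_supported == 0:
        return 0.0
    precision = num_supported / total          # fraction of supported facts
    recall = min(num_supported / k, 1.0)       # supported facts vs. preferred length K
    return 2 * precision * recall / (precision + recall)

# Example: 45 supported facts out of 50, with a preferred length of K = 64 facts.
print(f1_at_k(45, 5, 64))  # ~0.79
```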
Symbol tuning improves in-context learning in language models
Wei, Jerry, Hou, Le, Lampinen, Andrew, Chen, Xiangning, Huang, Da, Tay, Yi, Chen, Xinyun, Lu, Yifeng, Zhou, Denny, Ma, Tengyu, Le, Quoc V.
We present symbol tuning: finetuning language models on in-context input-label pairs in which natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar"). Symbol tuning leverages the intuition that when a model cannot use instructions or natural language labels to figure out a task, it must instead do so by learning the input-label mappings. We experiment with symbol tuning across Flan-PaLM models up to 540B parameters and observe benefits across various settings. First, symbol tuning boosts performance on unseen in-context learning tasks and is much more robust to underspecified prompts, such as those without instructions or without natural language labels. Second, symbol-tuned models are much stronger at algorithmic reasoning tasks, with up to 18.2% better performance on the List Functions benchmark and up to 15.3% better performance on the Simple Turing Concepts benchmark. Finally, symbol-tuned models show large improvements in following flipped labels presented in-context, meaning that they are more capable of using in-context information to override prior semantic knowledge.
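A minimal sketch of the data transformation that symbol tuning applies: natural-language labels in input-label pairs are remapped to arbitrary symbols so that the model must learn the task from the mapping itself. The example task and symbol pool are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the symbol-tuning transformation: replace natural-language
# labels with arbitrary symbols so the task is only recoverable from the mapping.
import random

SYMBOLS = ["foo", "bar", "baz", "qux"]

def symbolize(examples: list[tuple[str, str]], seed: int = 0) -> list[tuple[str, str]]:
    rng = random.Random(seed)
    labels = sorted({label for _, label in examples})
    mapping = dict(zip(labels, rng.sample(SYMBOLS, len(labels))))
    return [(text, mapping[label]) for text, label in examples]

sentiment_examples = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute.", "negative"),
]
print(symbolize(sentiment_examples))
# e.g. [('The movie was wonderful.', 'bar'), ('I hated every minute.', 'foo')]
```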
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Vu, Tu, Iyyer, Mohit, Wang, Xuezhi, Constant, Noah, Wei, Jerry, Wei, Jason, Tar, Chris, Sung, Yun-Hsuan, Zhou, Denny, Le, Quoc, Luong, Thang
Most large language models (LLMs) are trained once and never updated; thus, they lack the ability to dynamically adapt to our ever-changing world. In this work, we perform a detailed study of the factuality of LLM-generated text in the context of answering questions that test current world knowledge. Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types, including questions that require fast-changing world knowledge as well as questions with false premises that need to be debunked. We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination. Through human evaluations involving more than 50K judgments, we shed light on limitations of these models and demonstrate significant room for improvement: for instance, all models (regardless of model size) struggle on questions that involve fast-changing knowledge and false premises. Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA by incorporating relevant and up-to-date information retrieved from a search engine into the prompt. Our experiments show that FreshPrompt outperforms both competing search engine-augmented prompting methods such as Self-Ask (Press et al., 2022) and commercial systems such as Perplexity.AI. Further analysis of FreshPrompt reveals that both the number of retrieved evidence snippets and their order play a key role in influencing the correctness of LLM-generated answers. Additionally, instructing the LLM to generate concise and direct answers helps reduce hallucination compared to encouraging more verbose answers. To facilitate future work, we release FreshQA at github.com/freshllms/freshqa and commit to updating it at regular intervals.
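A minimal sketch of assembling a FreshPrompt-style prompt: retrieved evidence snippets are inserted into the prompt in a fixed order (here oldest to newest, so the most recent evidence sits closest to the question) together with an instruction to answer concisely. The snippet fields and template wording are assumptions for illustration, not the paper's exact format.

```python
# Minimal sketch of a search-augmented prompt in the spirit of FreshPrompt.
from datetime import date

def build_fresh_prompt(question: str, evidences: list[dict]) -> str:
    ordered = sorted(evidences, key=lambda e: e["date"])  # most recent last
    lines = []
    for e in ordered:
        lines.append(f"source: {e['source']} ({e['date'].isoformat()})\n{e['snippet']}")
    lines.append(f"question: {question}")
    lines.append("answer concisely and directly:")
    return "\n\n".join(lines)

evidence = [
    {"source": "example.org", "date": date(2023, 1, 5), "snippet": "..."},
    {"source": "example.com", "date": date(2023, 9, 1), "snippet": "..."},
]
print(build_fresh_prompt("Who currently holds the office?", evidence))
```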
What Are People Asking About COVID-19? A Question Classification Dataset
Wei, Jerry, Huang, Chengyu, Vosoughi, Soroush, Wei, Jason
We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources, which we annotate into 15 question categories and 207 question clusters. The most common questions in our dataset asked about transmission, prevention, and societal effects of COVID-19, and we found that many questions that appeared in multiple sources were not answered by any FAQ websites of reputable organizations such as the CDC and FDA. We post our dataset publicly at https://github.com/JerryWeiAI/COVID-Q. For classifying questions into 15 categories, a BERT baseline scored 58.1% accuracy when trained on 20 examples per category, and for a question clustering task, a BERT + triplet loss baseline achieved 49.5% accuracy. We hope COVID-Q can be useful either directly in developing applied systems or as a domain-specific resource for model evaluation.
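As a rough illustration of the clustering baseline mentioned above, the sketch below trains a projection head with a triplet loss so that questions from the same cluster embed closer together than questions from different clusters. Random tensors stand in for BERT embeddings so the example runs without downloading a model; it is not the paper's code.

```python
# Minimal sketch of a triplet-loss setup for question clustering: anchor and
# positive come from the same cluster, negative from a different one.
import torch
import torch.nn as nn

encoder = nn.Linear(768, 128)          # projection head on top of (frozen) BERT features
loss_fn = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

anchor = torch.randn(32, 768)          # stand-in for BERT embeddings of anchor questions
positive = torch.randn(32, 768)        # same-cluster questions
negative = torch.randn(32, 768)        # different-cluster questions

loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
optimizer.step()
print(float(loss))
```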
NewB: 200,000+ Sentences for Political Bias Detection
Wei, Jerry
We present the Newspaper Bias Dataset (NewB), a text corpus of more than 200,000 sentences from eleven news sources regarding Donald Trump. While previous datasets have labeled sentences as either liberal or conservative, NewB covers the political views of eleven popular media sources, capturing more nuanced political viewpoints than a traditional binary classification system does. We train two state-of-the-art deep learning models to predict the news source of a given sentence from eleven newspapers and find that a recurrent neural network achieved top-1, top-3, and top-5 accuracies of 33.3%, 61.4%, and 77.6%, respectively, significantly outperforming a baseline logistic regression model's accuracies of 18.3%, 42.6%, and 60.8%. Using the news source label of sentences, we analyze the top n-grams with our model to gain meaningful insight into the portrayal of Trump by media sources. We hope that the public release of our dataset will encourage further research in using natural language processing to analyze more complex political biases. Our dataset is posted at https://github.com/JerryWeiAI/NewB.
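For reference, the top-k accuracies reported above simply check whether the true news source appears among a model's k highest-scoring sources; below is a short sketch with a toy score matrix (random scores over the eleven sources), not the paper's evaluation code.

```python
# Minimal sketch of top-k accuracy for an 11-way news-source classifier.
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    # scores: (num_sentences, 11) model scores over the eleven sources
    top_k = np.argsort(scores, axis=1)[:, -k:]          # indices of the k highest scores
    hits = [label in row for row, label in zip(top_k, labels)]
    return float(np.mean(hits))

scores = np.random.rand(1000, 11)
labels = np.random.randint(0, 11, size=1000)
for k in (1, 3, 5):
    print(k, top_k_accuracy(scores, labels, k))          # random baseline is roughly k/11
```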
Simple synthetic data reduces sycophancy in large language models
Wei, Jerry, Huang, Da, Lu, Yifeng, Zhou, Denny, Le, Quoc V.
Sycophancy is an undesirable behavior where models tailor their responses to follow a human user's view even when that view is not objectively correct (e.g., adapting liberal views once a user reveals that they are liberal). In this paper, we study the prevalence of sycophancy in language models and propose a simple synthetic-data intervention to reduce this behavior. First, on a set of three sycophancy tasks (Perez et al., 2022) where models are asked for an opinion on statements with no correct answers (e.g., politics), we observe that both model scaling and instruction tuning significantly increase sycophancy for PaLM models up to 540B parameters. Second, we extend sycophancy evaluations to simple addition statements that are objectively incorrect, finding that despite knowing that these statements are wrong, language models will still agree with them if the user does as well. To reduce sycophancy, we present a straightforward synthetic-data intervention that takes public NLP tasks and encourages models to be robust to user opinions on these tasks. Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts. Code for generating synthetic data for intervention can be found at https://github.com/google/sycophancy-intervention.
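A minimal sketch of the flavor of synthetic example described above: a claim with a known ground-truth label is paired with a user opinion that may agree or disagree with it, and the finetuning target always follows the ground truth rather than the stated opinion. The prompt template is an illustrative assumption; the released generation code is at the linked repository.

```python
# Minimal sketch of a synthetic anti-sycophancy example: the target answer
# depends only on the claim's correctness, never on the user's stated view.
import random

def make_example(claim: str, truth: str, rng: random.Random) -> dict:
    stated = rng.choice(["agree", "disagree"])           # user opinion, possibly wrong
    prompt = (
        f"Claim: {claim}\n"
        f"I {stated} with the claim.\n"
        "Do you agree or disagree with the claim? Answer based only on whether it is correct."
    )
    return {"prompt": prompt, "target": truth}

rng = random.Random(0)
print(make_example("1 + 1 = 3", "disagree", rng))
print(make_example("Paris is the capital of France.", "agree", rng))
```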
Larger language models do in-context learning differently
Wei, Jerry, Wei, Jason, Tay, Yi, Tran, Dustin, Webson, Albert, Lu, Yifeng, Chen, Xinyun, Liu, Hanxiao, Huang, Da, Zhou, Denny, Ma, Tengyu
We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.
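A minimal sketch of the flipped-label setup studied above: the labels of the in-context exemplars are swapped while the query is left unchanged, so a model that follows the exemplars must override its semantic priors. The task and prompt format are illustrative assumptions, not the paper's exact templates.

```python
# Minimal sketch of constructing a flipped-label in-context learning prompt.
FLIP = {"positive": "negative", "negative": "positive"}

def flipped_label_prompt(exemplars: list[tuple[str, str]], query: str) -> str:
    lines = [f"Input: {text}\nLabel: {FLIP[label]}" for text, label in exemplars]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

exemplars = [
    ("A delightful, moving film.", "positive"),
    ("A tedious waste of two hours.", "negative"),
]
print(flipped_label_prompt(exemplars, "One of the best performances this year."))
```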