AITopics | Thain, Nithum

Collaborating Authors

Thain, Nithum

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset

Hoffmann, Jessica, Ahlheim, Christiane, Yu, Zac, Walfrand, Aria, Jin, Jarvis, Tano, Marie, Beirami, Ahmad, van Liemt, Erin, Thain, Nithum, Sidahmed, Hakim, Dixon, Lucas

arXiv.org Artificial IntelligenceMar-5-2025

This paper describes the construction of a dataset and the evaluation of training methods to improve generative large language models' (LLMs) ability to answer queries on sensitive topics with a Neutral Point of View (NPOV), i.e., to provide significantly more informative, diverse and impartial answers. The dataset, the SHQ-NPOV dataset, comprises 300 high-quality, human-written quadruplets: a query on a sensitive topic, an answer, an NPOV rating, and a set of links to source texts elaborating the various points of view. The first key contribution of this paper is a new methodology to create such datasets through iterative rounds of human peer-critique and annotator training, which we release alongside the dataset. The second key contribution is the identification of a highly effective training regime for parameter-efficient reinforcement learning (PE-RL) to improve NPOV generation. We compare and extensively evaluate PE-RL and multiple baselines-including LoRA finetuning (a strong baseline), SFT and RLHF. PE-RL not only improves on overall NPOV quality compared to the strongest baseline ($97.06\%\rightarrow 99.08\%$), but also scores much higher on features linguists identify as key to separating good answers from the best answers ($60.25\%\rightarrow 85.21\%$ for presence of supportive details, $68.74\%\rightarrow 91.43\%$ for absence of oversimplification). A qualitative analysis corroborates this. Finally, our evaluation finds no statistical differences between results on topics that appear in the training dataset and those on separated evaluation topics, which provides strong evidence that our approach to training PE-RL exhibits very effective out of topic generalization.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.03654

Country:

Asia (0.93)
Europe (0.68)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > New Finding (0.87)
Research Report > Experimental Study > Negative Result (0.66)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Gemma: Open Models Based on Gemini Research and Technology

Gemma Team, null, Mesnard, Thomas, Hardin, Cassidy, Dadashi, Robert, Bhupatiraju, Surya, Pathak, Shreya, Sifre, Laurent, Rivière, Morgane, Kale, Mihir Sanjay, Love, Juliette, Tafti, Pouya, Hussenot, Léonard, Sessa, Pier Giuseppe, Chowdhery, Aakanksha, Roberts, Adam, Barua, Aditya, Botev, Alex, Castro-Ros, Alex, Slone, Ambrose, Héliou, Amélie, Tacchetti, Andrea, Bulanova, Anna, Paterson, Antonia, Tsai, Beth, Shahriari, Bobak, Lan, Charline Le, Choquette-Choo, Christopher A., Crepy, Clément, Cer, Daniel, Ippolito, Daphne, Reid, David, Buchatskaya, Elena, Ni, Eric, Noland, Eric, Yan, Geng, Tucker, George, Muraru, George-Christian, Rozhdestvenskiy, Grigory, Michalewski, Henryk, Tenney, Ian, Grishchenko, Ivan, Austin, Jacob, Keeling, James, Labanowski, Jane, Lespiau, Jean-Baptiste, Stanway, Jeff, Brennan, Jenny, Chen, Jeremy, Ferret, Johan, Chiu, Justin, Mao-Jones, Justin, Lee, Katherine, Yu, Kathy, Millican, Katie, Sjoesund, Lars Lowe, Lee, Lisa, Dixon, Lucas, Reid, Machel, Mikuła, Maciej, Wirth, Mateo, Sharman, Michael, Chinaev, Nikolai, Thain, Nithum, Bachem, Olivier, Chang, Oscar, Wahltinez, Oscar, Bailey, Paige, Michel, Paul, Yotov, Petko, Chaabouni, Rahma, Comanescu, Ramona, Jana, Reena, Anil, Rohan, McIlroy, Ross, Liu, Ruibo, Mullins, Ryan, Smith, Samuel L, Borgeaud, Sebastian, Girgin, Sertan, Douglas, Sholto, Pandya, Shree, Shakeri, Siamak, De, Soham, Klimenko, Ted, Hennigan, Tom, Feinberg, Vlad, Stokowiec, Wojciech, Chen, Yu-hui, Ahmed, Zafarali, Gong, Zhitao, Warkentin, Tris, Peran, Ludovic, Giang, Minh, Farabet, Clément, Vinyals, Oriol, Dean, Jeff, Kavukcuoglu, Koray, Hassabis, Demis, Ghahramani, Zoubin, Eck, Douglas, Barral, Joelle, Pereira, Fernando, Collins, Eli, Joulin, Armand, Fiedel, Noah, Senter, Evan, Andreev, Alek, Kenealy, Kathleen

arXiv.org Artificial IntelligenceApr-16-2024

This work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.08295

Country: North America > United States > Arizona (0.14)

Genre: Research Report (0.66)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics

Chang, Tyler A., Tomanek, Katrin, Hoffmann, Jessica, Thain, Nithum, van Liemt, Erin, Meier-Hellstern, Kathleen, Dixon, Lucas

arXiv.org Artificial IntelligenceMar-13-2024

We explore a strategy to handle controversial topics in LLM-based chatbots based on Wikipedia's Neutral Point of View (NPOV) principle: acknowledge the absence of a single true answer and surface multiple perspectives. We frame this as retrieval augmented generation, where perspectives are retrieved from a knowledge base and the LLM is tasked with generating a fluent and faithful response from the given perspectives. As a starting point, we use a deterministic retrieval system and then focus on common LLM failure modes that arise during this approach to text generation, namely hallucination and coverage errors. We propose and evaluate three methods to detect such errors based on (1) word-overlap, (2) salience, and (3) LLM-based classifiers. Our results demonstrate that LLM-based classifiers, even when trained only on synthetic errors, achieve high error detection performance, with ROC AUC scores of 95.3% for hallucination and 90.5% for coverage error detection on unambiguous error cases. We show that when no training data is available, our other methods still yield good results on hallucination (84.0%) and coverage error (85.2%) detection.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2403.08904

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government (1.00)
Education (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ConstitutionalExperts: Training a Mixture of Principle-based Prompts

Petridis, Savvas, Wedin, Ben, Yuan, Ann, Wexler, James, Thain, Nithum

arXiv.org Artificial IntelligenceMar-7-2024

Large language models (LLMs) are highly capable at a variety of tasks given the right prompt, but writing one is still a difficult and tedious process. In this work, we introduce ConstitutionalExperts, a method for learning a prompt consisting of constitutional principles (i.e. rules), given a training dataset. Unlike prior methods that optimize the prompt as a single entity, our method incrementally improves the prompt by surgically editing individual principles. We also show that we can improve overall performance by learning unique prompts for different semantic regions of the training data and using a mixture-of-experts (MoE) architecture to route inputs at inference time. We compare our method to other state of the art prompt-optimization techniques across six benchmark datasets. We also investigate whether MoE improves these other techniques. Our results suggest that ConstitutionalExperts outperforms other prompt optimization techniques by 10.9% (F1) and that mixture-of-experts improves all techniques, suggesting its broad applicability.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.04894

Country:

North America > United States (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Towards Agile Text Classifiers for Everyone

Mozes, Maximilian, Hoffmann, Jessica, Tomanek, Katrin, Kouate, Muhamed, Thain, Nithum, Yuan, Ann, Bolukbasi, Tolga, Dixon, Lucas

arXiv.org Artificial IntelligenceOct-21-2023

Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots. However, different policies require different classifiers, and safety policies themselves improve from iteration and adaptation. This paper introduces and evaluates methods for agile text classification, whereby classifiers are trained using small, targeted datasets that can be quickly developed for a particular policy. Experimenting with 7 datasets from three safety-related domains, comprising 15 annotation schemes, led to our key finding: prompt-tuning large language models, like PaLM 62B, with a labeled dataset of as few as 80 examples can achieve state-of-the-art performance. We argue that this enables a paradigm shift for text classification, especially for models supporting safer online discourse. Instead of collecting millions of examples to attempt to create universal safety classifiers over months or years, classifiers could be tuned using small datasets, created by individuals or small organizations, tailored for specific use cases, and iterated on and adapted in the time-span of a day.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.06541

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals

Balashankar, Ananth, Wang, Xuezhi, Qin, Yao, Packer, Ben, Thain, Nithum, Chen, Jilin, Chi, Ed H., Beutel, Alex

arXiv.org Artificial IntelligenceMay-22-2023

Counterfactual Data Augmentation (CDA) is a commonly used technique for improving robustness in natural language classifiers. However, one fundamental challenge is how to discover meaningful counterfactuals and efficiently label them, with minimal human labeling cost. Most existing methods either completely rely on human-annotated labels, an expensive process which limits the scale of counterfactual data, or implicitly assume label invariance, which may mislead the model with incorrect labels. In this paper, we present a novel framework that utilizes counterfactual generative models to generate a large number of diverse counterfactuals by actively sampling from regions of uncertainty, and then automatically label them with a learned pairwise classifier. Our key insight is that we can more correctly label the generated counterfactuals by training a pairwise classifier that interpolates the relationship between the original example and the counterfactual. We demonstrate that with a small amount of human-annotated counterfactual data (10%), we can generate a counterfactual augmentation dataset with learned labels, that provides an 18-20% improvement in robustness and a 14-21% reduction in errors on 6 out-of-domain datasets, comparable to that of a fully human-annotated counterfactual dataset for both sentiment classification and question paraphrase tasks.

machine learning, natural language, text classification, (18 more...)

arXiv.org Artificial Intelligence

2305.13535

Country:

Asia (0.68)
North America > United States (0.68)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.34)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.34)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.34)

Add feedback

Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning

Mozes, Maximilian, Bolukbasi, Tolga, Yuan, Ann, Liu, Frederick, Thain, Nithum, Dixon, Lucas

arXiv.org Artificial IntelligenceFeb-13-2023

Pretrained large language models (LLMs) are able to solve a wide variety of tasks through transfer learning. Various explainability methods have been developed to investigate their decision making process. TracIn (Pruthi et al., 2020) is one such gradient-based method which explains model inferences based on the influence of training examples. In this paper, we explore the use of TracIn to improve model performance in the parameter-efficient tuning (PET) setting. We develop conversational safety classifiers via the prompt-tuning PET method and show how the unique characteristics of the PET regime enable TracIn to identify the cause for certain misclassifications by LLMs. We develop a new methodology for using gradient-based explainability techniques to improve model performance, G-BAIR: gradient-based automated iterative recovery. We show that G-BAIR can recover LLM performance on benchmarks after manually corrupting training labels. This suggests that influence methods like TracIn can be used to automatically perform data cleaning, and introduces the potential for interactive debugging and relabeling for PET-based transfer learning methods.

artificial intelligence, machine learning, validation, (17 more...)

arXiv.org Artificial Intelligence

2302.06598

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Fairness without Demographics through Adversarially Reweighted Learning

Lahoti, Preethi, Beutel, Alex, Chen, Jilin, Lee, Kang, Prost, Flavien, Thain, Nithum, Wang, Xuezhi, Chi, Ed H.

arXiv.org Machine LearningNov-3-2020

Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that {ARL} improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups in multiple datasets, outperforming state-of-the-art alternatives.

dataset, inductive learning, neural network, (19 more...)

arXiv.org Machine Learning

2006.13114

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

Add feedback

Debiasing Embeddings for Reduced Gender Bias in Text Classification

Prost, Flavien, Thain, Nithum, Bolukbasi, Tolga

arXiv.org Machine LearningAug-7-2019

We investigate how this bias affects downstream classification tasks, using the case study of occupation classification (De-Arteaga et al., 2019). We show that traditional techniques for debiasing embeddings can actually worsen the bias of the downstream classifier by providing a less noisy channel for communicating gender information. With a relatively minor adjustment, however, we show how these same techniques can be used to simultaneously reduce bias and maintain high classification accuracy.

gender component, health & medicine, neural network, (18 more...)

arXiv.org Machine Learning

1908.0281

Genre: Research Report (0.83)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)

Add feedback

Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

Borkan, Daniel, Dixon, Lucas, Sorensen, Jeffrey, Thain, Nithum, Vasserman, Lucy

arXiv.org Machine LearningMar-11-2019

Machine learning systems, if not constrained, will compounding existing challenges to fairness in society often learn the simplest associations that can predict the labels, so at large. In this paper, we introduce a suite of threshold-agnostic any incorrect associations present in the training data can produce metrics that provide a nuanced view of this unintended bias, by unintended associations in the final model. Toxicity models specifically considering the various ways that a classifier's score distribution have been shown to capture and reproduce biases common can vary across designated groups. We also introduce a large new in society, for example mis-associating the names of frequently test set of online comments with crowd-sourced annotations for attacked identity groups (such as "gay", and "muslim" etc.) with identity references. We use this to show how our metrics can be toxicity [5, 17]. This unintended model bias could be due to the used to find new and potentially subtle unintended bias in existing demographic composition of the online user pool, the latent or public models.

crowdsourcing, social media, unintended bias, (21 more...)

arXiv.org Machine Learning

1903.04561

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Add feedback