AITopics | Shuster, Kurt

Collaborating Authors

Shuster, Kurt

Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models

Wei, Jimmy, Shuster, Kurt, Szlam, Arthur, Weston, Jason, Urbanek, Jack, Komeili, Mojtaba

arXiv.org Artificial Intelligence, Jun-8-2023

Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play. We thus evaluate the ability of language models to act as one or more characters in such conversations. Models require two skills that pairwise-trained models appear to lack: (1) being able to decide when to talk; (2) producing coherent utterances grounded on multiple characters. We compare models trained on our new dataset to existing pairwise-trained dialogue models, as well as large language models with few-shot prompting. We find that our new dataset, MultiLIGHT, which we will publicly release, can help bring significant improvements in the group setting.
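To make the "decide when to talk" skill concrete, here is a minimal sketch. It is not the paper's method. It assumes a hypothetical `score_fn` that rates how likely each character is to speak next and a hypothetical `generate_fn` that writes an utterance in character. The bot stays silent unless its own character wins the turn.

```python
from typing import Callable, List, Optional

# Hypothetical model interfaces (not from the paper):
#   score_fn(history, character)    -> how likely `character` is to speak next
#   generate_fn(history, character) -> an utterance in that character's voice
ScoreFn = Callable[[List[str], str], float]
GenerateFn = Callable[[List[str], str], str]


def choose_next_speaker(history: List[str], characters: List[str], score_fn: ScoreFn) -> str:
    """Pick the character the model rates most likely to speak next."""
    return max(characters, key=lambda name: score_fn(history, name))


def maybe_respond(
    history: List[str],
    characters: List[str],
    my_character: str,
    score_fn: ScoreFn,
    generate_fn: GenerateFn,
) -> Optional[str]:
    """Speak only when it is predicted to be this character's turn."""
    if choose_next_speaker(history, characters, score_fn) != my_character:
        return None  # yield the floor to another participant
    return generate_fn(history, my_character)


# Toy stand-ins: a character addressed by name in the last turn speaks next.
def toy_score(history: List[str], name: str) -> float:
    last_text = history[-1].split(":", 1)[-1] if history else ""
    return 1.0 if name in last_text else 0.0


def toy_generate(history: List[str], name: str) -> str:
    return f"{name}: Here it is."


history = ["Guard: Halt! Who goes there?", "Merchant: Wizard, do you have the potion?"]
characters = ["Guard", "Merchant", "Wizard"]
print(maybe_respond(history, characters, "Guard", toy_score, toy_generate))   # None
print(maybe_respond(history, characters, "Wizard", toy_score, toy_generate))  # Wizard: Here it is.
```

A pairwise-trained model effectively assumes `choose_next_speaker` always returns itself on alternate turns. Separating the turn decision from generation makes that assumption explicit and replaceable.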

machine learning, natural language, utterance

arXiv:2304.13835

Country:

North America > United States > California (0.14)
Europe > Middle East > Malta (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Improving Open Language Models by Learning from Organic Interactions

Xu, Jing, Ju, Da, Lane, Joshua, Komeili, Mojtaba, Smith, Eric Michael, Ung, Megan, Behrooz, Morteza, Ngan, William, Moritz, Rashel, Sukhbaatar, Sainbayar, Boureau, Y-Lan, Weston, Jason, Shuster, Kurt

arXiv.org Artificial Intelligence, Jun-7-2023

We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with organic data is challenging because interactions with people "in the wild" include both high quality conversations and feedback, as well as adversarial and toxic behavior. We study techniques that enable learning from helpful teachers while avoiding learning from people who are trying to trick the model into unhelpful or toxic responses. BlenderBot 3x is both preferred in conversation to BlenderBot 3, and is shown to produce safer responses in challenging situations. While our current models are still far from perfect, we believe further improvement can be achieved by continued use of the techniques explored in this work.
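One way to picture "learning from helpful teachers while avoiding" adversarial users is to judge users rather than single messages. The sketch below is a hypothetical illustration, not the paper's procedure. It assumes each logged message already carries a flag from some separate safety classifier, and it keeps only users with enough messages and a low flagged fraction. Both thresholds are arbitrary placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def trusted_users(
    messages: Iterable[Tuple[str, bool]],
    max_flagged_fraction: float = 0.2,  # placeholder threshold
    min_messages: int = 3,              # placeholder: too little evidence below this
) -> Set[str]:
    """Return users whose share of flagged messages is low enough to trust.

    `messages` yields (user_id, is_flagged) pairs; `is_flagged` is assumed to
    come from a separate (hypothetical) safety classifier.
    """
    totals: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for user, is_flagged in messages:
        totals[user] += 1
        flagged[user] += int(is_flagged)
    return {
        user
        for user, n in totals.items()
        if n >= min_messages and flagged[user] / n <= max_flagged_fraction
    }


def filter_examples(examples: List[dict], keep: Set[str]) -> List[dict]:
    """Drop every training example contributed by an untrusted user."""
    return [ex for ex in examples if ex["user"] in keep]


log = [("alice", False)] * 4 + [("bob", True)] * 3 + [("bob", False)] + [("carol", False)] * 2
keep = trusted_users(log)
print(keep)  # {'alice'}: bob is mostly flagged, carol has too few messages
```

Filtering at the user level also discards a troll's harmless-looking messages, which a per-message filter would let through; the cost is that some useful data from borderline users is lost.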

machine learning, natural language, reward model

arXiv:2306.04707

Country:

Europe > Italy (0.14)
Asia > Middle East > UAE (0.14)
Asia > China (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges

Behrooz, Morteza, Ngan, William, Lane, Joshua, Morse, Giuliano, Babcock, Benjamin, Shuster, Kurt, Komeili, Mojtaba, Chen, Moya, Kambadur, Melanie, Boureau, Y-Lan, Weston, Jason

arXiv.org Artificial Intelligence, Jun-7-2023

Publicly deploying research chatbots is a nuanced topic involving necessary risk-benefit analyses. While there have recently been frequent discussions on whether it is responsible to deploy such models, there has been far less focus on the interaction paradigms and design approaches that the resulting interfaces should adopt, in order to achieve their goals more effectively. We aim to pose, ground, and attempt to answer HCI questions involved in this scope, by reporting on a mixed-methods user study conducted on a recent research chatbot. We find that abstract anthropomorphic representation for the agent has a significant effect on users' perception, that offering AI explainability may have an impact on feedback rates, and that two (diegetic and extradiegetic) levels of the chat experience should be intentionally designed. We offer design recommendations and areas of further focus for the research community.

artificial intelligence, chatbot, natural language

arXiv:2306.04765

Country: North America > United States (0.29)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.83)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Iyer, Srinivasan, Lin, Xi Victoria, Pasunuru, Ramakanth, Mihaylov, Todor, Simig, Daniel, Yu, Ping, Shuster, Kurt, Wang, Tianlu, Liu, Qing, Koura, Punit Singh, Li, Xian, O'Horo, Brian, Pereyra, Gabriel, Wang, Jeff, Dewan, Christopher, Celikyilmaz, Asli, Zettlemoyer, Luke, Stoyanov, Ves

arXiv.org Artificial Intelligence, Jan-30-2023

Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero- and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats: PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks, but it is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
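The abstract lists task sampling strategy among the instruction-tuning decisions studied. As an illustration only, here is examples-proportional mixing with a per-task cap, a common strategy popularized by T5. Whether OPT-IML uses this exact rule or any particular cap is not stated here; the task names and sizes are made up.

```python
import random
from typing import Dict, List


def mixing_weights(task_sizes: Dict[str, int], cap: int) -> Dict[str, float]:
    """Examples-proportional mixing: weight each task by min(size, cap).

    The cap keeps a few very large tasks from dominating the training mixture.
    """
    clipped = {task: min(size, cap) for task, size in task_sizes.items()}
    total = sum(clipped.values())
    return {task: n / total for task, n in clipped.items()}


def sample_tasks(weights: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Draw k task names (with replacement) according to the mixing weights."""
    rng = random.Random(seed)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=k)


sizes = {"qa": 100_000, "summarization": 20_000, "dialogue": 5_000}  # hypothetical
weights = mixing_weights(sizes, cap=10_000)
print(weights)  # {'qa': 0.4, 'summarization': 0.4, 'dialogue': 0.2}
print(sample_tasks(weights, k=5))
```

Without the cap, `qa` would receive 80% of the samples; with it, the two large tasks are treated equally and the small `dialogue` task keeps a meaningful share.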

benchmark, machine learning, natural language

arXiv:2212.12017

Country:

Europe (1.00)
Asia > Afghanistan (0.67)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning

Lengerich, Chris, Synnaeve, Gabriel, Zhang, Amy, Leather, Hugh, Shuster, Kurt, Charton, François, Redwood, Charysse

arXiv.org Artificial Intelligence, Dec-21-2022

Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches have been adopted to refine representations via auxiliary self-supervised losses while simultaneously learning decision policies, learning compositional representations from hand-designed and context-independent self-supervised losses (multi-view) still adapts relatively slowly to the real world, which contains many non-IID subspaces requiring rapid distribution shift in both time and spatial attention patterns at varying levels of abstraction. In contrast, supervised language model cascades have shown the flexibility to adapt to many diverse manifolds, and hints of self-learning needed for autonomous task transfer. However, to date, transfer methods for language models like few-shot learning and fine-tuning still require human supervision and transfer learning using self-learning methods has been underexplored. We propose a self-supervised loss policy called contrastive distillation which manifests latent variables with high mutual information with both source and target tasks from weights to tokens. We show how this outperforms common methods of transfer learning and suggests a useful design axis of trading off compute for generalizability for online transfer. Contrastive distillation is improved through sampling from memory and suggests a simple algorithm for more efficiently sampling negative examples for contrastive losses than random sampling.
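The abstract does not spell out the loss, so the following is only a generic stand-in: an InfoNCE-style contrastive loss whose negatives are drawn from a memory bank. For the negative-sampling step it swaps in the common "hardest negatives first" heuristic (highest similarity to the anchor) and compares it with uniform random sampling. This is not claimed to be the paper's sampling algorithm; the vectors, temperature, and memory contents are toy values.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector], temperature: float = 0.1) -> float:
    """-log( exp(s_pos) / sum_j exp(s_j) ), with similarities scaled by 1/temperature."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def hard_negatives(anchor: Vector, memory: List[Vector], k: int) -> List[Vector]:
    """Pick the k memory entries most similar to the anchor (hardest negatives)."""
    return sorted(memory, key=lambda v: dot(anchor, v), reverse=True)[:k]


def random_negatives(memory: List[Vector], k: int, seed: int = 0) -> List[Vector]:
    """Baseline: k memory entries chosen uniformly at random."""
    return random.Random(seed).sample(memory, k)


anchor, positive = [1.0, 0.0], [0.9, 0.1]
memory = [[0.8, 0.2], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
print(info_nce(anchor, positive, hard_negatives(anchor, memory, k=2)))
print(info_nce(anchor, positive, random_negatives(memory, k=2)))
```

For the same `k`, the hardest negatives always give a loss at least as large as any other choice, so each sampled batch carries more training signal; the known risk is that very similar "negatives" may actually be false negatives.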

artificial intelligence, arxiv, machine learning

arXiv:2212.11353

Genre: Research Report (0.55)

Industry:

Health & Medicine (0.94)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.82)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

DIRECTOR: Generator-Classifiers For Supervised Language Modeling

Arora, Kushal, Shuster, Kurt, Sukhbaatar, Sainbayar, Weston, Jason

arXiv.org Artificial Intelligence, Nov-25-2022

Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions. The standard language modeling setup fails to address these issues. In this paper, we introduce a new architecture, Director, that consists of a unified generator-classifier with both a language modeling and a classification head for each output token. Training is conducted jointly using both standard language modeling data, and data labeled with desirable and undesirable sequences. Experiments in several settings show that the model has competitive training and decoding speed compared to standard language models while yielding superior results, alleviating known issues while maintaining generation quality. It also outperforms existing model guiding approaches in terms of both accuracy and efficiency.
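At decoding time, the two heads' outputs can be combined per candidate token. The sketch below assumes the combination p(t) proportional to p_LM(t) * p_C(desirable | t)^gamma, followed by renormalization; this is our reading of the approach rather than a quotation of it, and `gamma`, the tokens, and all probabilities are illustrative.

```python
import math
from typing import Dict


def director_log_probs(
    lm_probs: Dict[str, float],         # language-modeling head: p_LM(t | prefix)
    desirable_probs: Dict[str, float],  # classifier head: p_C(desirable | prefix, t)
    gamma: float = 1.0,                 # weight on the classifier (assumed knob)
) -> Dict[str, float]:
    """Renormalized log-probabilities of p_LM(t) * p_C(desirable | t) ** gamma."""
    scores = {
        t: math.log(p) + gamma * math.log(desirable_probs[t]) for t, p in lm_probs.items()
    }
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {t: s - log_z for t, s in scores.items()}


lm = {"thanks": 0.3, "idiot": 0.5, "sure": 0.2}
clf = {"thanks": 0.9, "idiot": 0.01, "sure": 0.9}
guided = director_log_probs(lm, clf, gamma=1.0)
print(max(guided, key=guided.get))      # thanks
unguided = director_log_probs(lm, clf, gamma=0.0)
print(max(unguided, key=unguided.get))  # idiot
```

Setting `gamma=0` recovers the plain language model. Since both heads read the same hidden state for each token, the extra cost per step is presumably just one more output projection, which fits the abstract's claim of competitive decoding speed.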

artificial intelligence, machine learning, natural language

arXiv:2206.07694

Country:

North America > United States (0.47)
North America > Canada (0.46)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Sports (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

The CRINGE Loss: Learning what language not to model

Adolphs, Leonard, Gao, Tianyu, Xu, Jing, Shuster, Kurt, Sukhbaatar, Sainbayar, Weston, Jason

arXiv.org Artificial Intelligence, Nov-10-2022

Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data: examples of what the model should not do. In this work, we propose a novel procedure to train with such data called the CRINGE loss (ContRastive Iterative Negative GEneration). We show the effectiveness of this approach across three different experiments on the tasks of safe generation, contradiction avoidance, and open-domain dialogue. Our models outperform multiple strong baselines and are conceptually simple, easy to train and implement.
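For a single negative token, the contrastive part can be sketched as a two-way comparison: pick a positive token from the model's own top-k predictions (excluding the negative one) and push its score above the negative token's score. This is a simplified, single-position sketch under that reading; the dictionary-of-logits interface, `k`, and the example values are assumptions. The full procedure also keeps the usual cross-entropy loss on positive data and iterates (generate, label, retrain).

```python
import math
import random
from typing import Dict, Optional


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cringe_token_loss(
    logits: Dict[str, float],  # model scores for every vocabulary token at this position
    negative_token: str,       # token taken from a sequence labeled as undesirable
    k: int = 5,
    rng: Optional[random.Random] = None,
) -> float:
    rng = rng or random.Random(0)
    # Candidate positives: the model's top-k tokens, excluding the negative one.
    candidates = sorted(
        (t for t in logits if t != negative_token), key=logits.get, reverse=True
    )[:k]
    # Sample one candidate in proportion to its softmax probability among the top-k.
    m = max(logits[t] for t in candidates)
    weights = [math.exp(logits[t] - m) for t in candidates]
    positive = rng.choices(candidates, weights=weights, k=1)[0]
    # -log( e^{s_pos} / (e^{s_pos} + e^{s_neg}) ) == softplus(s_neg - s_pos)
    return softplus(logits[negative_token] - logits[positive])


print(cringe_token_loss({"a": 5.0, "b": 4.0, "bad": -10.0}, "bad"))  # close to 0
print(cringe_token_loss({"a": 0.0, "bad": 10.0}, "bad"))             # about 10
```

Using the model's own top-k predictions as the positive avoids needing a human-written replacement for every negative token.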

artificial intelligence, machine learning, natural language

arXiv:2211.05826

Country: Europe (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.30)

Reason first, then respond: Modular Generation for Knowledge-infused Dialogue

Adolphs, Leonard, Shuster, Kurt, Urbanek, Jack, Szlam, Arthur, Weston, Jason

arXiv.org Artificial Intelligence, Nov-9-2021

Large language models can produce fluent dialogue but often hallucinate factual inaccuracies. While retrieval-augmented models help alleviate this issue, they still face a difficult challenge of both reasoning to provide correct knowledge and generating conversation simultaneously. In this work, we propose a modular model, Knowledge to Response (K2R), for incorporating knowledge into conversational agents, which breaks down this problem into two easier steps. K2R first generates a knowledge sequence, given a dialogue context, as an intermediate step. After this "reasoning step", the model then attends to its own generated knowledge sequence, as well as the dialogue context, to produce a final response. In detailed experiments, we find that such a model hallucinates less in knowledge-grounded dialogue tasks, and has advantages in terms of interpretability and modularity. In particular, it can be used to fuse QA and dialogue systems together to enable dialogue agents to give knowledgeable answers, or QA models to give conversational responses in a zero-shot setting.
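The two-step flow can be expressed as a composition of two models. In this sketch `knowledge_model` and `response_model` are hypothetical text-to-text callables, and the `__knowledge__` separator is an illustrative placeholder, not the paper's exact input format.

```python
from typing import Callable, List, Tuple

TextModel = Callable[[str], str]  # hypothetical: input text -> generated text


def k2r_respond(
    context: List[str], knowledge_model: TextModel, response_model: TextModel
) -> Tuple[str, str]:
    """Knowledge to Response: generate knowledge first, then a response conditioned on it."""
    dialogue = "\n".join(context)
    knowledge = knowledge_model(dialogue)              # step 1: the "reasoning step"
    prompt = f"{dialogue}\n__knowledge__ {knowledge}"  # step 2 sees context + knowledge
    return knowledge, response_model(prompt)


# Toy stand-ins; swapping `toy_knowledge` for a QA model is what makes the design modular.
def toy_knowledge(dialogue: str) -> str:
    return "Paris" if "capital of France" in dialogue else "unknown"


def toy_response(prompt: str) -> str:
    fact = prompt.rsplit("__knowledge__ ", 1)[-1]
    return f"I believe it's {fact}."


print(k2r_respond(["What is the capital of France?"], toy_knowledge, toy_response))
# ('Paris', "I believe it's Paris.")
```

Returning the intermediate knowledge alongside the reply is what gives the interpretability benefit: a wrong answer can be traced to either step.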

artificial intelligence, machine learning, natural language

arXiv:2111.05204

Country:

Europe (1.00)
North America > United States > Texas (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports (0.70)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)

Internet-Augmented Dialogue Generation

Komeili, Mojtaba, Shuster, Kurt, Weston, Jason

arXiv.org Artificial Intelligence, Jul-15-2021

The largest store of continually updating knowledge on our planet can be accessed via internet search. In this work we study giving access to this information to conversational agents. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al., 2021); moreover, those facts are frozen in time at the point of model training. In contrast, we propose an approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response, a method that can employ up-to-the-minute relevant information. We train and evaluate such models on a newly collected dataset of human-human conversations whereby one of the speakers is given access to internet search during knowledge-driven discussions in order to ground their responses. We find that search-query based access of the internet in conversation provides superior performance compared to existing approaches that either use no augmentation or FAISS-based retrieval (Lewis et al., 2020).
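The search-then-respond loop described above can be written as three pluggable stages. All three callables here are hypothetical stand-ins (a query generator, a search backend, and a response generator that conditions on the returned documents), and the `top_n` cut-off is an illustrative parameter.

```python
from typing import Callable, List, Tuple

QueryGenerator = Callable[[List[str]], str]        # dialogue context -> search query
SearchEngine = Callable[[str], List[str]]          # query -> ranked documents
Responder = Callable[[List[str], List[str]], str]  # (context, documents) -> reply


def search_augmented_reply(
    context: List[str],
    generate_query: QueryGenerator,
    search: SearchEngine,
    respond: Responder,
    top_n: int = 5,
) -> Tuple[str, str]:
    """Generate a query from the context, search, then respond conditioned on the results."""
    query = generate_query(context)
    documents = search(query)[:top_n]  # fresh results rather than knowledge frozen in weights
    return query, respond(context, documents)


# Toy stand-ins for illustration only.
fake_index = {"latest Mars rover": ["Doc A about a Mars rover.", "Doc B about rovers."]}
query, reply = search_augmented_reply(
    ["Have you heard about the latest Mars rover?"],
    generate_query=lambda ctx: ctx[-1].rstrip("?").split("about the ")[-1],
    search=lambda q: fake_index.get(q, []),
    respond=lambda ctx, docs: f"Yes! {docs[0]}" if docs else "I'm not sure.",
    top_n=1,
)
print(query)  # latest Mars rover
print(reply)  # Yes! Doc A about a Mars rover.
```

Because the search backend is only called through `search`, it can be swapped (a live engine, a cached index) without retraining the query or response models.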

chatbot, information management, knowledge

arXiv:2107.07566

Country:

Europe > Italy (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)

Retrieval Augmentation Reduces Hallucination in Conversation

Shuster, Kurt, Poff, Spencer, Chen, Moya, Kiela, Douwe, Weston, Jason

arXiv.org Artificial Intelligence, Apr-15-2021

Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures, recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020), for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses. We study various types of architectures with multiple components (retrievers, rankers, and encoder-decoders) with the goal of maximizing knowledgeability while retaining conversational ability. We demonstrate that our best models obtain state-of-the-art performance on two knowledge-grounded conversational tasks. The models exhibit open-domain conversational capabilities, generalize effectively to scenarios not within the training data, and, as verified by human evaluations, substantially reduce the well-known problem of knowledge hallucination in state-of-the-art chatbots.
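The retriever component of such a pipeline often reduces to maximum-inner-product search over passage embeddings, with the top passages prepended to the dialogue before the encoder-decoder generates. The sketch below shows only that retrieval-and-concatenation step. The embeddings would come from a learned encoder (assumed, not shown), and the toy vectors and passages are made up.

```python
import heapq
from typing import List, Sequence


def top_k_passages(
    query_vec: Sequence[float],
    passage_vecs: List[Sequence[float]],
    passages: List[str],
    k: int,
) -> List[str]:
    """Return the k passages whose embeddings have the largest inner product with the query."""
    scored = [
        (sum(q * p for q, p in zip(query_vec, vec)), text)
        for vec, text in zip(passage_vecs, passages)
    ]
    return [text for _, text in heapq.nlargest(k, scored, key=lambda pair: pair[0])]


def build_generator_input(dialogue: List[str], retrieved: List[str]) -> str:
    """Prepend retrieved knowledge to the multi-turn context for the generator."""
    return "\n".join(retrieved + dialogue)


passages = ["The Eiffel Tower is in Paris.", "Cats sleep a lot.", "Paris is in France."]
passage_vecs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # toy embeddings
query_vec = [1.0, 0.0]  # toy embedding of the whole dialogue context
retrieved = top_k_passages(query_vec, passage_vecs, passages, k=2)
print(retrieved)  # ['The Eiffel Tower is in Paris.', 'Paris is in France.']
print(build_generator_input(["Where is the Eiffel Tower?"], retrieved))
```

In the dialogue setting, `query_vec` has to summarize several turns rather than a single question, which is the extra difficulty over open-domain QA that the abstract points to.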

chatbot, knowledge, neural network

arXiv:2104.07567

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)