AITopics | Choi, Eunsol

Choi, Eunsol

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploring Design Choices for Building Language-Specific LLMs

Tejaswi, Atula, Gupta, Nilesh, Choi, Eunsol

arXiv.org Artificial Intelligence, Jun-20-2024

Despite rapid progress in large language models (LLMs), their performance on the vast majority of languages remains unsatisfactory. In this paper, we study building language-specific LLMs by adapting monolingual and multilingual LLMs. We conduct systematic experiments on how design choices (base model selection, vocabulary extension, and continued fine-tuning) impact the adapted LLM, both in terms of efficiency (how many tokens are needed to encode the same amount of information) and end task performance. We find that (1) the initial performance before adaptation is not always indicative of the final performance; (2) efficiency can easily be improved with simple vocabulary extension and continued fine-tuning in most LLMs we study; and (3) the optimal adaptation method is highly language-dependent, and the simplest approach works well across various experimental settings. Adapting English-centric models can yield better results than adapting multilingual models despite their worse initial performance on low-resource languages. Together, our work lays the foundation for efficiently building language-specific LLMs by adapting existing LLMs.
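The efficiency measure above (how many tokens are needed to encode the same text) can be illustrated with a toy tokenizer. This is a minimal sketch under loud assumptions: it uses greedy longest-match segmentation rather than the BPE or SentencePiece tokenizers real LLMs use, and the vocabularies and the word `namaste` are hypothetical. It only shows why adding target-language pieces to a vocabulary lowers the token count for the same input.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation (a simplification of real subword tokenizers).

    Any single character is accepted as a fallback token, so every string can be encoded.
    """
    max_len = max([len(t) for t in vocab] + [1])
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary entry
        # (or the single-character fallback) matches at position i.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:
                tokens.append(piece)
                i += n
                break
    return tokens


def tokens_per_char(text: str, vocab: set[str]) -> float:
    """Lower is more efficient: fewer tokens for the same amount of text."""
    return len(greedy_tokenize(text, vocab)) / len(text)


if __name__ == "__main__":
    text = "namaste namaste"
    base_vocab = {"the", "and", "ing"}          # hypothetical English-centric vocabulary
    extended_vocab = base_vocab | {"namaste"}   # hypothetical vocabulary extension
    print(len(greedy_tokenize(text, base_vocab)))      # 15 (every character is its own token)
    print(len(greedy_tokenize(text, extended_vocab)))  # 3
```

In a real model, the embeddings for newly added tokens start untrained, which is one reason vocabulary extension is paired with continued fine-tuning.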

computational linguistic, large language model, natural language

arXiv.org Artificial Intelligence

2406.1467

Country:

Europe (0.67)
North America > United States > Texas (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

Lingam, Vijay, Tejaswi, Atula, Vavre, Aditya, Shetty, Aneesh, Gudur, Gautham Krishna, Ghosh, Joydeep, Dimakis, Alex, Choi, Eunsol, Bojchevski, Aleksandar, Sanghavi, Sujay

arXiv.org Artificial Intelligence, May-29-2024

Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights \(W\) and inject learnable matrices \(\Delta W\). These \(\Delta W\) matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on \(\Delta W\) depends on the specific weight matrix \(W\). Specifically, SVFT updates \(W\) as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006% to 0.25% of parameters, outperforming existing methods that recover only up to 85% of performance using 0.03% to 0.8% of the trainable parameter budget.
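The core update can be sketched with NumPy: take \(W = U \Sigma V^\top\), then form \(\Delta W = \sum_{(i,j) \in \Omega} m_{ij}\, u_i v_j^\top = U M V^\top\), where \(M\) is zero outside a sparsity pattern \(\Omega\) and its nonzero entries are the only trainable values. This is a minimal sketch, not the authors' implementation; the banded pattern, the function names `banded_pattern` and `svft_delta`, and the zero coefficients in the usage example are illustrative assumptions, and no training loop is shown.

```python
import numpy as np


def banded_pattern(rank: int, bandwidth: int) -> list[tuple[int, int]]:
    """Hypothetical sparsity pattern: the diagonal plus `bandwidth` off-diagonals on each side."""
    return [
        (i, j)
        for i in range(rank)
        for j in range(max(0, i - bandwidth), min(rank, i + bandwidth + 1))
    ]


def svft_delta(W: np.ndarray, coeffs: np.ndarray, pattern: list[tuple[int, int]]) -> np.ndarray:
    """Delta W = sum over (i, j) in pattern of m_ij * u_i v_j^T, using W's own singular vectors.

    Only `coeffs` (one scalar per pattern entry) would be trained; U and V stay frozen.
    """
    U, _, Vt = np.linalg.svd(W, full_matrices=False)  # U: (m, r), Vt: (r, n)
    r = U.shape[1]
    M = np.zeros((r, r))
    for (i, j), m in zip(pattern, coeffs):
        M[i, j] = m
    return U @ M @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 6))
    pattern = banded_pattern(rank=min(W.shape), bandwidth=1)
    coeffs = np.zeros(len(pattern))  # all-zero coefficients leave W unchanged
    W_adapted = W + svft_delta(W, coeffs, pattern)
    print(len(pattern), W.size)  # 16 trainable scalars vs. 48 frozen weights
```

Widening `bandwidth` adds coefficients and expressivity, which mirrors the abstract's point that the number of coefficients controls the trade-off.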

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2405.19597

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

AmbigDocs: Reasoning across Documents on Different Entities under the Same Name

Lee, Yoonsang, Ye, Xi, Choi, Eunsol

arXiv.org Artificial Intelligence, May-26-2024

Different entities with the same name can be difficult to distinguish, and handling confusing entity mentions is a crucial skill for language models (LMs). For example, given the question "Where was Michael Jordan educated?" and a set of documents discussing different people named Michael Jordan, can LMs distinguish entity mentions to generate a cohesive answer to the question? To test this ability, we introduce a new benchmark, AmbigDocs. By leveraging Wikipedia's disambiguation pages, we identify sets of documents belonging to different entities that share an ambiguous name. From these documents, we generate questions containing an ambiguous name, together with their corresponding sets of answers. Our analysis reveals that current state-of-the-art models often yield ambiguous answers or incorrectly merge information belonging to different entities. We establish an ontology of four types of incomplete answers, along with automatic evaluation metrics to identify them. We lay the foundation for future work on reasoning across multiple documents with ambiguous entities.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2404.12447

Country: North America > United States > Rhode Island (0.28)

Genre: Research Report (0.84)

Industry:

Leisure & Entertainment (0.86)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions

Xu, Fangyuan, Lo, Kyle, Soldaini, Luca, Kuehl, Bailey, Choi, Eunsol, Wadden, David

arXiv.org Artificial Intelligence, Mar-6-2024

Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents. In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer. To evaluate the capabilities of current LLMs on this task, we construct KIWI, a dataset of knowledge-intensive writing instructions in the scientific domain. Given a research question, an initial model-generated answer and a set of relevant papers, an expert annotator iteratively issues instructions for the model to revise and improve its answer. We collect 1,260 interaction turns from 234 interaction sessions with three state-of-the-art LLMs. Each turn includes a user instruction, a model response, and a human evaluation of the model response. Through a detailed analysis of the collected responses, we find that all models struggle to incorporate new information into an existing answer and to perform precise and unambiguous edits. Further, we find that models struggle to judge whether their outputs successfully followed user instructions, with accuracy at least 10 points short of human agreement. Our findings indicate that KIWI will be a valuable resource to measure progress and improve LLMs' instruction-following capabilities for knowledge-intensive writing tasks.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2403.03866

Country: North America > United States > Texas (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.60)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Zheng, Zhisheng, Peng, Puyuan, Ma, Ziyang, Chen, Xie, Choi, Eunsol, Harwath, David

arXiv.org Artificial Intelligence, Feb-2-2024

Spatial sound reasoning is a fundamental human skill, enabling us to navigate and interpret our surroundings based on sound. In this paper, we present BAT, which combines the spatial sound perception ability of a binaural acoustic scene analysis model with the natural language reasoning capabilities of a large language model (LLM) to replicate this innate ability. To address the lack of existing datasets of in-the-wild spatial sounds, we synthesized a binaural audio dataset using AudioSet and SoundSpaces 2.0. Next, we developed SpatialSoundQA, a spatial sound-based question-answering dataset, offering a range of QA tasks that train BAT in various aspects of spatial sound perception and reasoning. The acoustic front-end encoder of BAT is a novel spatial audio encoder named Spatial Audio Spectrogram Transformer, or Spatial-AST, which by itself achieves strong performance across sound event detection, spatial localization, and distance estimation. By integrating Spatial-AST with the LLaMA-2 7B model, BAT transcends standard Sound Event Localization and Detection (SELD) tasks, enabling the model to reason about the relationships between the sounds in its environment. Our experiments demonstrate BAT's superior performance on both spatial sound perception and reasoning, showcasing the immense potential of LLMs in navigating and interpreting complex spatial audio environments.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2402.01591

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Crafting In-context Examples according to LMs' Parametric Knowledge

Lee, Yoonsang, Atreya, Pranav, Ye, Xi, Choi, Eunsol

arXiv.org Artificial Intelligence, Nov-16-2023

In-context learning has been applied to knowledge-rich tasks such as question answering. In such scenarios, in-context examples are used to trigger a behaviour in the language model: namely, it should surface information stored in its parametric knowledge. We study the construction of in-context example sets, with a focus on the model's parametric knowledge about the in-context examples. We identify 'known' examples, which the model can answer correctly from its parametric knowledge, and 'unknown' ones. Our experiments show that prompting with 'unknown' examples decreases performance, potentially because it encourages hallucination rather than drawing on the model's parametric knowledge. Constructing an in-context example set that presents both known and unknown information performs best across diverse settings. We perform analysis on three multi-answer question answering datasets, which allows us to further study answer set ordering strategies based on the LM's knowledge about each answer. Together, our study sheds light on how best to construct in-context example sets for knowledge-rich tasks.
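The known/unknown split and the mixed example set can be sketched as follows. This is an illustrative sketch only: it assumes an example counts as 'known' when the model's closed-book answer matches any gold answer after simple normalization, and that a "mixed" set means roughly half known and half unknown demonstrations. The helper names and the dictionary format of examples are hypothetical, and obtaining the closed-book predictions from an actual LM is left abstract.

```python
import random


def normalize(answer: str) -> str:
    """Case- and whitespace-insensitive answer matching (a simplification)."""
    return " ".join(answer.lower().split())


def split_known_unknown(examples, closed_book_predictions):
    """Label an example 'known' if the model's closed-book answer matches any gold answer.

    `closed_book_predictions` maps question -> the model's answer without in-context
    examples (a hypothetical input; how it is produced is not shown here).
    """
    known, unknown = [], []
    for ex in examples:
        gold = {normalize(a) for a in ex["answers"]}
        pred = normalize(closed_book_predictions.get(ex["question"], ""))
        (known if pred in gold else unknown).append(ex)
    return known, unknown


def mixed_demonstrations(known, unknown, k, seed=0):
    """Draw about half of the k demonstrations from each pool, then shuffle their order."""
    rng = random.Random(seed)
    n_known = min(len(known), k // 2)
    n_unknown = min(len(unknown), k - n_known)
    chosen = rng.sample(known, n_known) + rng.sample(unknown, n_unknown)
    rng.shuffle(chosen)
    return chosen
```

Exact-match normalization is a deliberately crude stand-in; a real setup for multi-answer datasets would need a decision on how many gold answers must be recalled for an example to count as known.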

large language model, machine learning, question answering

arXiv.org Artificial Intelligence

2311.09579

Country:

Africa (0.68)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Media (0.93)
Government > Regional Government (0.93)
Education (0.68)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.75)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.68)

Mitigating Temporal Misalignment by Discarding Outdated Facts

Zhang, Michael J. Q., Choi, Eunsol

arXiv.org Artificial Intelligence, Nov-15-2023

While large language models are able to retain vast amounts of world knowledge seen during pretraining, such knowledge is prone to going out of date and is nontrivial to update. Furthermore, these models are often used under temporal misalignment, tasked with answering questions about the present, despite having only been trained on data collected in the past. To mitigate the effects of temporal misalignment, we propose fact duration prediction: the task of predicting how long a given fact will remain true. In our experiments, we demonstrate that identifying which facts are prone to rapid change can help models avoid reciting outdated information and determine which predictions require seeking out up-to-date knowledge sources. We also show how modeling fact duration improves calibration for knowledge-intensive tasks, such as open-retrieval question answering, under temporal misalignment, by discarding volatile facts. Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
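The discarding step can be sketched in a few lines: if a fact was observed at some date and a fact-duration predictor estimates how long it stays true, the fact is kept only while the gap between observation and query is shorter than that estimate. This is a minimal sketch, not code from the linked repository; the `Fact` record, its field names, and the example facts are hypothetical, and the duration predictor itself is assumed rather than implemented.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Fact:
    text: str
    observed_on: date              # when the fact appeared in the training data
    predicted_duration: timedelta  # output of a (hypothetical) fact-duration predictor


def likely_still_true(fact: Fact, query_date: date) -> bool:
    """Keep a fact only if the time since it was observed is shorter than its predicted duration."""
    return query_date - fact.observed_on < fact.predicted_duration


def discard_outdated(facts: list[Fact], query_date: date) -> list[Fact]:
    """Drop volatile facts that have probably expired by the time of the query."""
    return [f for f in facts if likely_still_true(f, query_date)]


if __name__ == "__main__":
    volatile = Fact("Team A's head coach is B", date(2021, 6, 1), timedelta(days=365))
    stable = Fact("City C is the capital of D", date(2021, 6, 1), timedelta(days=36500))
    kept = discard_outdated([volatile, stable], query_date=date(2023, 6, 1))
    print([f.text for f in kept])  # ['City C is the capital of D']
```

Facts that are dropped are exactly the ones for which a system should abstain or seek an up-to-date knowledge source, which is how the abstract frames the calibration benefit.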

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2305.14824

Country:

North America > Canada > Ontario (0.14)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Leisure & Entertainment > Sports > Basketball (0.68)
Leisure & Entertainment > Sports > Football (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs

Zhang, Michael J. Q., Choi, Eunsol

arXiv.org Artificial Intelligence, Nov-15-2023

Resolving ambiguities through interaction is a hallmark of natural language, and modeling this behavior is a core challenge in crafting AI assistants. In this work, we study such behavior in LMs by proposing a task-agnostic framework for resolving ambiguity by asking users clarifying questions. Our framework breaks down this objective into three subtasks: (1) determining when clarification is needed, (2) determining what clarifying question to ask, and (3) responding accurately with the new information gathered through clarification. We evaluate systems across three NLP applications: question answering, machine translation, and natural language inference. For the first subtask, we present a novel uncertainty estimation approach, intent-sim, that determines the utility of querying for clarification by estimating the entropy over user intents. Our method consistently outperforms existing uncertainty estimation approaches at identifying predictions that will benefit from clarification. When only allowed to ask for clarification on 10% of examples, our system is able to double the performance gains over randomly selecting examples to clarify. Furthermore, we find that intent-sim is robust, demonstrating improvements across a wide range of NLP tasks and LMs. Together, our work lays the foundation for studying clarifying interactions with LMs.
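The idea of "entropy over user intents" plus a clarification budget can be sketched as: group sampled interpretations of an input into clusters of equivalent intents, compute the Shannon entropy of the cluster frequencies, and ask for clarification only on the highest-entropy examples. This is a sketch of the general idea under stated assumptions, not the paper's intent-sim procedure: how intents are sampled from the LM and the `same_intent` equivalence test are placeholders, and the function names are hypothetical. The 10% default budget comes from the abstract.

```python
import math
from collections.abc import Callable, Sequence


def intent_entropy(intents: Sequence[str], same_intent: Callable[[str, str], bool]) -> float:
    """Entropy over clusters of sampled user intents.

    `intents` would be interpretations of the input sampled from an LM; `same_intent`
    decides whether two samples express the same intent. Both are placeholders here.
    """
    clusters: list[list] = []  # each entry: [representative intent, count]
    for s in intents:
        for cluster in clusters:
            if same_intent(cluster[0], s):
                cluster[1] += 1
                break
        else:  # no existing cluster matched, so start a new one
            clusters.append([s, 1])
    n = len(intents)
    return -sum((c / n) * math.log(c / n) for _, c in clusters)


def pick_for_clarification(entropies: Sequence[float], budget: float = 0.10) -> list[int]:
    """Indices of the highest-entropy examples, limited to `budget` of the data (e.g. 10%)."""
    k = round(len(entropies) * budget)
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return ranked[:k]
```

High entropy means the sampled interpretations disagree, so a clarifying question is more likely to change the final prediction; low entropy means the model can answer directly.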

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2311.09469

Country: North America > United States > Texas (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Continually Improving Extractive QA via Human Feedback

Gao, Ge, Chen, Hung-Ting, Artzi, Yoav, Choi, Eunsol

arXiv.org Artificial Intelligence, Nov-3-2023

We study continually improving an extractive question answering (QA) system via human user feedback. We design and deploy an iterative approach, where information-seeking users ask questions, receive model-predicted answers, and provide feedback. We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time. Our experiments show that user feedback effectively improves extractive QA models over time across different data regimes, with significant potential for domain adaptation.

machine learning, natural language, question answering

arXiv.org Artificial Intelligence

2305.12473

Country:

North America > United States > Illinois (0.15)
North America > United States > Gulf of Mexico > Central GOM (0.15)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Baseball (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)

Propagating Knowledge Updates to LMs Through Distillation

Padmanabhan, Shankar, Onoe, Yasumasa, Zhang, Michael J. Q., Durrett, Greg, Choi, Eunsol

arXiv.org Artificial Intelligence, Oct-30-2023

Modern language models have the capacity to store and use immense amounts of knowledge about real-world entities, but it remains unclear how to update such knowledge stored in model parameters. While prior methods for updating knowledge in LMs successfully inject atomic facts, updated LMs fail to make inferences based on injected facts. In this work, we demonstrate that a context distillation-based approach can both impart knowledge about entities and propagate that knowledge to enable broader inferences. Our approach consists of two stages: transfer set generation and distillation on the transfer set. We first generate a transfer set by prompting a language model to generate continuations from the entity definition. Then, we update the model parameters so that the distribution of the LM (the 'student') matches the distribution of the LM conditioned on the definition (the 'teacher') on the transfer set. Our experiments demonstrate that this approach is more effective at propagating knowledge updates than finetuning and other gradient-based knowledge-editing methods. Moreover, it does not compromise performance in other contexts, even when injecting the definitions of up to 150 entities at once.
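The distillation objective can be sketched as a divergence between two next-token distributions over the same transfer-set continuation: the teacher's (LM conditioned on the entity definition) and the student's (same LM without it). This minimal NumPy sketch assumes a per-token KL(teacher ‖ student) averaged over positions; the exact divergence direction, the function names, and the array shapes are assumptions for illustration. Producing logits from a real LM and backpropagating into the student are not shown, and in practice the teacher side would be treated as a fixed target.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def context_distillation_loss(student_logits: np.ndarray, teacher_logits: np.ndarray) -> float:
    """Mean per-token KL(teacher || student) over a transfer-set continuation.

    teacher_logits: next-token logits from the LM conditioned on the entity definition
    student_logits: next-token logits from the same LM without the definition
    Both arrays have shape (num_positions, vocab_size).
    """
    log_p = log_softmax(teacher_logits)  # teacher distribution (the target)
    log_q = log_softmax(student_logits)  # student distribution (being updated)
    kl_per_position = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl_per_position.mean())
```

The loss is zero exactly when the student already behaves as if it had read the definition, which is the sense in which distillation "moves" the definition from the context into the parameters.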

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2306.09306

Country:

Europe (0.93)
North America > United States > Texas (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Education (0.93)
Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
