Worledge, Theodora
The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations
Worledge, Theodora, Hashimoto, Tatsunori, Guestrin, Carlos
Across all fields of academic study, experts cite their sources when sharing information. While large language models (LLMs) excel at synthesizing information, they do not provide reliable citation to sources, making it difficult to trace and verify the origins of the information they present. In contrast, search engines make sources readily accessible to users and place the burden of synthesizing information on the user. Through a survey, we find that users prefer search engines over LLMs for high-stakes queries, where concerns regarding information provenance outweigh the perceived utility of LLM responses. To examine the interplay between verifiability and utility of information-sharing tools, we introduce the extractive-abstractive spectrum, in which search engines and LLMs are extreme endpoints encapsulating multiple unexplored intermediate operating points. Search engines are extractive because they respond to queries with snippets of sources with links (citations) to the original webpages. LLMs are abstractive because they address queries with answers that synthesize and logically transform relevant information from training and in-context sources without reliable citation. We define five operating points that span the extractive-abstractive spectrum and conduct human evaluations on seven systems across four diverse query distributions that reflect real-world QA settings: web search, language simplification, multi-step reasoning, and medical advice. As outputs become more abstractive, we find that perceived utility improves by as much as 200%, while the proportion of properly cited sentences decreases by as much as 50% and users take up to 3 times as long to verify cited information. Our findings recommend distinct operating points for domain-specific LLM systems, and our failure analysis informs approaches to high-utility LLM systems that empower users to verify information.
Unifying Corroborative and Contributive Attributions in Large Language Models
Worledge, Theodora, Shen, Judy Hanwen, Meister, Nicole, Winston, Caleb, Guestrin, Carlos
As businesses, products, and services spring up around large language models, the trustworthiness of these models hinges on the verifiability of their outputs. However, methods for explaining language model outputs largely fall across two distinct fields of study, both of which use the term "attribution" to refer to entirely separate techniques: citation generation and training data attribution. In many modern applications, such as legal document generation and medical question answering, both types of attributions are important. In this work, we argue for and present a unified framework of large language model attributions. We show how existing methods of different types of attribution fall under the unified framework. We also use the framework to discuss real-world use cases where one or both types of attributions are required. We believe that this unified framework will guide the use-case-driven development of systems that leverage both types of attribution, as well as the standardization of their evaluation.
Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data
Rolf, Esther, Worledge, Theodora, Recht, Benjamin, Jordan, Michael I.
Datasets play a critical role in shaping the perception of performance and progress in machine learning (ML): the way we collect, process, and analyze data affects the way we benchmark success and form new research agendas (Paullada et al., 2020; Dotan & Milli, 2020). A growing appreciation of this determinative role of datasets has sparked a concomitant concern that standard datasets used for training and evaluating ML models lack diversity along significant dimensions, for example, geography, gender, and skin type (Shankar et al., 2017; Buolamwini & Gebru, 2018). Lack of diversity in evaluation data can obfuscate disparate performance when evaluating based on aggregate accuracy (Buolamwini & Gebru, 2018). Lack of diversity in training data can limit the extent to which learned models can adequately apply to all portions of a population, a concern highlighted in recent work in the medical domain (Habib et al., 2019; Hofmanninger et al., 2020). Our work aims to develop a general unifying perspective on the way that dataset composition affects outcomes of machine learning systems.