The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations
Theodora Worledge, Tatsunori Hashimoto, Carlos Guestrin
arXiv.org Artificial Intelligence
Across all fields of academic study, experts cite their sources when sharing information. While large language models (LLMs) excel at synthesizing information, they do not provide reliable citations to sources, making it difficult to trace and verify the origins of the information they present. In contrast, search engines make sources readily accessible to users but place the burden of synthesizing information on the user. Through a survey, we find that users prefer search engines over LLMs for high-stakes queries, where concerns about information provenance outweigh the perceived utility of LLM responses. To examine the interplay between the verifiability and utility of information-sharing tools, we introduce the extractive-abstractive spectrum, in which search engines and LLMs are the extreme endpoints, with multiple unexplored intermediate operating points between them. Search engines are extractive because they respond to queries with snippets of sources with links (citations) to the original webpages. LLMs are abstractive because they address queries with answers that synthesize and logically transform relevant information from training and in-context sources without reliable citation. We define five operating points that span the extractive-abstractive spectrum and conduct human evaluations on seven systems across four diverse query distributions that reflect real-world QA settings: web search, language simplification, multi-step reasoning, and medical advice. As outputs become more abstractive, we find that perceived utility improves by as much as 200%, while the proportion of properly cited sentences decreases by as much as 50% and users take up to 3 times as long to verify cited information. Our findings recommend distinct operating points for domain-specific LLM systems, and our failure analysis informs approaches to building high-utility LLM systems that empower users to verify information.
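The abstract uses the proportion of properly cited sentences as one verifiability measure but does not define it in detail. The following is a minimal sketch, assuming each response sentence has been human-annotated for two things: whether it carries a citation, and whether the cited source supports it. The `SentenceJudgment` type, its field names, and this particular definition of "properly cited" are illustrative assumptions, not the authors' annotation protocol, and the toy annotations are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SentenceJudgment:
    """Hypothetical per-sentence annotation (field names are assumptions)."""
    has_citation: bool      # the sentence carries at least one citation
    fully_supported: bool   # annotator judged the cited source(s) to support it


def proportion_properly_cited(judgments: Sequence[SentenceJudgment]) -> float:
    """Fraction of sentences that are both cited and supported by their citations."""
    if not judgments:
        raise ValueError("need at least one sentence judgment")
    proper = sum(1 for j in judgments if j.has_citation and j.fully_supported)
    return proper / len(judgments)


# Toy, made-up annotations for two responses to the same query.
extractive = [SentenceJudgment(True, True)] * 4   # quoted snippets, each cited
abstractive = [
    SentenceJudgment(True, True),
    SentenceJudgment(True, False),   # cited, but the source does not back it up
    SentenceJudgment(False, False),  # uncited synthesized sentence
    SentenceJudgment(True, True),
]

print(proportion_properly_cited(extractive))   # 1.0
print(proportion_properly_cited(abstractive))  # 0.5
```

Under this assumed definition, a sentence that cites a source which does not actually support it counts against verifiability in the same way as an uncited sentence, which is one plausible way for the measure to fall as responses move toward the abstractive end.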
Nov-26-2024