Decontextualization
Claim Extraction for Fact-Checking: Data, Models, and Automated Metrics
Ullrich, Herbert, Mlynář, Tomáš, Drchal, Jan
In this paper, we explore the problem of Claim Extraction using one-to-many text generation methods, comparing LLMs, small summarization models finetuned for the task, and a previous NER-centric baseline QACG. As the current publications on Claim Extraction, Fact Extraction, Claim Generation and Check-worthy Claim Detection are quite scattered in their means and terminology, we compile their common objectives, releasing the FEVERFact dataset, with 17K atomic factual claims extracted from 4K contextualised Wikipedia sentences, adapted from the original FEVER. We compile the known objectives into an evaluation framework of Atomicity, Fluency, Decontextualization, and Faithfulness, checked for each generated claim separately, and Focus and Coverage, measured against the full set of predicted claims for a single input. For each metric, we implement a scale using a reduction to an already-explored NLP task. We validate our metrics against human grading of generated claims, showing that the model ranking on $F_{fact}$, our hardest metric, does not change, and that the evaluation framework approximates human grading very closely in terms of $F_1$ and RMSE.
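The validation step the abstract describes, comparing automated metric scores against human grades via RMSE and $F_1$, can be sketched as follows. This is a generic illustration of that comparison, not the paper's evaluation code; the threshold for binarizing scores is an assumption.

```python
import math

def rmse(metric_scores, human_scores):
    """Root-mean-square error between automated and human scores."""
    return math.sqrt(
        sum((m - h) ** 2 for m, h in zip(metric_scores, human_scores))
        / len(metric_scores)
    )

def f1_agreement(metric_scores, human_scores, threshold=0.5):
    """F1 between metric and human judgments after binarizing at a threshold."""
    pred = [m >= threshold for m in metric_scores]
    gold = [h >= threshold for h in human_scores]
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```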
A Metasemantic-Metapragmatic Framework for Taxonomizing Multimodal Communicative Alignment
Drawing on contemporary pragmatist philosophy and linguistic theories on cognition, meaning, and communication, this paper presents a dynamic, metasemantic-metapragmatic taxonomy for grounding and conceptualizing human-like multimodal communicative alignment. The framework is rooted in contemporary developments of the three basic communicative capacities initially identified by American logician and pragmatist philosopher Charles Sanders Peirce: iconic (sensory and perceptual qualities), indexical (contextual and sociocultural associations), and rule-like (symbolic and intuitive reasoning). Expanding on these developments, I introduce the concept of indexical contextualization and propose the principle of "contextualization directionality" for characterizing the crucial metapragmatic capacity for maintaining, navigating, or transitioning between semantic and pragmatic modes of multimodal communication. I contend that current cognitive-social computational and engineering methodologies disproportionately emphasize the semantic/metasemantic domain, overlooking the pivotal role of metapragmatic indexicality in traversing the semantic-pragmatic spectrum of communication. The framework's broader implications for intentionality, identity, affect, and ethics in within-modal and cross-modal human-machine alignment are also discussed.
DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation
Wanner, Miriam, Van Durme, Benjamin, Dredze, Mark
The decompose-then-verify strategy for verification of Large Language Model (LLM) generations decomposes claims that are then independently verified. Decontextualization augments text (claims) to ensure it can be verified outside of the original context, enabling reliable verification. While decomposition and decontextualization have been explored independently, their interactions in a complete system have not been investigated. Their conflicting purposes can create tensions: decomposition isolates atomic facts while decontextualization inserts relevant information. Furthermore, a decontextualized subclaim presents a challenge to the verification step: what part of the augmented text should be verified as it now contains multiple atomic facts? We conduct an evaluation of different decomposition, decontextualization, and verification strategies and find that the choice of strategy matters in the resulting factuality scores. Additionally, we introduce DnDScore, a decontextualization aware verification method which validates subclaims in the context of contextual information.
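The decompose-then-verify loop described above can be sketched as follows. All functions here are simplified stand-ins (word-overlap verification, conjunction splitting), not the paper's implementation; the factuality score is the fraction of subclaims judged supported.

```python
def decompose(claim):
    """Stub decomposition: split a conjunctive claim into atomic subclaims."""
    return [part.strip().rstrip(".") for part in claim.split(" and ")]

def verify(subclaim, evidence):
    """Stub verifier: supported iff every word of the subclaim occurs in the evidence."""
    ev = set(evidence.lower().split())
    return all(w.lower() in ev for w in subclaim.split())

def factuality_score(claim, evidence):
    """Fraction of decomposed subclaims that the stub verifier supports."""
    subclaims = decompose(claim)
    return sum(verify(s, evidence) for s in subclaims) / len(subclaims)
```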
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
Automatic factuality verification of large language model (LLM) generations is becoming more and more widely used to combat hallucinations. A major point of tension in the literature is the granularity of this fact-checking: larger chunks of text are hard to fact-check, but more atomic facts like propositions may lack context to interpret correctly. In this work, we assess the role of context in these atomic facts. We argue that fully atomic facts are not the right representation, and define two criteria for molecular facts: decontextuality, or how well they can stand alone, and minimality, or how little extra information is added to achieve decontextuality. We quantify the impact of decontextualization on minimality, then present a baseline methodology for generating molecular facts automatically, aiming to add the right amount of information. We compare against various methods of decontextualization and find that molecular facts balance minimality with fact verification accuracy in ambiguous settings.
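One toy way to operationalize the minimality criterion named above is simply the number of tokens the decontextualization step adds to an atomic fact; this is an illustrative proxy, not the paper's measure.

```python
def added_tokens(atomic_fact, molecular_fact):
    """Minimality proxy: how many tokens decontextualization added."""
    return len(molecular_fact.split()) - len(atomic_fact.split())
```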
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
Newman, Benjamin, Soldaini, Luca, Fok, Raymond, Cohan, Arman, Lo, Kyle
Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may find snippets difficult to understand as they lack context from the original document. In this work, we use language models to rewrite snippets from scientific documents to be read on their own. First, we define the requirements and challenges for this user-facing decontextualization task, such as clarifying where edits occur and handling references to other documents. Second, we propose a framework that decomposes the task into three stages: question generation, question answering, and rewriting. Using this framework, we collect gold decontextualizations from experienced scientific article readers. We then conduct a range of experiments across state-of-the-art commercial and open-source language models to identify how to best provide missing-but-relevant information to models for our task. Finally, we develop QaDecontext, a simple prompting strategy inspired by our framework that improves over end-to-end prompting. We conclude with analysis that finds, while rewriting is easy, question generation and answering remain challenging for today's models.
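The three-stage framework described above (question generation, question answering, rewriting) can be sketched as a pipeline. Every function here is a hypothetical stand-in for an LLM or QA call, using a pronoun heuristic and a lookup table purely for illustration; none of it is the paper's QaDecontext implementation.

```python
def generate_questions(snippet):
    """Stand-in for LLM question generation: ask what each pronoun refers to."""
    pronouns = [w for w in snippet.split() if w.lower() in {"it", "they", "this"}]
    return [f"What does '{p}' refer to?" for p in pronouns]

def answer_question(question, document):
    """Stand-in for QA over the source document (here: a lookup table)."""
    return document.get(question)

def rewrite(snippet, qa_pairs):
    """Stand-in rewriter: substitute each resolved referent into the snippet."""
    for question, answer in qa_pairs:
        referent = question.split("'")[1]
        snippet = snippet.replace(referent, answer, 1)
    return snippet

def qa_decontextualize(snippet, document):
    questions = generate_questions(snippet)
    qa_pairs = [(q, answer_question(q, document)) for q in questions]
    return rewrite(snippet, qa_pairs)
```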
Concise Answers to Complex Questions: Summarization of Long-form Answers
Potluri, Abhilash, Xu, Fangyuan, Choi, Eunsol
Long-form question answering systems provide rich information by presenting paragraph-level answers, often containing optional background or auxiliary information. While such comprehensive answers are helpful, not all information is required to answer the question (e.g. users with domain knowledge do not need an explanation of background). Can we provide a concise version of the answer by summarizing it, while still addressing the question? We conduct a user study on summarized answers generated from state-of-the-art models and our newly proposed extract-and-decontextualize approach. We find a large proportion of long-form answers (over 90%) in the ELI5 domain can be adequately summarized by at least one system, while complex and implicit answers are challenging to compress. We observe that decontextualization improves the quality of the extractive summary, exemplifying its potential in the summarization task. To promote future work, we provide an extractive summarization dataset covering 1K long-form answers and our user study annotations. Together, we present the first study on summarizing long-form answers, taking a step forward for QA agents that can provide answers at multiple granularities.
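The extraction half of the extract-and-decontextualize approach described above can be sketched with a simple lexical-overlap selector; this stub (and its overlap criterion) is illustrative only, not the authors' system, and the chosen sentence would then be handed to a decontextualizer before being shown as the concise answer.

```python
def extract(question, answer_sentences):
    """Pick the answer sentence with the most word overlap with the question."""
    q = set(question.lower().split())
    return max(answer_sentences, key=lambda s: len(q & set(s.lower().split())))
```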
Learning to Reject with a Fixed Predictor: Application to Decontextualization
Mohri, Christopher, Andor, Daniel, Choi, Eunsol, Collins, Michael
Large language models, often trained with billions of parameters, have achieved impressive performance in recent years (Raffel et al., 2019) and are used in a wide variety of natural language generation tasks. However, their output is sometimes undesirable, with hallucinated content (Maynez et al., 2020; Filippova, 2020), and much work remains to fully understand their properties. In many applications, such as healthcare, question-answering systems, or customer service, incorrect predictions are particularly costly and must be avoided. This motivates the design of algorithms for large language models and other NLP tasks that achieve high precision on a large fraction of the input set, while abstaining on the rest. How can we devise such accurate models that allow a reject option?
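The reject option described above can be sketched as threshold-based selective prediction: abstain below a confidence threshold and measure precision only on the accepted subset, trading it off against coverage. This is a generic illustration of the setting, not the paper's learned rejection algorithm.

```python
def select_with_rejection(predictions, threshold):
    """predictions: list of (is_correct, confidence) pairs from a fixed predictor.
    Returns (precision on accepted examples, coverage = accepted fraction)."""
    accepted = [correct for correct, conf in predictions if conf >= threshold]
    if not accepted:
        return 1.0, 0.0  # vacuous precision when everything is rejected
    precision = sum(accepted) / len(accepted)
    coverage = len(accepted) / len(predictions)
    return precision, coverage
```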
Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model
Eisenstein, Jacob, Andor, Daniel, Bohnet, Bernd, Collins, Michael, Mimno, David
Explainable question answering systems should produce not only accurate answers but also rationales that justify their reasoning and allow humans to check their work. But what sorts of rationales are useful and how can we train systems to produce them? We propose a new style of rationale for open-book question answering, called \emph{markup-and-mask}, which combines aspects of extractive and free-text explanations. In the markup phase, the passage is augmented with free-text markup that enables each sentence to stand on its own outside the discourse context. In the masking phase, a sub-span of the marked-up passage is selected. To train a system to produce markup-and-mask rationales without annotations, we leverage in-context learning. Specifically, we generate silver annotated data by sending a series of prompts to a frozen pretrained language model, which acts as a teacher. We then fine-tune a smaller student model by training on the subset of rationales that led to correct answers. The student is "honest" in the sense that it is a pipeline: the rationale acts as a bottleneck between the passage and the answer, while the "untrusted" teacher operates under no such constraints. Thus, we offer a new way to build trustworthy pipeline systems from a combination of end-task annotations and frozen pretrained language models.
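The silver-data filtering step described above (keep only teacher rationales that lead to a correct answer, then train the student on them) can be sketched as follows; the teacher and answer-extraction functions are hypothetical stubs, not the paper's prompted language model.

```python
def filter_silver_data(examples, teacher_rationale, answer_from_rationale):
    """examples: list of (passage, question, gold_answer) triples.
    Keeps only rationales whose derived answer matches the gold answer."""
    kept = []
    for passage, question, gold in examples:
        rationale = teacher_rationale(passage, question)
        if answer_from_rationale(rationale, question) == gold:
            kept.append((question, rationale, gold))  # student training triple
    return kept
```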
EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start
Mallinson, Jonathan, Adamek, Jakub, Malmi, Eric, Severyn, Aliaksei
We present EdiT5 - a novel semi-autoregressive text-editing model designed to combine the strengths of non-autoregressive text-editing and autoregressive decoding. EdiT5 is faster during inference than conventional sequence-to-sequence (seq2seq) models, while being capable of modelling flexible input-output transformations. This is achieved by decomposing the generation process into three sub-tasks: (1) tagging to decide on the subset of input tokens to be preserved in the output, (2) re-ordering to define their order in the output text, and (3) insertion to infill the missing tokens that are not present in the input. The tagging and re-ordering steps, which are responsible for generating the largest portion of the output, are non-autoregressive, while the insertion step uses an autoregressive decoder. Depending on the task, EdiT5 on average requires significantly fewer autoregressive steps, demonstrating speedups of up to 25x when compared to seq2seq models. Quality-wise, EdiT5 is initialized with a pre-trained T5 checkpoint yielding comparable performance to T5 in high-resource settings when evaluated on three NLG tasks: Sentence Fusion, Grammatical Error Correction, and Decontextualization while clearly outperforming T5 in low-resource settings.
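The three editing sub-tasks described above can be illustrated on plain token lists; this is a hand-rolled toy of the tag/re-order/insert decomposition, not the actual model.

```python
def apply_edit(tokens, keep_mask, order, insertions):
    """tokens: input token list; keep_mask: which tokens survive tagging;
    order: permutation over the kept tokens; insertions: {position: token}
    for new tokens infilled by the autoregressive step."""
    kept = [t for t, keep in zip(tokens, keep_mask) if keep]
    reordered = [kept[i] for i in order]
    for pos in sorted(insertions):
        reordered.insert(pos, insertions[pos])
    return reordered
```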
Decontextualization: Making Sentences Stand-Alone
Choi, Eunsol, Palomaki, Jennimaria, Lamm, Matthew, Kwiatkowski, Tom, Das, Dipanjan, Collins, Michael
Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window. We isolate and define the problem of sentence decontextualization: taking a sentence together with its context and rewriting it to be interpretable out of context, while preserving its meaning. We describe an annotation procedure, collect data on the Wikipedia corpus, and use the data to train models to automatically decontextualize sentences. We present preliminary studies that show the value of sentence decontextualization in a user facing task, and as preprocessing for systems that perform document understanding. We argue that decontextualization is an important subtask in many downstream applications, and that the definitions and resources provided can benefit tasks that operate on sentences that occur in a richer context.
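A toy before/after instance of the task as defined above, with a crude heuristic for checking the rewrite; the example text and the pronoun check are invented for illustration and are not from the paper.

```python
# Invented example of a sentence, its context, and a decontextualized rewrite.
example = {
    "context": "Margaret Hamilton led the software team for Apollo 11.",
    "sentence": "She coined the term 'software engineering' during the project.",
    "decontextualized": "Margaret Hamilton coined the term 'software engineering' "
                        "during the Apollo 11 project.",
}

def is_standalone(sentence, pronouns=("she", "he", "it", "they")):
    """Crude heuristic: a decontextualized sentence should not open with an
    unresolved pronoun."""
    return sentence.split()[0].lower().strip(",.") not in pronouns
```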