AITopics | document summarization

Collaborating Authors

document summarization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Effect of Document Summarization on LLM-Based Relevance Judgments

Mohtadi, Samaneh, Roitero, Kevin, Mizzaro, Stefano, Demartini, Gianluca

arXiv.org Artificial IntelligenceDec-8-2025

Relevance judgments are central to the evaluation of Information Retrieval (IR) systems, but obtaining them from human annotators is costly and time-consuming. Large Language Models (LLMs) have recently been proposed as automated assessors, showing promising alignment with human annotations. Most prior studies have treated documents as fixed units, feeding their full content directly to LLM assessors. We investigate how text summarization affects the reliability of LLM-based judgments and their downstream impact on IR evaluation. Using state-of-the-art LLMs across multiple TREC collections, we compare judgments made from full documents with those based on LLM-generated summaries of different lengths. We examine their agreement with human labels, their effect on retrieval effectiveness evaluation, and their influence on IR systems' ranking stability. Our findings show that summary-based judgments achieve comparable stability in systems' ranking to full-document judgments, while introducing systematic shifts in label distributions and biases that vary by model and dataset. These results highlight summarization as both an opportunity for more efficient large-scale IR evaluation and a methodological choice with important implications for the reliability of automatic judgments.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.05334

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation

Lu, Yen-Ju, Thebaud, Thomas, Moro-Velazquez, Laureano, Dehak, Najim, Villalba, Jesus

arXiv.org Artificial IntelligenceSep-30-2025

We present Paired by the Teacher (PbT), a two-stage teacher-student pipeline that synthesizes accurate input-output pairs without human labels or parallel data. In many low-resource natural language generation (NLG) scenarios, practitioners may have only raw outputs, like highlights, recaps, or questions, or only raw inputs, such as articles, dialogues, or paragraphs, but seldom both. This mismatch forces small models to learn from very few examples or rely on costly, broad-scope synthetic examples produced by large LLMs. PbT addresses this by asking a teacher LLM to compress each unpaired example into a concise intermediate representation (IR), and training a student to reconstruct inputs from IRs. This enables outputs to be paired with student-generated inputs, yielding high-quality synthetic data. We evaluate PbT on five benchmarks-document summarization (XSum, CNNDM), dialogue summarization (SAMSum, DialogSum), and question generation (SQuAD)-as well as an unpaired setting on SwitchBoard (paired with DialogSum summaries). An 8B student trained only on PbT data outperforms models trained on 70 B teacher-generated corpora and other unsupervised baselines, coming within 1.2 ROUGE-L of human-annotated pairs and closing 82% of the oracle gap at one-third the annotation cost of direct synthesis. Human evaluation on SwitchBoard further confirms that only PbT produces concise, faithful summaries aligned with the target style, highlighting its advantage of generating in-domain sources that avoid the mismatch, limiting direct synthesis.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.25144

Country:

North America > United States > Texas (0.46)
North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization

Wang, Weixuan, Wu, Minghao, Haddow, Barry, Birch, Alexandra

arXiv.org Artificial IntelligenceSep-29-2025

Long document summarization remains a significant challenge for current large language models (LLMs), as existing approaches commonly struggle with information loss, factual inconsistencies, and coherence issues when processing excessively long documents. We propose SummQ, a novel adversarial multi-agent framework that addresses these limitations through collaborative intelligence between specialized agents operating in two complementary domains: summarization and quizzing. Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries, while quiz generators and reviewers create comprehension questions that serve as continuous quality checks for the summarization process. This adversarial dynamic, enhanced by an examinee agent that validates whether the generated summary contains the information needed to answer the quiz questions, enables iterative refinement through multifaceted feedback mechanisms. We evaluate SummQ on three widely used long document summarization benchmarks. Experimental results demonstrate that our framework significantly outperforms existing state-of-the-art methods across ROUGE and BERTScore metrics, as well as in LLM-as-a-Judge and human evaluations. Our comprehensive analyses reveal the effectiveness of the multi-agent collaboration dynamics, the influence of different agent configurations, and the impact of the quizzing mechanism. This work establishes a new approach for long document summarization that uses adversarial agentic collaboration to improve summarization quality.

computational linguistic, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.209

Country:

North America (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.46)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Education (0.50)
Law Enforcement & Public Safety (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Long document summarization using page specific target text alignment and distilling page importance

Devi, Pushpa, Agrawal, Ayush, Dubey, Ashutosh, Chowdary, C. Ravindranath

arXiv.org Artificial IntelligenceSep-23-2025

The rapid growth of textual data across news, legal, medical, and scientific domains is becoming a challenge for efficiently accessing and understanding large volumes of content. It is increasingly complex for users to consume and extract meaningful information efficiently. Thus, raising the need for summarization. Unlike short document summarization, long document abstractive summarization is resource-intensive, and very little literature is present in this direction. BART is a widely used efficient sequence-to-sequence (seq-to-seq) model. However, when it comes to summarizing long documents, the length of the context window limits its capabilities. We proposed a model called PTS (Page-specific Target-text alignment Summarization) that extends the seq-to-seq method for abstractive summarization by dividing the source document into several pages. PTS aligns each page with the relevant part of the target summary for better supervision. Partial summaries are generated for each page of the document. We proposed another model called PTSPI (Page-specific Target-text alignment Summarization with Page Importance), an extension to PTS where an additional layer is placed before merging the partial summaries into the final summary. This layer provides dynamic page weightage and explicit supervision to focus on the most informative pages. We performed experiments on the benchmark dataset and found that PTSPI outperformed the SOTA by 6.32\% in ROUGE-1 and 8.08\% in ROUGE-2 scores.

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.16539

Country:

Europe (1.00)
Asia (0.69)
North America > United States > California (0.46)

Genre: Research Report (1.00)

Industry: Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Team HUMANE at AVeriTeC 2025: HerO 2 for Efficient Fact Verification

Yoon, Yejun, Jung, Jaeyoon, Yoon, Seunghyun, Park, Kunwoo

arXiv.org Artificial IntelligenceJul-16-2025

This paper presents HerO 2, Team HUMANE's system for the AVeriTeC shared task at the FEVER-25 workshop. HerO 2 is an enhanced version of HerO, the best-performing open-source model from the previous year's challenge. It improves evidence quality through document summarization and answer reformulation, optimizes veracity prediction via post-training quantization under computational constraints, and enhances overall system performance by integrating updated language model (LM) backbones. HerO 2 ranked second on the leaderboard while achieving the shortest runtime among the top three systems, demonstrating both high efficiency and strong potential for real-world fact verification. The code is available at https://github.com/ssu-humane/HerO2.

hero 2, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2507.11004

Country: North America > United States (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)

Add feedback

M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?

Yan, Haolong, Tan, Kaijun, Shen, Yeqing, Huang, Xin, Ge, Zheng, Zhang, Xiangyu, Li, Si, Jiang, Daxin

arXiv.org Artificial IntelligenceMar-27-2025

We investigate a critical yet under-explored question in Large Vision-Language Models (LVLMs): Do LVLMs genuinely comprehend interleaved image-text in the document? Existing document understanding benchmarks often assess LVLMs using question-answer formats, which are information-sparse and difficult to guarantee the coverage of long-range dependencies. To address this issue, we introduce a novel and challenging Multimodal Document Summarization Benchmark (M-DocSum-Bench), which comprises 500 high-quality arXiv papers, along with interleaved multimodal summaries aligned with human preferences. M-DocSum-Bench is a reference-based generation task and necessitates the generation of interleaved image-text summaries using provided reference images, thereby simultaneously evaluating capabilities in understanding, reasoning, localization, and summarization within complex multimodal document scenarios. To facilitate this benchmark, we develop an automated framework to construct summaries and propose a fine-grained evaluation method called M-DocEval. Moreover, we further develop a robust summarization baseline, i.e., M-DocSum-7B, by progressive two-stage training with diverse instruction and preference data. The extensive results on our M-DocSum-Bench reveal that the leading LVLMs struggle to maintain coherence and accurately integrate information within long and interleaved contexts, often exhibiting confusion between similar images and a lack of robustness. Notably, M-DocSum-7B achieves state-of-the-art performance compared to larger and closed-source models (including GPT-4o, Gemini Pro, Claude-3.5-Sonnet and Qwen2.5-VL-72B, etc.), demonstrating the potential of LVLMs for improved interleaved image-text understanding. The code, data, and models are available at https://github.com/stepfun-ai/M-DocSum-Bench.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.21839

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization

Zhong, Yang, Litman, Diane

arXiv.org Artificial IntelligenceFeb-10-2025

Detecting factual inconsistency for long document summarization remains challenging, given the complex structure of the source article and long summary length. In this work, we study factual inconsistency errors and connect them with a line of discourse analysis. We find that errors are more common in complex sentences and are associated with several discourse features. We propose a framework that decomposes long texts into discourse-inspired chunks and utilizes discourse information to better aggregate sentence-level scores predicted by natural language inference models. Our approach shows improved performance on top of different model baselines over several evaluation benchmarks, covering rich domains of texts, focusing on long document summarization. This underscores the significance of incorporating discourse features in developing models for scoring summaries for long document factual inconsistency.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.06185

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Ontario > Toronto (0.04)
North America > Dominican Republic (0.04)
(15 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.67)

Add feedback

HERA: Improving Long Document Summarization using Large Language Models with Context Packaging and Reordering

Li, Taiji, Chen, Hao, Yu, Fei, Zhang, Yin

arXiv.org Artificial IntelligenceFeb-1-2025

Despite the rapid growth of context length of large language models (LLMs) , LLMs still perform poorly in long document summarization. An important reason for this is that relevant information about an event is scattered throughout long documents, and the messy narrative order impairs the accurate understanding and utilization of LLMs for long documents. To address these issues, we propose a novel summary generation framework, called HERA. Specifically, we first segment a long document by its semantic structure and retrieve text segments about the same event, and finally reorder them to form the input context. We evaluate our approach on two long document summarization datasets. The experimental results show that HERA outperforms foundation models in ROUGE, BERTScore and faithfulness metrics, while HERA does not require additional fine-tuning and resources.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.00448

Country:

Europe > Spain > Galicia > Madrid (0.05)
Asia > Singapore (0.05)
North America > Canada > Ontario > Toronto (0.05)
(8 more...)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Sports > Soccer (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Retrieval-Augmented Generation with Graphs (GraphRAG)

Han, Haoyu, Wang, Yu, Shomer, Harry, Guo, Kai, Ding, Jiayuan, Lei, Yongjia, Halappanavar, Mahantesh, Rossi, Ryan A., Mukherjee, Subhabrata, Tang, Xianfeng, He, Qi, Hua, Zhigang, Long, Bo, Zhao, Tong, Shah, Neil, Javari, Amin, Xia, Yinglong, Tang, Jiliang

arXiv.org Artificial IntelligenceJan-8-2025

Retrieval-augmented generation (RAG) is a powerful technique that enhances downstream task execution by retrieving additional information, such as knowledge, skills, and tools from external sources. Graph, by its intrinsic "nodes connected by edges" nature, encodes massive heterogeneous and relational information, making it a golden resource for RAG in tremendous real-world applications. As a result, we have recently witnessed increasing attention on equipping RAG with Graph, i.e., GraphRAG. However, unlike conventional RAG, where the retriever, generator, and external data sources can be uniformly designed in the neural-embedding space, the uniqueness of graph-structured data, such as diverse-formatted and domain-specific relational knowledge, poses unique and significant challenges when designing GraphRAG for different domains. Given the broad applicability, the associated design challenges, and the recent surge in GraphRAG, a systematic and up-to-date survey of its key concepts and techniques is urgently desired. Following this motivation, we present a comprehensive and up-to-date survey on GraphRAG. Our survey first proposes a holistic GraphRAG framework by defining its key components, including query processor, retriever, organizer, generator, and data source. Furthermore, recognizing that graphs in different domains exhibit distinct relational patterns and require dedicated designs, we review GraphRAG techniques uniquely tailored to each domain. Finally, we discuss research challenges and brainstorm directions to inspire cross-disciplinary opportunities.

information and knowledge management, large language model, machine learning, (26 more...)

arXiv.org Artificial Intelligence

2501.00309

Country:

Asia (0.28)
Europe > United Kingdom > England (0.14)
South America > Peru (0.13)
(2 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(6 more...)

Add feedback

A Rhetorical Relations-Based Framework for Tailored Multimedia Document Summarization

Maredj, Azze-Eddine, Sadallah, Madjid

arXiv.org Artificial IntelligenceDec-26-2024

In the rapidly evolving landscape of digital content, the task of summarizing multimedia documents, which encompass textual, visual, and auditory elements, presents intricate challenges. These challenges include extracting pertinent information from diverse formats, maintaining the structural integrity and semantic coherence of the original content, and generating concise yet informative summaries. This paper introduces a novel framework for multimedia document summarization that capitalizes on the inherent structure of the document to craft coherent and succinct summaries. Central to this framework is the incorporation of a rhetorical structure for structural analysis, augmented by a graph-based representation to facilitate the extraction of pivotal information. Weighting algorithms are employed to assign significance values to document units, thereby enabling effective ranking and selection of relevant content. Furthermore, the framework is designed to accommodate user preferences and time constraints, ensuring the production of personalized and contextually relevant summaries. The summarization process is elaborately delineated, encompassing document specification, graph construction, unit weighting, and summary extraction, supported by illustrative examples and algorithmic elucidation. This proposed framework represents a significant advancement in automatic summarization, with broad potential applications across multimedia document processing, promising transformative impacts in the field.

multimedia document, representation, summarization, (16 more...)

arXiv.org Artificial Intelligence

2412.19133

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Dorset > Bournemouth (0.04)
Europe > Slovakia (0.04)

Genre:

Overview (0.67)
Research Report (0.64)

Industry: Government > Space Agency (0.48)

Technology:

Information Technology > Human Computer Interaction > Multimedia Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback