AITopics | summary sentence

Collaborating Authors

summary sentence

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Where did you get that? Towards Summarization Attribution for Analysts

B, Violet, Conroy, John M., Lynch, Sean, M, Danielle, Molino, Neil P., Wiechmann, Aaron, Yang, Julia S.

arXiv.org Artificial IntelligenceNov-13-2025

Analysts require attribution, as nothing can be reported without knowing the source of the information. In this paper, we will focus on automatic methods for attribution, linking each sentence in the summary to a portion of the source text, which may be in one or more documents. We explore using a hybrid summarization, i.e., an automatic paraphrase of an extractive summary, to ease attribution. We also use a custom topology to identify the proportion of different categories of attribution-related errors.

attribution, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.08589

Country:

North America > United States (1.00)
Asia (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Law Enforcement & Public Safety (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Long document summarization using page specific target text alignment and distilling page importance

Devi, Pushpa, Agrawal, Ayush, Dubey, Ashutosh, Chowdary, C. Ravindranath

arXiv.org Artificial IntelligenceSep-23-2025

The rapid growth of textual data across news, legal, medical, and scientific domains is becoming a challenge for efficiently accessing and understanding large volumes of content. It is increasingly complex for users to consume and extract meaningful information efficiently. Thus, raising the need for summarization. Unlike short document summarization, long document abstractive summarization is resource-intensive, and very little literature is present in this direction. BART is a widely used efficient sequence-to-sequence (seq-to-seq) model. However, when it comes to summarizing long documents, the length of the context window limits its capabilities. We proposed a model called PTS (Page-specific Target-text alignment Summarization) that extends the seq-to-seq method for abstractive summarization by dividing the source document into several pages. PTS aligns each page with the relevant part of the target summary for better supervision. Partial summaries are generated for each page of the document. We proposed another model called PTSPI (Page-specific Target-text alignment Summarization with Page Importance), an extension to PTS where an additional layer is placed before merging the partial summaries into the final summary. This layer provides dynamic page weightage and explicit supervision to focus on the most informative pages. We performed experiments on the benchmark dataset and found that PTSPI outperformed the SOTA by 6.32\% in ROUGE-1 and 8.08\% in ROUGE-2 scores.

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.16539

Country:

Europe (1.00)
Asia (0.69)
North America > United States > California (0.46)

Genre: Research Report (1.00)

Industry: Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts

Deng, Jiaqi, Lee, Yuho, Kim, Nicole Hee-Yeon, Min, Hyangsuk, Yun, Taewon, Ban, Minjeong, Yul, Kim, Song, Hwanjun

arXiv.org Artificial IntelligenceAug-28-2025

We introduce HAMLET, a holistic and automated framework for evaluating the long-context comprehension of large language models (LLMs). HAMLET structures source texts into a three-level key-fact hierarchy at root-, branch-, and leaf-levels, and employs query-focused summarization to evaluate how well models recall and faithfully represent information at each level. To validate the reliability of our fully automated pipeline, we conduct a systematic human study, showing that our automatic evaluation achieves over 90% agreement with expert human judgments, while reducing the cost by up to 25 times. HAMLET reveals that LLMs struggle with fine-grained comprehension, especially at the leaf level, and are sensitive to positional effects like the lost-in-the-middle. Analytical queries pose greater challenges than narrative ones, and consistent performance gaps emerge between open-source and proprietary models, as well as across model scales. Our code and dataset are publicly available at https://github.com/DISL-Lab/HAMLET.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2508.19578

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Abstract Meaning Representation for Hospital Discharge Summarization

Landes, Paul, Rao, Sitara, Chaise, Aaron Jeremy, Di Eugenio, Barbara

arXiv.org Artificial IntelligenceJun-18-2025

The Achilles heel of Large Language Models (LLMs) is hallucination, which has drastic consequences for the clinical domain. This is particularly important with regards to automatically generating discharge summaries (a lengthy medical document that summarizes a hospital in-patient visit). Automatically generating these summaries would free physicians to care for patients and reduce documentation burden. The goal of this work is to discover new methods that combine language-based graphs and deep learning models to address provenance of content and trustworthiness in automatic summarization. Our method shows impressive reliability results on the publicly available Medical Information Mart for Intensive III (MIMIC-III) corpus and clinical notes written by physicians at Anonymous Hospital. rovide our method, generated discharge ary output examples, source code and trained models.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.14101

Country: North America > United States > Illinois (0.15)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.93)
Health & Medicine > Health Care Technology > Medical Record (0.68)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)

Add feedback

Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages

Min, Hyangsuk, Lee, Yuho, Ban, Minjeong, Deng, Jiaqi, Kim, Nicole Hee-Yeon, Yun, Taewon, Su, Hang, Cai, Jason, Song, Hwanjun

arXiv.org Artificial IntelligenceJun-3-2025

Evaluation frameworks for text summarization have evolved in terms of both domain coverage and metrics. However, existing benchmarks still lack domain-specific assessment criteria, remain predominantly English-centric, and face challenges with human annotation due to the complexity of reasoning. To address these, we introduce MSumBench, which provides a multi-dimensional, multi-domain evaluation of summarization in English and Chinese. It also incorporates specialized assessment criteria for each domain and leverages a multi-agent debate system to enhance annotation quality. By evaluating eight modern summarization models, we discover distinct performance patterns across domains and languages. We further examine large language models as summary evaluators, analyzing the correlation between their evaluation and summarization capabilities, and uncovering systematic bias in their assessment of self-generated summaries. Our benchmark dataset is publicly available at https://github.com/DISL-Lab/MSumBench.

large language model, machine learning, reference document, (23 more...)

arXiv.org Artificial Intelligence

2506.00549

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Government > Regional Government > North America Government > United States Government (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback

Yun, Taewon, Oh, Jihwan, Min, Hyangsuk, Lee, Yuho, Bang, Jihwan, Cai, Jason, Song, Hwanjun

arXiv.org Artificial IntelligenceMar-27-2025

Summarization refinement faces challenges when extending to multi-dimension. In this paper, we introduce ReFeed, a powerful summarization refinement pipeline that enhances multiple dimensions through reflective reasoning on feedback. To achieve this, we release SumFeed-CoT, a large-scale Long-CoT-based dataset optimized for training a lightweight model with reflective reasoning. Our experiments reveal how the number of dimensions, feedback exposure, and reasoning policy influence refinement performance, highlighting reflective reasoning and simultaneously addressing multiple feedback is crucial to mitigate trade-off between dimensions. Furthermore, ReFeed is robust to noisy feedback and feedback order. Lastly, our finding emphasizes that creating data with a proper goal and guideline constitutes a fundamental pillar of effective reasoning. The dataset and model will be released.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2503.21332

Country:

North America > United States (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation

Koupaee, Mahnaz, Vincent, Jake W., Mansour, Saab, Shalyminov, Igor, He, Han, Song, Hwanjun, Shu, Raphael, He, Jianfeng, Nian, Yi, Wong, Amy Wing-mei, Han, Kyu J., Su, Hang

arXiv.org Artificial IntelligenceFeb-13-2025

Faithfulness evaluators based on large language models (LLMs) are often fooled by the fluency of the text and struggle with identifying errors in the summaries. We propose an approach to summary faithfulness evaluation in which multiple LLM-based agents are assigned initial stances (regardless of what their belief might be) and forced to come up with a reason to justify the imposed belief, thus engaging in a multi-round debate to reach an agreement. The uniformly distributed initial assignments result in a greater diversity of stances leading to more meaningful debates and ultimately more errors identified. Furthermore, by analyzing the recent faithfulness evaluation datasets, we observe that naturally, it is not always the case for a summary to be either faithful to the source document or not. We therefore introduce a new dimension, ambiguity, and a detailed taxonomy to identify such special cases. Experiments demonstrate our approach can help identify ambiguities, and have even a stronger performance on non-ambiguous summaries.

large language model, machine learning, source document, (18 more...)

arXiv.org Artificial Intelligence

2502.08514

Country:

North America > United States (1.00)
Europe > Germany (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Energy (1.00)
Education (1.00)
(13 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

Add feedback

Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization

Zhong, Yang, Litman, Diane

arXiv.org Artificial IntelligenceFeb-10-2025

Detecting factual inconsistency for long document summarization remains challenging, given the complex structure of the source article and long summary length. In this work, we study factual inconsistency errors and connect them with a line of discourse analysis. We find that errors are more common in complex sentences and are associated with several discourse features. We propose a framework that decomposes long texts into discourse-inspired chunks and utilizes discourse information to better aggregate sentence-level scores predicted by natural language inference models. Our approach shows improved performance on top of different model baselines over several evaluation benchmarks, covering rich domains of texts, focusing on long document summarization. This underscores the significance of incorporating discourse features in developing models for scoring summaries for long document factual inconsistency.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.06185

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Ontario > Toronto (0.04)
North America > Dominican Republic (0.04)
(15 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.67)

Add feedback

Learning to Verify Summary Facts with Fine-Grained LLM Feedback

Oh, Jihwan, Choi, Jeonghwan, Kim, Nicole Hee-Yeon, Yun, Taewon, Song, Hwanjun

arXiv.org Artificial IntelligenceDec-14-2024

Training automatic summary fact verifiers often faces the challenge of a lack of human-labeled data. In this paper, we explore alternative way of leveraging Large Language Model (LLM) generated feedback to address the inherent limitation of using human-labeled data. We introduce FineSumFact, a large-scale dataset containing fine-grained factual feedback on summaries. We employ 10 distinct LLMs for diverse summary generation and Llama-3-70B-Instruct for feedback. We utilize this dataset to fine-tune the lightweight open-source model Llama-3-8B-Instruct, optimizing resource efficiency while maintaining high performance. Our experimental results reveal that the model trained on extensive LLM-generated datasets surpasses that trained on smaller human-annotated datasets when evaluated using human-generated test sets. Fine-tuning fact verification models with LLM feedback can be more effective and cost-efficient than using human feedback. The dataset is available at https://github.com/DISL-Lab/FineSumFact.

category, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.10689

Country: Europe > United Kingdom (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Coverage-based Fairness in Multi-document Summarization

Li, Haoyuan, Zhang, Yusen, Zhang, Rui, Chaturvedi, Snigdha

arXiv.org Artificial IntelligenceDec-11-2024

Fairness in multi-document summarization (MDS) measures whether a system can generate a summary fairly representing information from documents with different social attribute values. Fairness in MDS is crucial since a fair summary can offer readers a comprehensive view. Previous works focus on quantifying summary-level fairness using Proportional Representation, a fairness measure based on Statistical Parity. However, Proportional Representation does not consider redundancy in input documents and overlooks corpus-level unfairness. In this work, we propose a new summary-level fairness measure, Equal Coverage, which is based on coverage of documents with different social attribute values and considers the redundancy within documents. To detect the corpus-level unfairness, we propose a new corpus-level measure, Coverage Parity. Our human evaluations show that our measures align more with our definition of fairness. Using our measures, we evaluate the fairness of thirteen different LLMs. We find that Claude3-sonnet is the fairest among all evaluated LLMs. We also find that almost all LLMs overrepresent different social attribute values.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.08795

Country:

North America > United States > New Jersey (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Health & Medicine (0.69)
Government > Regional Government > North America Government > United States Government (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback