Markov-Enhanced Clustering for Long Document Summarization: Tackling the 'Lost in the Middle' Challenge with Large Language Models

Jun-24-2025–arXiv.org Artificial Intelligence

The rapid expansion of information from diverse sources has heightened the need for effective automatic text summarization, which condenses documents into shorter, coherent texts. Summarization methods generally fall into two categories: extractive, which selects key segments from the original text, and abstractive, which generates summaries by rephrasing the content coherently. Large language models have advanced the field of abstractive summarization, but they are resource-intensive and face significant challenges in retaining key information across lengthy documents, which we call being "lost in the middle". To address these issues, we propose a hybrid summarization approach that combines extractive and abstractive techniques. Our method splits the document into smaller text chunks, clusters their vector embeddings, generates a summary for each cluster that represents a key idea in the document, and constructs the final summary by relying on a Markov chain graph when selecting the semantic order of ideas.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Jun-24-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - United States > Minnesota
    - Hennepin County > Minneapolis (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Asia > Middle East
  - UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Africa > Middle East
  - Tunisia > Tunis Governorate > Tunis (0.04)

Genre:
- Overview (0.69)
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.95)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found