Automatic Summarization of Long Documents

Oct-8-2024–arXiv.org Artificial Intelligence

A vast amount of textual data is added to the internet daily, making such data difficult and cumbersome to use and interpret. Automatic text summarization is therefore crucial for extracting relevant information and saving reading time. Although many transformer-based models excel at summarization, they are constrained by their input size and cannot process texts longer than their context window. This study introduces three novel algorithms that allow any LLM to overcome its input size limitation efficiently and make full use of its capabilities without any architectural modifications. We test our algorithms on texts of more than 70,000 words, and our experiments show a significant increase in BERTScore with competitive ROUGE scores.
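To make the input-size constraint concrete, the sketch below shows a generic chunk-and-summarize baseline. It is not one of the three algorithms the study introduces, which the abstract does not describe. The names `split_into_chunks` and `summarize_long_text`, the word-based length limit, and the `summarize` callable standing in for a model call are all assumptions made for illustration.

```python
def split_into_chunks(words, max_words, overlap=0):
    """Split a list of words into windows of at most max_words words.

    Consecutive windows share `overlap` words so that sentences cut at a
    boundary still appear whole in at least one window.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_words])
        # Stop once a window has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks


def summarize_long_text(text, summarize, max_words=512, overlap=32):
    """Summarize each window separately and join the partial summaries.

    `summarize` is a hypothetical callable (str -> str) wrapping whatever
    model is used; it is assumed to accept at most max_words words.
    """
    words = text.split()
    chunks = split_into_chunks(words, max_words, overlap)
    return " ".join(summarize(" ".join(chunk)) for chunk in chunks)
```

In practice the limit would be counted in model tokens rather than whitespace-separated words, and the joined partial summaries may themselves need another pass if they still exceed the context window.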

dataset, summarization, transformer

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - New Mexico > Santa Fe County
    - Santa Fe (0.04)
  - Colorado > El Paso County
    - Colorado Springs (0.04)
- Europe
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Italy > Tuscany
    - Florence (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - India > Telangana (0.04)
  - China > Yunnan Province
    - Kunming (0.04)

Genre:
- Research Report (1.00)

Industry:
- Law (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.49)
