L3Cube-MahaSum: A Comprehensive Dataset and BART Models for Abstractive Text Summarization in Marathi

Deshmukh, Pranita, Kulkarni, Nikita, Kulkarni, Sanhita, Manghani, Kareena, Joshi, Raviraj

Oct-11-2024–arXiv.org Artificial Intelligence

We present the MahaSUM dataset, a large-scale collection of diverse news articles in Marathi, designed to facilitate the training and evaluation of models for abstractive summarization tasks in Indic languages. The dataset, containing 25k samples, was created by scraping articles from a wide range of online news sources and manually verifying the abstract summaries. Additionally, we train an IndicBART model, a variant of the BART model tailored for Indic languages, using the MahaSUM dataset. We evaluate the performance of our trained models on the task of abstractive summarization and demonstrate their effectiveness in producing high-quality summaries in Marathi. Our work contributes to the advancement of natural language processing research in Indic languages and provides a valuable resource for future research in this area using state-of-the-art models.
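The abstract does not say which metric is used to evaluate the generated summaries. As an illustration only, the sketch below scores a candidate summary against a reference with a ROUGE-1-style unigram F1, assuming plain whitespace tokenization. The function name `unigram_f1` and the Marathi sentence pair are hypothetical and are not taken from the paper or the dataset.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """ROUGE-1-style F1 from clipped unigram overlap (whitespace tokens)."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference/candidate pair, made up for illustration only.
reference = "पुण्यात आज मुसळधार पाऊस झाला"
candidate = "पुण्यात आज पाऊस झाला"
score = unigram_f1(reference, candidate)
print(f"unigram F1: {score:.3f}")
```

Whitespace splitting is a simplification. A real evaluation of Marathi summaries would likely need a tokenizer suited to Indic scripts and an established metric implementation.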

dataset, marathi, summarization

Country:
- Asia > India
  - Maharashtra > Pune (0.05)
  - Tamil Nadu > Chennai (0.04)

Genre:
- Research Report
  - New Finding (0.46)
  - Promising Solution (0.34)

Industry:
- Education (0.46)
- Media > News (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)
