Deshmukh, Pranita
Long Range Named Entity Recognition for Marathi Documents
Deshmukh, Pranita, Kulkarni, Nikita, Kulkarni, Sanhita, Manghani, Kareena, Kale, Geetanjali, Joshi, Raviraj
The demand for sophisticated natural language processing (NLP) methods, particularly Named Entity Recognition (NER), has increased due to the exponential growth of Marathi-language digital content. NER is essential for recognizing long-range entities and for organizing and understanding unstructured Marathi text data. With an emphasis on handling long-range entities, this paper offers a comprehensive analysis of current NER techniques designed for Marathi documents. It examines current practices and investigates the BERT transformer model's potential for long-range Marathi NER. Along with analyzing the effectiveness of earlier methods, the paper compares NER approaches developed for English text with those for Marathi and suggests corresponding adaptation strategies. It also discusses the difficulties posed by Marathi's particular linguistic traits and contextual subtleties while acknowledging NER's critical role in NLP. In conclusion, this work is a significant step toward improving Marathi NER techniques, with potential wider applications across a range of NLP tasks and domains.
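As an illustrative sketch only (not the paper's exact pipeline), Marathi NER with a BERT-style transformer can be run through the Hugging Face transformers library; the checkpoint name below (l3cube-pune/marathi-ner) and the example sentence are assumptions, not details taken from the abstract.

    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

    # Assumed checkpoint: a public Marathi BERT fine-tuned for NER; the paper's
    # long-range NER model may differ in architecture and training data.
    MODEL_NAME = "l3cube-pune/marathi-ner"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME)

    # Token-classification pipeline; "simple" aggregation merges word pieces
    # back into whole entity spans.
    ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
                   aggregation_strategy="simple")

    # Illustrative Marathi sentence: "Pune is a city in Maharashtra."
    print(ner("पुणे हे महाराष्ट्रातील एक शहर आहे."))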
L3Cube-MahaSum: A Comprehensive Dataset and BART Models for Abstractive Text Summarization in Marathi
Deshmukh, Pranita, Kulkarni, Nikita, Kulkarni, Sanhita, Manghani, Kareena, Joshi, Raviraj
We present the MahaSUM dataset, a large-scale collection of diverse news articles in Marathi, designed to facilitate the training and evaluation of models for abstractive summarization tasks in Indic languages. The dataset, containing 25k samples, was created by scraping articles from a wide range of online news sources and manually verifying the abstract summaries. Additionally, we train an IndicBART model, a variant of the BART model tailored for Indic languages, using the MahaSUM dataset. We evaluate the performance of our trained models on the task of abstractive summarization and demonstrate their effectiveness in producing high-quality summaries in Marathi. Our work contributes to the advancement of natural language processing research in Indic languages and provides a valuable resource for future research in this area using state-of-the-art models.
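For illustration, a summary can be generated with the publicly released ai4bharat/IndicBART checkpoint via Hugging Face transformers, roughly following the model card's usage pattern; the MahaSUM-fine-tuned weights and the paper's exact preprocessing are not reproduced here, and the article text is a placeholder.

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Assumed checkpoint: the base IndicBART weights, not the MahaSUM-fine-tuned model.
    MODEL_NAME = "ai4bharat/IndicBART"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, do_lower_case=False,
                                              use_fast=False, keep_accents=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    article = "..."  # placeholder for a Marathi news article (Devanagari script)

    # IndicBART expects an explicit language tag; "<2mr>" marks Marathi.
    inputs = tokenizer(article + " </s> <2mr>", add_special_tokens=False,
                       return_tensors="pt")
    mr_token_id = tokenizer._convert_token_to_id_with_added_voc("<2mr>")

    summary_ids = model.generate(inputs.input_ids, num_beams=4, max_length=64,
                                 early_stopping=True,
                                 decoder_start_token_id=mr_token_id)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))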