L3Cube-MahaSent-MD: A Multi-domain Marathi Sentiment Analysis Dataset and Transformer Models

Pingle, Aabha, Vyawahare, Aditya, Joshi, Isha, Tangsali, Rahul, Joshi, Raviraj

Jun-24-2023–arXiv.org Artificial Intelligence

The exploration of sentiment analysis in low-resource languages, such as Marathi, has been limited by the lack of suitable datasets. In this work, we present L3Cube-MahaSent-MD, a multi-domain Marathi sentiment analysis dataset covering four domains: movie reviews, general tweets, TV show subtitles, and political tweets. The dataset consists of around 60,000 manually tagged samples covering three sentiment classes: positive, negative, and neutral. Each domain forms a sub-dataset of 15k samples. MahaSent-MD is the first comprehensive multi-domain sentiment analysis dataset within the Indic sentiment landscape. We fine-tune different monolingual and multilingual BERT models on these datasets and report the best accuracy with the MahaBERT model. We also present an extensive in-domain and cross-domain analysis, highlighting the need for low-resource multi-domain datasets. The data and models are available at https://github.com/l3cube-pune/MarathiNLP .
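
The dataset composition and the in-domain versus cross-domain setup described in the abstract can be illustrated with a small sketch. The abstract does not spell out the evaluation protocol, so this assumes the simplest reading: a model is trained on one domain and tested on one domain, which gives one in-domain pair per domain and one cross-domain pair for every ordered pair of distinct domains. The short domain identifiers and the helper `evaluation_pairs` are hypothetical names used only for illustration.

```python
from itertools import product

# Domain names come from the abstract; these short identifiers are hypothetical.
DOMAINS = ["movie_reviews", "general_tweets", "tv_subtitles", "political_tweets"]
LABELS = ["positive", "negative", "neutral"]
SAMPLES_PER_DOMAIN = 15_000  # "15k samples" per sub-dataset, per the abstract


def evaluation_pairs(domains):
    """Split all (train_domain, test_domain) pairs into in-domain and cross-domain.

    Assumption: one training domain and one test domain per experiment.
    """
    in_domain = [(d, d) for d in domains]
    cross_domain = [(a, b) for a, b in product(domains, repeat=2) if a != b]
    return in_domain, cross_domain


in_domain, cross_domain = evaluation_pairs(DOMAINS)
total_samples = SAMPLES_PER_DOMAIN * len(DOMAINS)

print(total_samples)      # 60000
print(len(in_domain))     # 4
print(len(cross_domain))  # 12
```

Under this assumption, four domains yield 4 in-domain and 12 cross-domain train/test combinations, and the four 15k sub-datasets add up to the roughly 60,000 samples stated in the abstract.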

artificial intelligence, natural language, social media

Country:
- North America > United States
  - Oregon > Multnomah County
    - Portland (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - Hawaii > Honolulu County
    - Honolulu (0.04)
  - California > Santa Clara County
    - Stanford (0.04)
- Europe
  - Czechia > Prague (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
- Asia
  - Middle East > UAE
    - Dubai Emirate > Dubai (0.04)
  - India
    - Maharashtra > Pune (0.04)
    - Tamil Nadu > Chennai (0.04)

Genre:
- Research Report (0.64)

Industry:
- Leisure & Entertainment (1.00)
- Media > Film (0.93)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence > Natural Language
    - Information Extraction (1.00)
    - Discourse & Dialogue (1.00)
