Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions

Krasitskii, Mikhail, Kolesnikova, Olga, Hernandez, Liliana Chanona, Sidorov, Grigori, Gelbukh, Alexander

Mar-29-2025–arXiv.org Artificial Intelligence

The sentiment analysis task in Tamil-English code-mixed texts has been explored using advanced transformer-based models. Challenges from grammatical inconsistencies, orthographic variations, and phonetic ambiguities have been addressed. The limitations of existing datasets and annotation gaps have been examined, emphasizing the need for larger and more diverse corpora. Transformer architectures, including XLM-RoBERTa, mT5, IndicBERT, and RemBERT, have been evaluated in low-resource, code-mixed environments. Performance metrics have been analyzed, highlighting the effectiveness of specific models in handling multilingual sentiment classification. The findings suggest that further advancements in data augmentation, phonetic normalization, and hybrid modeling approaches are required to enhance accuracy. Future research directions for improving sentiment analysis in code-mixed texts have been proposed.

large language model, machine learning, mixed text, (17 more...)

arXiv.org Artificial Intelligence

Mar-29-2025

arXiv.org PDF

Add feedback

Country:
- South America (0.04)
- North America
  - Central America (0.04)
  - Mexico > Mexico City
    - Mexico City (0.04)
- Asia > Indonesia
  - Bali (0.06)

Genre:
- Research Report (0.84)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Information Extraction (1.00)
    - Discourse & Dialogue (1.00)
    - Text Processing (0.94)
    - Large Language Model (0.91)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found