Unsupervised Sentiment Analysis for Code-mixed Data
Yadav, Siddharth, Chakraborty, Tanmoy
arXiv.org Artificial Intelligence
Code-mixing is the practice of alternating between two or more languages. It is mostly observed in multilingual societies, and as its occurrence increases, so does its importance. Most sentiment analysis research has been monolingual, and most monolingual methods perform poorly on code-mixed text. In this work, we introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis. Our methods can handle code-mixed text through zero-shot learning. They beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score. On the same benchmark, we achieve an F1-score of 0.58 (without a parallel corpus) and 0.62 (with a parallel corpus) in a zero-shot way, compared to 0.68 in the supervised setting. Our code is publicly available.
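The zero-shot transfer idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the two-dimensional embedding values, the helper names (`embed`, `train_centroids`, `predict`), and the nearest-centroid classifier are all assumptions chosen to keep the example small. It only shows why a shared embedding space lets a model trained on English sentences label a code-mixed sentence it has never seen.

```python
import math

# Hypothetical 2-d "cross-lingual" embeddings. The values are made up so that
# English and Spanish words with similar meaning sit close together; a real
# system would load aligned pretrained vectors instead.
EMBEDDINGS = {
    "good": (1.0, 0.1), "love": (0.95, 0.0), "bueno": (0.98, 0.12),
    "bad": (-1.0, 0.1), "hate": (-0.95, 0.0), "odio": (-0.93, 0.02),
    "movie": (0.0, 1.0), "película": (0.02, 0.98),
    "the": (0.0, 0.0), "la": (0.0, 0.0), "is": (0.0, 0.0), "i": (0.0, 0.0),
}


def embed(sentence):
    """Average the vectors of all known words in the sentence."""
    vecs = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    n = len(vecs)
    return (sum(v[0] for v in vecs) / n, sum(v[1] for v in vecs) / n)


def train_centroids(labelled):
    """Compute one mean sentence vector per label (nearest-centroid model)."""
    sums = {}
    for sentence, label in labelled:
        x, y = embed(sentence)
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / c, sy / c) for label, (sx, sy, c) in sums.items()}


def predict(centroids, sentence):
    """Return the label whose centroid is closest to the sentence vector."""
    v = embed(sentence)
    return min(centroids, key=lambda label: math.dist(v, centroids[label]))


# Training data is monolingual English only.
ENGLISH_TRAIN = [
    ("the movie is good", "positive"),
    ("i love the movie", "positive"),
    ("the movie is bad", "negative"),
    ("i hate the movie", "negative"),
]
CENTROIDS = train_centroids(ENGLISH_TRAIN)

# Code-mixed English-Spanish inputs, never seen during training.
print(predict(CENTROIDS, "the película is bueno"))
print(predict(CENTROIDS, "i odio la movie"))
```

Because the Spanish words land near their English counterparts in the assumed space, the code-mixed sentences fall next to the English training sentences of the same polarity, which is the intuition behind training on monolingual text and testing on code-mixed text.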
Jan-20-2020
- Country:
- South America > Paraguay
- Oceania > Australia
- North America
- Canada (0.04)
- United States
- Maryland > Baltimore (0.04)
- Texas > Travis County
- Austin (0.04)
- Colorado > Denver County
- Denver (0.04)
- Europe
- Sweden > Vaestra Goetaland
- Gothenburg (0.04)
- Spain > Region of Murcia
- Murcia (0.04)
- Italy > Tuscany
- Florence (0.05)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Indonesia > Bali (0.04)
- China > Hong Kong (0.04)
- Middle East > Qatar
- Japan
- Kyūshū & Okinawa > Kyūshū
- Miyazaki Prefecture > Miyazaki (0.04)
- Honshū > Kansai
- Osaka Prefecture > Osaka (0.04)
- India
- West Bengal > Kolkata (0.04)
- NCT > Delhi (0.04)
- Genre:
- Research Report (0.40)
- Technology: