Revealing the impact of synthetic native samples and multi-tasking strategies in Hindi-English code-mixed humour and sarcasm detection
Mazumder, Debajyoti, Kumar, Aakash, Patro, Jasabanta
–arXiv.org Artificial Intelligence
In this paper, we reported our experiments with various strategies to improve code-mixed humour and sarcasm detection. We did all of our experiments for Hindi-English code-mixed scenario, as we have the linguistic expertise for the same. We experimented with three approaches, namely (i) native sample mixing, (ii) multi-task learning (MTL), and (iii) prompting very large multilingual language models (VMLMs). In native sample mixing, we added monolingual task samples in code-mixed training sets. In MTL learning, we relied on native and code-mixed samples of a semantically related task (hate detection in our case). Finally, in our third approach, we evaluated the efficacy of VMLMs via few-shot context prompting. Some interesting findings we got are (i) adding native samples improved humor (raising the F1-score up to 6.76%) and sarcasm (raising the F1-score up to 8.64%) detection, (ii) training MLMs in an MTL framework boosted performance for both humour (raising the F1-score up to 10.67%) and sarcasm (increment up to 12.35% in F1-score) detection, and (iii) prompting VMLMs couldn't outperform the other approaches. Finally, our ablation studies and error analysis discovered the cases where our model is yet to improve. We provided our code for reproducibility.
arXiv.org Artificial Intelligence
Dec-17-2024
- Country:
- South America > Chile
- Oceania > Australia
- North America
- United States
- Michigan (0.04)
- Washington > King County
- Seattle (0.14)
- Texas > Uvalde County
- Uvalde (0.04)
- New York > New York County
- New York City (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- California
- San Diego County > San Diego (0.04)
- Los Angeles County > Los Angeles (0.04)
- Canada
- Ontario > Toronto (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.14)
- United States
- Europe
- Greece (0.04)
- Belgium (0.04)
- Ireland (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Italy > Trentino-Alto Adige/Südtirol
- Trentino Province > Trento (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Asia
- Indonesia > Bali (0.04)
- Singapore (0.04)
- Vietnam > Hanoi
- Hanoi (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Middle East
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- UAE > Abu Dhabi Emirate
- Japan
- Kyūshū & Okinawa > Kyūshū
- Miyazaki Prefecture > Miyazaki (0.04)
- Honshū > Kansai
- Osaka Prefecture > Osaka (0.04)
- Kyūshū & Okinawa > Kyūshū
- India
- Madhya Pradesh > Bhopal (0.04)
- West Bengal > Kolkata (0.04)
- Telangana > Hyderabad (0.04)
- China
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (0.68)
- Research Report
- Industry:
- Education (0.68)
- Law Enforcement & Public Safety (0.46)
- Information Technology (0.46)
- Technology: