Two-stage Pipeline for Multilingual Dialect Detection
arXiv.org Artificial Intelligence
Dialect identification is a crucial task for localizing various large language models. This paper outlines our approach to the VarDial 2023 shared task, in which we identify three dialects (Track-1) or two dialects (Track-2) for each of three languages, resulting in a 9-way classification for Track-1 and a 6-way classification for Track-2. Our proposed approach is a two-stage system that outperforms other participants' systems and previous work in this domain. We achieve a score of 58.54% for Track-1 and 85.61% for Track-2. Our codebase is publicly available (https://github.com/ankit-vaidya19/EACL_VarDial2023).
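The abstract does not spell out what the two stages are. One plausible reading, assumed here rather than taken from the paper, is that stage 1 identifies the language and stage 2 chooses among that language's dialects, so the 9-way problem becomes a 3-way language decision followed by a 3-way dialect decision. The sketch below illustrates only that routing structure. The label names (EN, EN-GB, ...), the keyword heuristics `toy_language_id` and `make_dialect_clf`, and the `TwoStagePipeline` class are all hypothetical stand-ins, not the authors' models or code.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical label inventory (illustrative): three languages, each with a
# language-level "common" label plus two country-level varieties.
TRACK1: Dict[str, List[str]] = {
    "EN": ["EN", "EN-GB", "EN-US"],
    "ES": ["ES", "ES-AR", "ES-ES"],
    "PT": ["PT", "PT-BR", "PT-PT"],
}
# Track-2 drops the "common" label, leaving two varieties per language.
TRACK2: Dict[str, List[str]] = {lang: labels[1:] for lang, labels in TRACK1.items()}

Classifier = Callable[[str], str]


def tokens(text: str) -> Set[str]:
    # Lower-cased word set; in Python 3, \w on str patterns also matches accented letters.
    return set(re.findall(r"\w+", text.lower()))


def toy_language_id(text: str) -> str:
    """Stage 1 stand-in: a keyword heuristic, NOT the paper's model."""
    words = tokens(text)
    if words & {"the", "and", "is"}:
        return "EN"
    if words & {"você", "não", "é"}:
        return "PT"
    return "ES"


def make_dialect_clf(cues: Dict[str, Set[str]], fallback: str) -> Classifier:
    """Stage 2 stand-in: return the first variety whose cue words appear."""
    def clf(text: str) -> str:
        words = tokens(text)
        for label, cue_words in cues.items():
            if words & cue_words:
                return label
        return fallback  # no cue found: fall back to the language-level label
    return clf


class TwoStagePipeline:
    """Route each text to a per-language dialect classifier chosen in stage 1."""

    def __init__(self, lang_clf: Classifier, dialect_clfs: Dict[str, Classifier]):
        self.lang_clf = lang_clf
        self.dialect_clfs = dialect_clfs

    def predict(self, text: str) -> str:
        lang = self.lang_clf(text)            # stage 1: which language?
        return self.dialect_clfs[lang](text)  # stage 2: which variety of it?


pipeline = TwoStagePipeline(
    toy_language_id,
    {
        "EN": make_dialect_clf({"EN-GB": {"colour", "flat"}, "EN-US": {"color", "apartment"}}, "EN"),
        "ES": make_dialect_clf({"ES-AR": {"vos", "che"}, "ES-ES": {"vosotros", "coche"}}, "ES"),
        "PT": make_dialect_clf({"PT-BR": {"você", "ônibus"}, "PT-PT": {"autocarro", "comboio"}}, "PT"),
    },
)

if __name__ == "__main__":
    print(sum(len(v) for v in TRACK1.values()))  # size of the Track-1 label space
    print(sum(len(v) for v in TRACK2.values()))  # size of the Track-2 label space
    for s in ["The colour of the flat is nice", "che, vos sabés", "você pegou o ônibus"]:
        print(s, "->", pipeline.predict(s))
```

The appeal of this decomposition is that each stage-2 classifier only has to separate closely related varieties of one language, instead of competing against all nine labels at once. For a Track-2 setup, the `fallback` label would have to be one of the two country-level varieties, since the language-level label does not exist there.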
Mar-28-2023
- Country:
  - Asia
    - Japan > Honshū
      - Kansai > Osaka Prefecture > Osaka (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Europe
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Ukraine (0.05)
  - North America > United States
    - California > Los Angeles County
      - Los Angeles (0.14)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
  - South America > Brazil
    - Rio Grande do Sul (0.04)
- Genre:
  - Research Report (0.50)