IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages

Adilazuarda, Muhammad Farid; Cahyawijaya, Samuel; Winata, Genta Indra; Fung, Pascale; Purwarianti, Ayu

Nov-21-2023–arXiv.org Artificial Intelligence

… Processing (NLP) have introduced an immense improvement in many aspects, including standardized benchmarks (Wilie et al., 2020; Cahyawijaya et al., 2021; Koto et al., 2020; Winata et al., 2022), large pre-trained language model (LM) (Wilie et al., 2020; Cahyawijaya et al., 2021; Koto et al., 2020), and resource expansion covering local Indonesian languages (Tri Apriani, 2016; …

In addition, we explore methods to improve the robustness of LMs to code-mixed text. Using our IndoRobusta-Shot, we perform adversarial training to improve the code-mixed robustness of LMs. We explore three kinds of tuning strategies: 1) code-mix only, 2) two-steps, and 3) joint training, and empirically search for the best strategy to improve the model robustness on code-mixed data.
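As a rough illustration of how the three tuning strategies named above might differ, the sketch below arranges training data into ordered fine-tuning stages. The excerpt does not define the strategies, so the interpretation here is an assumption: "code-mix only" tunes on code-mixed examples alone, "two-steps" tunes on the original data and then on the code-mixed data, and "joint" tunes once on the union of both. The function name `build_stages`, the strategy keys, and the toy sentences are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Example = Tuple[str, int]  # (text, label)


def build_stages(strategy: str,
                 original: List[Example],
                 code_mixed: List[Example]) -> List[List[Example]]:
    """Return the ordered fine-tuning stages for a strategy.

    Each stage is the list of examples a model would be tuned on,
    in sequence. The mapping below is an assumed reading of the
    strategy names, not the authors' published procedure.
    """
    if strategy == "code_mix_only":
        # One stage: only the perturbed (code-mixed) examples.
        return [list(code_mixed)]
    if strategy == "two_steps":
        # Two stages: original data first, then code-mixed data.
        return [list(original), list(code_mixed)]
    if strategy == "joint":
        # One stage: original and code-mixed examples together.
        return [list(original) + list(code_mixed)]
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    # Toy, hypothetical data: an Indonesian sentence and a variant
    # with one word swapped for a Javanese one.
    original = [("film ini bagus", 1)]
    code_mixed = [("film iki bagus", 1)]
    for name in ("code_mix_only", "two_steps", "joint"):
        stages = build_stages(name, original, code_mixed)
        print(name, [len(stage) for stage in stages])
```

Keeping the strategy as a list of stages means a single training loop can iterate over the stages in order, so comparing the three strategies only requires changing the strategy key.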

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America > United States
  - New Mexico (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
- Europe
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
- Asia
  - China > Hong Kong (0.04)
  - East Asia (0.04)
  - Indonesia > Java
    - East Java (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology
  - Communications > Social Media (0.94)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (0.46)