SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision
Hamidullah, Yasser; Yazdani, Shakib; Oguz, Cennet; van Genabith, Josef; España-Bonet, Cristina
arXiv.org Artificial Intelligence
Sign language translation (SLT) models are typically trained with text in a single spoken language, which limits scalability and cross-language generalization. Earlier approaches have replaced gloss supervision with text-based sentence embeddings, but these remain tied to a specific language and modality. In contrast, we employ language-agnostic, multimodal embeddings trained on text and speech from multiple languages to supervise SLT, enabling direct multilingual translation. To address data scarcity, we propose a coupled augmentation method that combines multilingual target augmentations (i.e., translations into many languages) with video-level perturbations, improving model robustness. Experiments show consistent BLEURT gains over text-only sentence embedding supervision, with larger improvements in low-resource settings. Our results demonstrate that language-agnostic embedding supervision, combined with coupled augmentation, provides a scalable and semantically robust alternative to traditional SLT training.
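The two ideas in the abstract, embedding supervision and coupled augmentation, can be illustrated with a minimal sketch. The abstract does not specify the loss or the perturbations, so everything below is an assumption for illustration: cosine distance stands in for the embedding objective, random frame dropping stands in for a video-level perturbation, and the names `cosine_loss`, `drop_frames`, and `coupled_augment` are hypothetical rather than taken from the paper.

```python
import math
import random


def cosine_loss(pred, target):
    """Return 1 - cosine similarity between two equal-length vectors.

    Assumed stand-in for an embedding-supervision objective: `pred` would be
    the sentence embedding predicted from the sign video, `target` the
    embedding of the reference sentence from a frozen sentence encoder.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / norm


def drop_frames(frames, keep_prob, rng):
    """Hypothetical video-level perturbation: drop frames at random.

    Always keeps at least one frame so the clip is never empty.
    """
    kept = [f for f in frames if rng.random() < keep_prob]
    return kept or frames[:1]


def coupled_augment(frames, translations, rng, keep_prob=0.8):
    """Pair one perturbed copy of the video with one target-language sentence.

    `translations` maps a language code to the reference sentence in that
    language (the multilingual target augmentation).
    """
    lang = rng.choice(sorted(translations))
    return drop_frames(frames, keep_prob, rng), lang, translations[lang]


if __name__ == "__main__":
    rng = random.Random(0)
    frames = list(range(10))  # placeholder for per-frame video features
    translations = {"en": "good morning", "de": "guten Morgen", "es": "buenos días"}
    video, lang, text = coupled_augment(frames, translations, rng)
    print(len(video), lang, text)
```

Each call to `coupled_augment` yields a new (video, target) pair, so a small set of signed clips can be expanded along both the video side and the target-language side at once. In a real pipeline the target sentence would then be encoded and compared with the model's predicted embedding using an objective such as `cosine_loss`.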
Oct-23-2025
- Country:
- Asia
- Europe
- Austria > Vienna (0.14)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Bulgaria (0.04)
- Germany > Saarland
- Saarbrücken (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.14)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States > New Mexico
- Bernalillo County > Albuquerque (0.04)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Education > Curriculum > Subject-Specific Education (0.67)
- Technology: