Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing

Fayyazsanavi, Pooya; Anastasopoulos, Antonios; Košecká, Jana

arXiv.org Artificial Intelligence 

Sign language translation from video to spoken text presents unique challenges owing to its distinct grammar, nuances of expression, and the high variation in visual appearance across different speakers and contexts. Intermediate gloss annotations of videos aim to guide the translation process. In this work, we focus on the Gloss2Text translation stage and propose several advances: leveraging pre-trained large language models (LLMs), data augmentation, and a novel label-smoothing loss function that exploits gloss translation ambiguities. Together, these significantly improve on the performance of state-of-the-art approaches. Through extensive experiments and ablation studies on the PHOENIX Weather 2014T dataset, our approach surpasses state-of-the-art performance in Gloss2Text translation. This indicates its efficacy in addressing sign language translation and suggests promising avenues for future research and development.

Figure 1: An example of ambiguity in sign language is the gloss "BEWOELKT (CLOUDY)," which is represented by multiple translations within the dataset. As shown, these translations may share the same meaning but differ in form, such as "wolken (cloudy)." Alternatively, the gloss may represent the underlying concept, such as "unbeständig (unstable)."
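The abstract names a label-smoothing loss that exploits gloss translation ambiguities but does not give its formula. The sketch below is purely illustrative and is not the authors' formulation. It assumes that "semantically aware" means the smoothing mass epsilon is spread over non-gold vocabulary tokens in proportion to their similarity to the gold token, rather than uniformly. Under that assumption, a near-synonym like "wolken" for the gloss BEWOELKT is penalized less than an unrelated word. The similarity matrix, `epsilon`, `temperature`, and all function names are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over a 1-D list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def semantic_smoothing_targets(
    gold: int,
    similarity: Sequence[Sequence[float]],
    epsilon: float = 0.1,
    temperature: float = 1.0,
) -> list[float]:
    """Hypothetical soft target: keep 1 - epsilon on the gold token and spread
    epsilon over the other tokens in proportion to exp(sim / temperature), so
    semantically close alternatives receive more of the smoothing mass."""
    vocab_size = len(similarity)
    weights = [
        0.0 if j == gold else math.exp(similarity[gold][j] / temperature)
        for j in range(vocab_size)
    ]
    total = sum(weights)  # assumes vocab_size >= 2
    targets = [epsilon * w / total for w in weights]
    targets[gold] = 1.0 - epsilon
    return targets


def smoothed_cross_entropy(
    logits: Sequence[float], targets: Sequence[float]
) -> float:
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(targets, log_softmax(logits)))


# Toy vocabulary and hand-picked (hypothetical) similarity scores.
vocab = ["bewölkt", "wolken", "sonnig"]
sim = [
    [1.0, 0.8, -0.5],
    [0.8, 1.0, -0.4],
    [-0.5, -0.4, 1.0],
]
targets = semantic_smoothing_targets(gold=0, similarity=sim, epsilon=0.1)
loss_near_synonym = smoothed_cross_entropy([0.0, 3.0, 0.0], targets)
loss_unrelated = smoothed_cross_entropy([0.0, 0.0, 3.0], targets)
```

With these toy numbers, a model that confidently predicts the near-synonym "wolken" incurs a lower loss than one that predicts "sonnig." Under uniform smoothing, both mistakes would cost the same. As `temperature` grows, the weights flatten. The targets then approach the common label-smoothing variant that spreads epsilon uniformly over the non-gold tokens.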
