Improvement in Sign Language Translation Using Text CTC Alignment
Tan, Sihan, Miyazaki, Taro, Khan, Nabeela, Nakadai, Kazuhiro
arXiv.org Artificial Intelligence
Current sign language translation (SLT) approaches often rely on gloss-based supervision with Connectionist Temporal Classification (CTC), which limits their ability to handle non-monotonic alignments between sign language video and spoken-language text. In this work, we propose a novel method that combines joint CTC/Attention with transfer learning. The joint CTC/Attention component introduces hierarchical encoding and integrates CTC with the attention mechanism during decoding, so that it can handle both monotonic and non-monotonic alignments. Transfer learning, in turn, helps bridge the modality gap between vision and language in SLT. Experimental results on two widely adopted benchmarks, RWTH-PHOENIX-Weather 2014T and CSL-Daily, show that our method achieves results comparable to the state of the art and outperforms the pure-attention baseline. This work also opens the door to future research on gloss-free SLT using text-based CTC alignment.
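The abstract does not spell out how CTC and attention are combined during decoding. As a rough illustration only, the sketch below uses the log-linear interpolation common in hybrid CTC/attention systems, where each candidate translation is scored as a weighted sum of its CTC and attention log-probabilities. The function names, the `ctc_weight` parameter, and the candidate scores are hypothetical and are not taken from the paper.

```python
# Minimal sketch of joint CTC/attention rescoring (an assumption, not the
# paper's implementation). Each hypothesis carries two log-probabilities:
# one from a CTC branch and one from an attention decoder.

def joint_score(log_p_ctc, log_p_att, ctc_weight=0.3):
    """Interpolate CTC and attention log-probabilities.

    ctc_weight is a hypothetical hyperparameter in [0, 1]; 0 means
    pure attention, 1 means pure CTC.
    """
    return ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_att


def rescore(hypotheses, ctc_weight=0.3):
    """Return the hypothesis with the highest joint score."""
    return max(
        hypotheses,
        key=lambda h: joint_score(h["ctc"], h["att"], ctc_weight),
    )


# Made-up candidate scores for illustration only.
candidates = [
    {"text": "rain in the north tomorrow", "ctc": -1.0, "att": -5.0},
    {"text": "tomorrow rain in the north", "ctc": -4.0, "att": -2.0},
]

best_mostly_attention = rescore(candidates, ctc_weight=0.3)
best_mostly_ctc = rescore(candidates, ctc_weight=0.9)
```

With a small `ctc_weight` the attention score dominates and the second candidate wins; with a large `ctc_weight` the candidate favoured by CTC wins. The weight therefore controls how strongly the CTC branch's alignment preference constrains the attention decoder's output.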
Dec-24-2024