Improvement in Sign Language Translation Using Text CTC Alignment

Tan, Sihan, Miyazaki, Taro, Khan, Nabeela, Nakadai, Kazuhiro

Dec-24-2024 – arXiv.org Artificial Intelligence

Current sign language translation (SLT) approaches often rely on gloss-based supervision with Connectionist Temporal Classification (CTC), which limits their ability to handle non-monotonic alignments between sign language video and spoken text. In this work, we propose a novel method that combines joint CTC/Attention with transfer learning. The joint CTC/Attention component introduces hierarchical encoding and integrates CTC with the attention mechanism during decoding, so that it handles both monotonic and non-monotonic alignments. Transfer learning, in turn, helps bridge the modality gap between vision and language in SLT. Experimental results on two widely adopted benchmarks, RWTH-PHOENIX-Weather 2014 T and CSL-Daily, show that our method achieves results comparable to the state of the art and outperforms the pure-attention baseline. This work also opens a path for future research into gloss-free SLT using text-based CTC alignment.
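The abstract does not spell out how CTC and attention are combined during decoding. As a rough illustration only, the sketch below shows the generic hybrid CTC/attention scoring idea: a candidate text sequence is scored by interpolating its CTC log-likelihood (computed with the standard forward recursion over frame-level outputs) with a log-probability from an attention decoder. The function names, the interpolation weight `ctc_weight`, and the toy inputs are assumptions for illustration, not the authors' implementation.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) for a few values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_likelihood(log_probs, target, blank=0):
    """Log p_ctc(target | frames) via the standard CTC forward recursion.

    log_probs[t][k] is the log-probability of symbol k at frame t.
    target is a list of token ids (text tokens here, not glosses).
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., blank]
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    S = len(ext)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between different tokens.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    if S == 1:
        return alpha[0]
    return logsumexp(alpha[-1], alpha[-2])


def joint_score(ctc_logp, att_logp, ctc_weight=0.3):
    """Interpolate CTC and attention log-scores (ctc_weight is an assumed value)."""
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


# Toy usage: 2 frames, vocabulary {0: blank, 1: some token}, uniform outputs.
frames = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
ctc_logp = ctc_log_likelihood(frames, [1])
att_logp = math.log(0.6)  # hypothetical attention-decoder log-probability
score = joint_score(ctc_logp, att_logp)
```

In this picture, the CTC term rewards hypotheses that can be aligned monotonically to the video frames, while the attention term is free to model reordering; the weight trades one off against the other.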

machine learning, natural language, translation

Country:
- Asia
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
  - Singapore (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
- Europe
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
  - Germany > Berlin (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Spain (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- North America > Canada
  - Ontario > Toronto (0.04)
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Industry:
- Education > Curriculum > Subject-Specific Education (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Natural Language > Machine Translation (1.00)