TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
Neural Information Processing Systems
Sign language translation (SLT) aims to interpret sign video sequences as text-based natural language sentences. Sign videos consist of continuous sequences of sign gestures with no clear boundaries between them. Existing SLT models usually represent sign visual features in a frame-wise manner so as to avoid explicitly segmenting the videos into isolated signs. However, these methods neglect the temporal structure of signs, leading to substantial ambiguity in translation. In this paper, we explore the temporal semantic structures of sign videos to learn more discriminative features.
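The contrast between frame-wise features and multi-granularity temporal features can be illustrated with a minimal sketch: pooling per-frame features over sliding windows of several temporal scales yields segment-level representations at multiple granularities. This is not the paper's implementation; the window sizes, stride, and mean pooling are illustrative assumptions.

```python
import numpy as np

def multi_scale_windows(frame_feats, window_sizes=(8, 12, 16), stride=2):
    """Pool per-frame features over several temporal window sizes.

    frame_feats: (T, D) array of per-frame visual features.
    Returns a dict mapping window size -> (num_windows, D) array,
    one pooled feature per sliding window. Window sizes and stride
    here are hypothetical, chosen only for illustration.
    """
    T, _ = frame_feats.shape
    pyramid = {}
    for w in window_sizes:
        starts = range(0, max(T - w, 0) + 1, stride)
        pooled = np.stack([frame_feats[s:s + w].mean(axis=0) for s in starts])
        pyramid[w] = pooled
    return pyramid

# Example: 64 frames of 128-d features produce window-level features
# at three temporal granularities.
feats = np.random.rand(64, 128)
for w, pooled in multi_scale_windows(feats).items():
    print(w, pooled.shape)
```

A translation model could then attend over these multi-scale features jointly, rather than over individual frames, so that each attended unit covers a plausible sign-length span of video.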