AITopics | tspnet

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Neural Information Processing Systems, Dec-24-2025, 06:47:36 GMT

Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences. Sign videos consist of continuous sequences of sign gestures with no clear boundaries in between. Existing SLT models usually represent sign visual features in a frame-wise manner so as to avoid explicitly segmenting the videos into isolated signs. However, these methods neglect the temporal information of signs, which leads to substantial ambiguity in translation. In this paper, we explore the temporal semantic structures of sign videos to learn more discriminative features. To this end, we first present a novel sign video segment representation that takes into account multiple temporal granularities, thus alleviating the need for accurate video segmentation. Taking advantage of the proposed segment representation, we develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet. Specifically, TSPNet introduces an inter-scale attention to evaluate and enhance the local semantic consistency of sign segments and an intra-scale attention to resolve semantic ambiguity by using non-local video context. Experiments show that TSPNet outperforms the state of the art on the largest commonly used SLT dataset, improving the BLEU score from 9.58 to 13.41 and the ROUGE score from 31.80 to 34.96.
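The multi-granularity segment representation described above can be pictured as sliding windows of several widths laid over the same frame sequence. The following is a minimal sketch under stated assumptions, not the authors' implementation: the window widths, the stride, the helper names `multi_scale_segments` and `pool_segments`, and the use of mean pooling in place of a learned video encoder are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) frame indices


def multi_scale_segments(num_frames: int, widths: Sequence[int], stride: int) -> List[List[Span]]:
    """For each window width, list the overlapping [start, end) spans over the video.

    Hypothetical helper: widths and stride are illustrative, not TSPNet's settings.
    """
    scales: List[List[Span]] = []
    for width in widths:
        spans: List[Span] = []
        start = 0
        while start + width <= num_frames:
            spans.append((start, start + width))
            start += stride
        # A video shorter than the window still gets one span covering all frames.
        if not spans and num_frames > 0:
            spans.append((0, num_frames))
        scales.append(spans)
    return scales


def pool_segments(frame_feats: Sequence[Sequence[float]], spans: Sequence[Span]) -> List[List[float]]:
    """Mean-pool per-frame feature vectors inside each span.

    Assumption: mean pooling stands in for a learned video encoder (e.g. I3D).
    """
    pooled: List[List[float]] = []
    for start, end in spans:
        window = frame_feats[start:end]
        dim = len(window[0])
        pooled.append([sum(frame[d] for frame in window) / len(window) for d in range(dim)])
    return pooled


if __name__ == "__main__":
    scales = multi_scale_segments(num_frames=20, widths=(8, 12, 16), stride=4)
    for width, spans in zip((8, 12, 16), scales):
        print(width, spans)
    feats = [[float(i), 1.0] for i in range(20)]
    print(pool_segments(feats, scales[0][:1]))  # → [[3.5, 1.0]]
```

With 20 frames, widths (8, 12, 16) and stride 4, the sketch yields 4, 3 and 2 overlapping spans respectively. Because spans of different widths cover the same frames, no single segmentation has to match the true sign boundaries exactly.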

hierarchical feature learning, temporal semantic pyramid, tspnet, (8 more...)

Industry: Education > Curriculum > Subject-Specific Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.79)
Information Technology > Artificial Intelligence > Natural Language (0.62)

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Dongxu Li

Neural Information Processing Systems, Aug-15-2025, 01:38:24 GMT

Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.

proceedings, representation, translation, (12 more...)

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > Canada (0.04)
Europe > Finland > Southwest Finland > Turku (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)

Industry:

Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Review for NeurIPS paper: TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Neural Information Processing Systems, Jan-26-2025, 11:31:25 GMT

The reviewers were positive about the ideas in the paper and mostly debated the merits of the evaluation. For one, they were not fully convinced by the rebuttal's arguments about how boundary sharpness differs between action localization and sign language translation. For the camera-ready version, I would suggest addressing this point more thoroughly, as well as comparing against, or justifying the differences from, "Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation" (Camgoz et al., CVPR 2020). One final suggestion is to add results with one more video encoder in addition to I3D.

hierarchical feature learning, sign language translation, temporal semantic pyramid, (2 more...)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Neural Information Processing Systems, Oct-10-2024, 17:51:35 GMT

hierarchical feature learning, sign language translation, temporal semantic pyramid, (5 more...)

Industry: Education > Curriculum > Subject-Specific Education (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.75)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.64)

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Dongxu Li, Chenchen Xu, Xin Yu, Kaihao Zhang, Ben Swift, Hanna Suominen, Hongdong Li

arXiv.org Artificial Intelligence, Oct-12-2020

Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences. Sign videos consist of continuous sequences of sign gestures with no clear boundaries in between. Existing SLT models usually represent sign visual features in a frame-wise manner so as to avoid explicitly segmenting the videos into isolated signs. However, these methods neglect the temporal information of signs, which leads to substantial ambiguity in translation. In this paper, we explore the temporal semantic structures of sign videos to learn more discriminative features. To this end, we first present a novel sign video segment representation that takes into account multiple temporal granularities, thus alleviating the need for accurate video segmentation. Taking advantage of the proposed segment representation, we develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet. Specifically, TSPNet introduces an inter-scale attention to evaluate and enhance the local semantic consistency of sign segments and an intra-scale attention to resolve semantic ambiguity by using non-local video context. Experiments show that TSPNet outperforms the state of the art on the largest commonly used SLT dataset, improving the BLEU score from 9.58 to 13.41 and the ROUGE score from 31.80 to 34.96.
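The two attention mechanisms named in this abstract can be illustrated with plain scaled dot-product attention. This is a hedged sketch rather than TSPNet's actual layers: it omits the learned query/key/value projections and multiple heads a real network would use, and the helpers `inter_scale` and `intra_scale`, as well as the choice of the group mean as the inter-scale query, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention of one query over key/value vectors (no learned projections)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * value[i] for w, value in zip(weights, values)) for i in range(dim)]


def inter_scale(aligned: Sequence[Sequence[float]]) -> Vector:
    """Fuse segments of different widths that cover the same location.

    Hypothetical: the mean of the group is used as the query, so segments that
    agree with their neighbours at other scales tend to receive higher weight.
    """
    dim = len(aligned[0])
    query = [sum(seg[i] for seg in aligned) / len(aligned) for i in range(dim)]
    return attend(query, aligned, aligned)


def intra_scale(segments: Sequence[Sequence[float]]) -> List[Vector]:
    """Self-attention across all segments of one scale, giving each segment non-local context."""
    return [attend(seg, segments, segments) for seg in segments]


if __name__ == "__main__":
    # Three widths covering roughly the same location; the last one is an outlier.
    print(inter_scale([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))
    # Two segments of one scale attending to each other.
    print(intra_scale([[0.0, 0.0], [10.0, 0.0]]))
```

In the second call, the zero-vector segment scores both keys equally and averages them to [5.0, 0.0], while the other segment attends almost entirely to itself. In a trained network, learned projections would decide which local and non-local context matters.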

machine learning, natural language, translation, (18 more...)

arXiv:2010.05468

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Finland > Southwest Finland > Turku (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
