language recognition
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture
Johnny, Samuel Ebimobowei, Guda, Blessed, Aaron, Emmanuel Enejo, Gueye, Assane
Automatic Sign Language Recognition (ASLR) has emerged as a vital field for bridging the gap between deaf and hearing communities. However, the problem of sign-to-sign retrieval or detecting a specific sign within a sequence of continuous signs remains largely unexplored. We define this novel task as Sign Language Spotting. In this paper, we present a first step toward sign language retrieval by addressing the challenge of detecting the presence or absence of a query sign video within a sentence-level gloss or sign video. Unlike conventional approaches that rely on intermediate gloss recognition or text-based matching, we propose an end-to-end model that directly operates on pose keypoints extracted from sign videos. Our architecture employs an encoder-only backbone with a binary classification head to determine whether the query sign appears within the target sequence. By focusing on pose representations instead of raw RGB frames, our method significantly reduces computational cost and mitigates visual noise. We evaluate our approach on the Word Presence Prediction dataset from the WSLP 2025 shared task, achieving 61.88\% accuracy and 60.00\% F1-score. These results demonstrate the effectiveness of our pose-based framework for Sign Language Spotting, establishing a strong foundation for future research in automatic sign language retrieval and verification. Code is available at https://github.com/EbimoJohnny/Pose-Based-Sign-Language-Spotting
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition
Alishzade, Nigar, Abdullayeva, Gulchin
This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representative models-ConvLSTM and Vanilla Transformer-on the Azerbaijani Sign Language Dataset (AzSLD) and the Word-Level American Sign Language (WLASL) dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy across datasets, achieving up to 76.8% Top-1 accuracy on AzSLD and 88.3% on WLASL. The ConvLSTM, while more computationally efficient, lags in recognition accuracy, particularly on smaller datasets. These findings highlight the complementary strengths of each paradigm: the Transformer excels in overall accuracy and signer independence, whereas the ConvLSTM offers advantages in computational efficiency and temporal modeling. The study provides a nuanced analysis of these trade-offs, offering guidance for architecture selection in sign language recognition systems depending on application requirements and resource constraints.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Asia > India > Maharashtra > Pune (0.04)
- (3 more...)
RoCoISLR: A Romanian Corpus for Isolated Sign Language Recognition
Rîpanu, Cătălin-Alexandru, Hotnog, Andrei-Theodor, Imbrea, Giulia-Stefania, Cercel, Dumitru-Clementin
Automatic sign language recognition plays a crucial role in bridging the communication gap between deaf communities and hearing individuals; however, most available datasets focus on American Sign Language. For Romanian Isolated Sign Language Recognition (RoISLR), no large-scale, standardized dataset exists, which limits research progress. In this work, we introduce a new corpus for RoISLR, named RoCoISLR, comprising over 9,000 video samples that span nearly 6,000 standardized glosses from multiple sources. We establish benchmark results by evaluating seven state-of-the-art video recognition models-I3D, SlowFast, Swin Transformer, TimeSformer, Uniformer, VideoMAE, and PoseConv3D-under consistent experimental setups, and compare their performance with that of the widely used WLASL2000 corpus. According to the results, transformer-based architectures outperform convolutional baselines; Swin Transformer achieved a Top-1 accuracy of 34.1%. Our benchmarks highlight the challenges associated with long-tail class distributions in low-resource sign languages, and RoCoISLR provides the initial foundation for systematic RoISLR research.
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
- North America > Mexico > Puebla (0.04)
Sign language recognition from skeletal data using graph and recurrent neural networks
Mederos, B., Mejía, J., Medina-Reyes, A., Espinosa-Almeyda, Y., Díaz-Roman, J. D., Rodríguez-Mederos, I., Mejía-Carreon, M., Gonzalez-Lopez, F.
Within this field, Isolated Sign Language Recognition (ISLR) focuses on recognizing single, context-free signs, which can be seen as the building blocks for continuous sign language understanding. While ISLR provides an environment to study the visual and temporal characteristics of signs, it also poses significant challenges due to intra-class variations, such as differences in signing speed, hand orientation, and signer-dependent styles. Addressing these challenges is essential for building reliable models that can generalize across diverse signers and recording conditions. Moreover, the recognition of isolated signs provides a basis for Continuous Sign Language Recognition (CSLR), where the boundaries between consecutive signs are not explicitly defined. ISLR is typically formulated as a classification problem, where each video segment corresponds to a single label representing a sign.
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
Reconnaissance Automatique des Langues des Signes : Une Approche Hybridée CNN-LSTM Basée sur Mediapipe
Takouchouang, Fraisse Sacré, Vinh, Ho Tuong
Sign languages play a crucial role in the communication of deaf communities, but they are often marginalized, limiting access to essential services such as healthcare and education. This study proposes an automatic sign language recognition system based on a hybrid CNN-LSTM architecture, using Mediapipe for gesture keypoint extraction. Developed with Python, TensorFlow and Streamlit, the system provides real-time gesture translation. The results show an average accuracy of 92\%, with very good performance for distinct gestures such as ``Hello'' and ``Thank you''. However, some confusions remain for visually similar gestures, such as ``Call'' and ``Yes''. This work opens up interesting perspectives for applications in various fields such as healthcare, education and public services.
Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network
Liu, Liangjin, Zheng, Haoyang, Zhu, Zhengzhong, Zhou, Pei
Isolated Sign Language Recognition (ISLR) is challenged by gestures that are morphologically similar yet semantically distinct, a problem rooted in the complex interplay between hand shape and motion trajectory. Existing methods, often relying on a single reference frame, struggle to resolve this geometric ambiguity. This paper introduces Dual-SignLanguageNet (DSLNet), a dual-reference, dual-stream architecture that decouples and models gesture morphology and trajectory in separate, complementary coordinate systems. The architecture processes these streams through specialized networks: a topology-aware graph convolution models the view-invariant shape from a wrist-centric frame, while a Finsler geometry-based encoder captures the context-aware trajectory from a facial-centric frame. These features are then integrated via a geometry-driven optimal transport fusion mechanism. DSLNet sets a new state-of-the-art, achieving 93.70%, 89.97%, and 99.79% accuracy on the challenging WLASL-100, WLASL-300, and LSA64 datasets, respectively, with significantly fewer parameters than competing models.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
A Signer-Invariant Conformer and Multi-Scale Fusion Transformer for Continuous Sign Language Recognition
Haque, Md Rezwanul, Islam, Md. Milon, Raju, S M Taslim Uddin, Karray, Fakhri
Continuous Sign Language Recognition (CSLR) faces multiple challenges, including significant inter-signer variability and poor generalization to novel sentence structures. Traditional solutions frequently fail to handle these issues efficiently. For overcoming these constraints, we propose a dual-architecture framework. For the Signer-Independent (SI) challenge, we propose a Signer-Invariant Conformer that combines convolutions with multi-head self-attention to learn robust, signer-agnostic representations from pose-based skeletal keypoints. For the Unseen-Sentences (US) task, we designed a Multi-Scale Fusion Transformer with a novel dual-path temporal encoder that captures both fine-grained posture dynamics, enabling the model's ability to comprehend novel grammatical compositions. Experiments on the challenging Isharah-1000 dataset establish a new standard for both CSLR benchmarks. The proposed conformer architecture achieves a Word Error Rate (WER) of 13.07% on the SI challenge, a reduction of 13.53% from the state-of-the-art. On the US task, the transformer model scores a WER of 47.78%, surpassing previous work. In the SignEval 2025 CSLR challenge, our team placed 2nd in the US task and 4th in the SI task, demonstrating the performance of these models. The findings validate our key hypothesis: that developing task-specific networks designed for the particular challenges of CSLR leads to considerable performance improvements and establishes a new baseline for further research. The source code is available at: https://github.com/rezwanh001/MSLR-Pose86K-CSLR-Isharah.
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Health & Medicine (0.93)
- Education > Curriculum > Subject-Specific Education (0.68)
9th Workshop on Sign Language Translation and Avatar Technologies (SLTAT 2025)
Nunnari, Fabrizio, Jiménez, Cristina Luna, Wolfe, Rosalee, McDonald, John C., Filhol, Michael, Efthimiou, Eleni, Fotinea, Evita, Hanke, Thomas
The Sign Language Translation and Avatar Technology (SLTAT) workshops continue a series of gatherings to share recent advances in improving deaf / human communication through non-invasive means. This 2025 edition, the 9th since its first appearance in 2011, is hosted by the International Conference on Intelligent Virtual Agents (IVA), giving the opportunity for contamination between two research communities, using digital humans as either virtual interpreters or as interactive conversational agents. As presented in this summary paper, SLTAT sees contributions beyond avatar technologies, with a consistent number of submissions on sign language recognition, and other work on data collection, data analysis, tools, ethics, usability, and affective computing.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > Canada (0.05)
- (11 more...)