AITopics | language recognition

Collaborating Authors

language recognition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ALittle Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Neural Information Processing SystemsJun-19-2026, 11:12:46 GMT

Recent theoretical results show transformers cannot express sequential reasoning problems over long inputs, intuitively because their computational depth is bounded. However, prior work treats the depth as a constant, leaving it unclear to what degree bounded depth may suffice for solving problems over short inputs, or how increasing the transformer's depth affects its expressive power. We address these questions by analyzing transformers whose depth can grow minimally with context length n. We show even highly uniform transformers with depth Θ(logn) can express two important problems: recognizing regular languages, which captures state tracking abilities and was known to be expressible only by an unconventional, non-uniform model of transformers, and graph connectivity, which underlies multistep reasoning. Notably, both of these problems cannot be expressed by fixed-depth transformers under standard complexity conjectures, demonstrating the expressivity benefit of growing depth. Moreover, our theory quantitatively predicts how depth must grow with input length to express these problems, showing that depth scaling is more efficient than scaling width or chain-of-thought steps. Empirically, our detailed experiments designed to bridge the expressivity vs. learnability gap reveal that our theoretical depth requirements for regular language recognition closely match the practical depth requirements for successfully training transformers. Thus, our results clarify how depth affects a transformer's reasoning capabilities, and provide practical guidance for effective depth selection for sequential reasoning.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

6cd3ac24cdb789beeaa9f7145670fcae-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 15:45:06 GMT

language recognition, recognition, sign language recognition, (15 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)

Add feedback

Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture

Johnny, Samuel Ebimobowei, Guda, Blessed, Aaron, Emmanuel Enejo, Gueye, Assane

arXiv.org Artificial IntelligenceDec-10-2025

Automatic Sign Language Recognition (ASLR) has emerged as a vital field for bridging the gap between deaf and hearing communities. However, the problem of sign-to-sign retrieval or detecting a specific sign within a sequence of continuous signs remains largely unexplored. We define this novel task as Sign Language Spotting. In this paper, we present a first step toward sign language retrieval by addressing the challenge of detecting the presence or absence of a query sign video within a sentence-level gloss or sign video. Unlike conventional approaches that rely on intermediate gloss recognition or text-based matching, we propose an end-to-end model that directly operates on pose keypoints extracted from sign videos. Our architecture employs an encoder-only backbone with a binary classification head to determine whether the query sign appears within the target sequence. By focusing on pose representations instead of raw RGB frames, our method significantly reduces computational cost and mitigates visual noise. We evaluate our approach on the Word Presence Prediction dataset from the WSLP 2025 shared task, achieving 61.88\% accuracy and 60.00\% F1-score. These results demonstrate the effectiveness of our pose-based framework for Sign Language Spotting, establishing a strong foundation for future research in automatic sign language retrieval and verification. Code is available at https://github.com/EbimoJohnny/Pose-Based-Sign-Language-Spotting

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2512.08738

Country:

North America > United States (0.15)
Europe > Spain (0.14)
Africa > Rwanda (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition

Alishzade, Nigar, Abdullayeva, Gulchin

arXiv.org Artificial IntelligenceNov-18-2025

This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representative models-ConvLSTM and Vanilla Transformer-on the Azerbaijani Sign Language Dataset (AzSLD) and the Word-Level American Sign Language (WLASL) dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy across datasets, achieving up to 76.8% Top-1 accuracy on AzSLD and 88.3% on WLASL. The ConvLSTM, while more computationally efficient, lags in recognition accuracy, particularly on smaller datasets. These findings highlight the complementary strengths of each paradigm: the Transformer excels in overall accuracy and signer independence, whereas the ConvLSTM offers advantages in computational efficiency and temporal modeling. The study provides a nuanced analysis of these trade-offs, offering guidance for architecture selection in sign language recognition systems depending on application requirements and resource constraints.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/PCI66488.2025.11219827

2511.13126

Country: Asia > Azerbaijan (0.29)

Genre: Research Report > New Finding (0.87)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)

Add feedback

RoCoISLR: A Romanian Corpus for Isolated Sign Language Recognition

Rîpanu, Cătălin-Alexandru, Hotnog, Andrei-Theodor, Imbrea, Giulia-Stefania, Cercel, Dumitru-Clementin

arXiv.org Artificial IntelligenceNov-18-2025

Automatic sign language recognition plays a crucial role in bridging the communication gap between deaf communities and hearing individuals; however, most available datasets focus on American Sign Language. For Romanian Isolated Sign Language Recognition (RoISLR), no large-scale, standardized dataset exists, which limits research progress. In this work, we introduce a new corpus for RoISLR, named RoCoISLR, comprising over 9,000 video samples that span nearly 6,000 standardized glosses from multiple sources. We establish benchmark results by evaluating seven state-of-the-art video recognition models-I3D, SlowFast, Swin Transformer, TimeSformer, Uniformer, VideoMAE, and PoseConv3D-under consistent experimental setups, and compare their performance with that of the widely used WLASL2000 corpus. According to the results, transformer-based architectures outperform convolutional baselines; Swin Transformer achieved a Top-1 accuracy of 34.1%. Our benchmarks highlight the challenges associated with long-tail class distributions in low-resource sign languages, and RoCoISLR provides the initial foundation for systematic RoISLR research.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.12767

Country: Europe > Romania (0.15)

Genre: Research Report (0.50)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.89)

Add feedback

Sign language recognition from skeletal data using graph and recurrent neural networks

Mederos, B., Mejía, J., Medina-Reyes, A., Espinosa-Almeyda, Y., Díaz-Roman, J. D., Rodríguez-Mederos, I., Mejía-Carreon, M., Gonzalez-Lopez, F.

arXiv.org Artificial IntelligenceNov-11-2025

Within this field, Isolated Sign Language Recognition (ISLR) focuses on recognizing single, context-free signs, which can be seen as the building blocks for continuous sign language understanding. While ISLR provides an environment to study the visual and temporal characteristics of signs, it also poses significant challenges due to intra-class variations, such as differences in signing speed, hand orientation, and signer-dependent styles. Addressing these challenges is essential for building reliable models that can generalize across diverse signers and recording conditions. Moreover, the recognition of isolated signs provides a basis for Continuous Sign Language Recognition (CSLR), where the boundaries between consecutive signs are not explicitly defined. ISLR is typically formulated as a classification problem, where each video segment corresponds to a single label representing a sign.

artificial intelligence, machine learning, recognition, (16 more...)

arXiv.org Artificial Intelligence

2511.05772

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reconnaissance Automatique des Langues des Signes : Une Approche Hybridée CNN-LSTM Basée sur Mediapipe

Takouchouang, Fraisse Sacré, Vinh, Ho Tuong

arXiv.org Artificial IntelligenceOct-28-2025

Sign languages play a crucial role in the communication of deaf communities, but they are often marginalized, limiting access to essential services such as healthcare and education. This study proposes an automatic sign language recognition system based on a hybrid CNN-LSTM architecture, using Mediapipe for gesture keypoint extraction. Developed with Python, TensorFlow and Streamlit, the system provides real-time gesture translation. The results show an average accuracy of 92\%, with very good performance for distinct gestures such as ``Hello'' and ``Thank you''. However, some confusions remain for visually similar gestures, such as ``Call'' and ``Yes''. This work opens up interesting perspectives for applications in various fields such as healthcare, education and public services.

artificial intelligence, langue, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.22011

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network

Liu, Liangjin, Zheng, Haoyang, Zhu, Zhengzhong, Zhou, Pei

arXiv.org Artificial IntelligenceSep-19-2025

Isolated Sign Language Recognition (ISLR) is challenged by gestures that are morphologically similar yet semantically distinct, a problem rooted in the complex interplay between hand shape and motion trajectory. Existing methods, often relying on a single reference frame, struggle to resolve this geometric ambiguity. This paper introduces Dual-SignLanguageNet (DSLNet), a dual-reference, dual-stream architecture that decouples and models gesture morphology and trajectory in separate, complementary coordinate systems. The architecture processes these streams through specialized networks: a topology-aware graph convolution models the view-invariant shape from a wrist-centric frame, while a Finsler geometry-based encoder captures the context-aware trajectory from a facial-centric frame. These features are then integrated via a geometry-driven optimal transport fusion mechanism. DSLNet sets a new state-of-the-art, achieving 93.70%, 89.97%, and 99.79% accuracy on the challenging WLASL-100, WLASL-300, and LSA64 datasets, respectively, with significantly fewer parameters than competing models.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.08661

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Industry: Education > Curriculum > Subject-Specific Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.69)

Add feedback

6cd3ac24cdb789beeaa9f7145670fcae-Paper-Conference.pdf

Neural Information Processing SystemsAug-15-2025, 15:54:30 GMT

language recognition, recognition, sign language recognition, (15 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)

Add feedback

A Signer-Invariant Conformer and Multi-Scale Fusion Transformer for Continuous Sign Language Recognition

Haque, Md Rezwanul, Islam, Md. Milon, Raju, S M Taslim Uddin, Karray, Fakhri

arXiv.org Artificial IntelligenceAug-14-2025

Continuous Sign Language Recognition (CSLR) faces multiple challenges, including significant inter-signer variability and poor generalization to novel sentence structures. Traditional solutions frequently fail to handle these issues efficiently. For overcoming these constraints, we propose a dual-architecture framework. For the Signer-Independent (SI) challenge, we propose a Signer-Invariant Conformer that combines convolutions with multi-head self-attention to learn robust, signer-agnostic representations from pose-based skeletal keypoints. For the Unseen-Sentences (US) task, we designed a Multi-Scale Fusion Transformer with a novel dual-path temporal encoder that captures both fine-grained posture dynamics, enabling the model's ability to comprehend novel grammatical compositions. Experiments on the challenging Isharah-1000 dataset establish a new standard for both CSLR benchmarks. The proposed conformer architecture achieves a Word Error Rate (WER) of 13.07% on the SI challenge, a reduction of 13.53% from the state-of-the-art. On the US task, the transformer model scores a WER of 47.78%, surpassing previous work. In the SignEval 2025 CSLR challenge, our team placed 2nd in the US task and 4th in the SI task, demonstrating the performance of these models. The findings validate our key hypothesis: that developing task-specific networks designed for the particular challenges of CSLR leads to considerable performance improvements and establishes a new baseline for further research. The source code is available at: https://github.com/rezwanh001/MSLR-Pose86K-CSLR-Isharah.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.09372

Country: North America > United States (0.66)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Education > Curriculum > Subject-Specific Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback