Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network
Liu, Liangjin; Zheng, Haoyang; Zhu, Zhengzhong; Zhou, Pei
arXiv.org Artificial Intelligence
Isolated Sign Language Recognition (ISLR) is challenged by gestures that are morphologically similar yet semantically distinct, a problem rooted in the complex interplay between hand shape and motion trajectory. Existing methods, which often rely on a single reference frame, struggle to resolve this geometric ambiguity. This paper introduces Dual-SignLanguageNet (DSLNet), a dual-reference, dual-stream architecture that decouples gesture morphology and trajectory and models them in separate, complementary coordinate systems. Each stream is processed by a specialized network: a topology-aware graph convolution models the view-invariant hand shape in a wrist-centric frame, while a Finsler geometry-based encoder captures the context-aware trajectory in a face-centric frame. The two feature sets are then integrated by a geometry-driven optimal transport fusion mechanism. DSLNet sets a new state of the art, achieving accuracies of 93.70%, 89.97%, and 99.79% on the challenging WLASL-100, WLASL-300, and LSA64 datasets, respectively, while using significantly fewer parameters than competing models.
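The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: re-expressing the same skeleton in a wrist-centric frame (for shape) and a face-centric frame (for trajectory), and fusing two feature sets with optimal transport. It assumes a MediaPipe-style 21-point hand layout (wrist at index 0, middle-finger MCP at index 9), uses palm length and a caller-supplied face scale as hypothetical normalizers, and swaps in plain entropic optimal transport (Sinkhorn iterations with uniform marginals) for the paper's geometry-driven fusion. The functions `wrist_centric`, `face_centric`, and `sinkhorn_fuse` are hypothetical names, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical indices, assuming a MediaPipe-style 21-point hand layout.
WRIST = 0
MIDDLE_MCP = 9


def to_frame(points: Sequence[Point], origin: Point, scale: float) -> List[Point]:
    """Translate points so `origin` is at zero, then divide by `scale`."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    ox, oy, oz = origin
    return [((x - ox) / scale, (y - oy) / scale, (z - oz) / scale) for x, y, z in points]


def wrist_centric(hand: Sequence[Point]) -> List[Point]:
    """Shape stream (assumed): hand keypoints relative to the wrist, scaled by palm length."""
    palm = math.dist(hand[WRIST], hand[MIDDLE_MCP])
    return to_frame(hand, hand[WRIST], palm)


def face_centric(wrist_track: Sequence[Point], face_refs: Sequence[Point],
                 face_scale: float) -> List[Point]:
    """Trajectory stream (assumed): each frame's wrist position relative to that frame's face reference."""
    return [to_frame([w], f, face_scale)[0] for w, f in zip(wrist_track, face_refs)]


def sinkhorn_fuse(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]],
                  eps: float = 0.1, iters: int = 200) -> List[List[float]]:
    """Generic entropic OT fusion (a stand-in, not the paper's mechanism).

    Computes a Sinkhorn transport plan between rows of `a` and rows of `b`
    (uniform marginals, squared-Euclidean cost), then adds to each row of `a`
    the barycentric projection of `b` under that plan.
    """
    n, m, d = len(a), len(b), len(b[0])
    cost = [[sum((x - y) ** 2 for x, y in zip(ai, bj)) for bj in b] for ai in a]
    # Gibbs kernel; very small eps or large costs can underflow to 0.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns toward marginals 1/n and 1/m.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    fused = []
    for i in range(n):
        mass = sum(plan[i])
        proj = [sum(plan[i][j] * b[j][k] for j in range(m)) / mass for k in range(d)]
        fused.append([x + y for x, y in zip(a[i], proj)])
    return fused
```

In this sketch, subtracting the wrist and dividing by palm length removes global hand position and size from the shape stream, while the face-relative track keeps where the hand moves with respect to the signer's face. The Sinkhorn step softly matches each shape feature to trajectory features before combining them. For larger feature distances or smaller `eps`, a log-domain Sinkhorn would avoid the underflow noted in the comment.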
Sep-19-2025