Human-Centered Editable Speech-to-Sign-Language Generation via Streaming Conformer-Transformer and Resampling Hook
arXiv.org Artificial Intelligence
Existing end-to-end sign-language animation systems suffer from low naturalness, limited facial and body expressivity, and a lack of user control. We propose a human-centered, real-time speech-to-sign animation framework that integrates (1) a streaming Conformer encoder with an autoregressive Transformer-MDN decoder for synchronized upper-body and facial motion generation, (2) a transparent, editable JSON intermediate representation that lets deaf users and experts inspect and modify each sign segment, and (3) a human-in-the-loop optimization process that refines the model based on user edits and ratings. Deployed in Unity3D, our system achieves an average frame-inference time of 13 ms and an end-to-end latency of 103 ms on an RTX 4070. Our key contributions are the design of a JSON-centric editing mechanism for fine-grained, sign-level personalization and the first application of an MDN-based feedback loop for continuous model adaptation. This combination establishes a generalizable, explainable AI paradigm for user-adaptive, low-latency multimodal systems. In studies with 20 deaf signers and 5 professional interpreters, we observe a 13-point improvement in System Usability Scale (SUS) score, a 6.7-point reduction in cognitive load, and significant gains in naturalness and trust (p < .001) over baselines. This work establishes a scalable, explainable AI paradigm for accessible sign-language technologies.
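To make the idea of an editable, per-segment JSON intermediate representation concrete, here is a minimal sketch in Python. The abstract does not give the actual schema, so every field name (`gloss`, `start_ms`, `end_ms`, `handshape`, `facial`) and the helper `apply_edit` are hypothetical and chosen only for illustration. The sketch shows one way a user edit to a single sign segment might be applied and validated before being passed on to rendering or logged as feedback.

```python
import json

# Hypothetical segment schema: these field names are illustrative
# assumptions, not the format used by the paper.
SEGMENTS_JSON = """
[
  {"gloss": "HELLO", "start_ms": 0, "end_ms": 400,
   "handshape": "B", "facial": "neutral"},
  {"gloss": "THANK-YOU", "start_ms": 400, "end_ms": 900,
   "handshape": "B", "facial": "smile"}
]
"""


def apply_edit(segments, index, **changes):
    """Return a new segment list with `changes` applied to one segment.

    The input list is left untouched so the original model output and the
    user-edited version can both be kept (for example, as a feedback pair).
    """
    if not 0 <= index < len(segments):
        raise IndexError(f"no segment at index {index}")

    # Reject fields that the (assumed) schema does not define.
    unknown = set(changes) - set(segments[index])
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")

    edited = [dict(segment) for segment in segments]
    edited[index].update(changes)

    # Basic sanity check on the edited timing.
    if edited[index]["start_ms"] >= edited[index]["end_ms"]:
        raise ValueError("segment must end after it starts")
    return edited


segments = json.loads(SEGMENTS_JSON)
edited = apply_edit(segments, 1, facial="raised_brows", end_ms=1000)
print(json.dumps(edited[1], indent=2))
```

Returning a copy rather than mutating in place is a design choice of this sketch: it keeps the model's original output next to the edited version, which is the kind of pairing a feedback loop driven by user edits would need.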
Jun-25-2025
- Country:
  - Europe > United Kingdom
    - England > Oxfordshire > Oxford (0.04)
  - North America > United States
    - California
      - Los Angeles County > Los Angeles (0.04)
      - San Francisco County > San Francisco (0.04)
    - Florida > Miami-Dade County
      - Miami (0.04)
- Genre:
  - Questionnaire & Opinion Survey (0.93)
  - Research Report
    - Experimental Study (0.46)
    - New Finding (0.46)
  - Workflow (0.94)
- Industry:
  - Education > Curriculum
    - Subject-Specific Education (0.84)
  - Health & Medicine (0.67)
- Technology:
  - Information Technology > Artificial Intelligence
    - Issues > Social & Ethical Issues (0.86)
    - Machine Learning (1.00)
    - Natural Language (1.00)
    - Vision > Face Recognition (0.66)