ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles

Yin, Kayo, Singh, Chinmay, Minakov, Fyodor O., Milan, Vanessa, Daumé, Hal III, Zhang, Cyril, Lu, Alex X., Bragg, Danielle

arXiv.org Artificial Intelligence 

Deaf and hard-of-hearing (DHH) students face significant barriers in accessing science, technology, engineering, and mathematics (STEM) education, notably due to the scarcity of STEM resources in signed languages. To help address this, we introduce ASL STEM Wiki: a parallel corpus of 254 Wikipedia articles on STEM topics in English, interpreted into over 300 hours of American Sign Language (ASL). ASL STEM Wiki is the first continuous signing dataset focused on STEM, facilitating the development of AI resources for STEM education in ASL. We identify several use cases of ASL STEM Wiki with human-centered applications. For example, because this dataset highlights the frequent use of fingerspelling for technical concepts, which inhibits DHH students' ability to learn, we develop models to identify fingerspelled words, which can later be used to query for appropriate ASL signs to suggest to interpreters.

Figure 1: One use case of ASL STEM Wiki is automatic sign suggestion. Given an English sentence and a video of its ASL interpretation, the model detects all clips of ASL that contain fingerspelling (FS). Then, given the detected FS clip and the English sentence, the model identifies which English phrase in the sentence is fingerspelled in the clip. The English phrase can be used to ...
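
The two-stage flow described in the Figure 1 caption (detect fingerspelling clips, then align each clip to an English phrase) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the per-frame probabilities, the 0.5 threshold, the n-gram candidate generation, and the names `Clip`, `detect_fs_clips`, `candidate_phrases`, and `align_clip` are all hypothetical, and the scoring function stands in for whatever learned model would compare a clip to a phrase.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Clip:
    """A span of video frames (hypothetical representation)."""
    start: int  # first frame index, inclusive
    end: int    # last frame index, exclusive


def detect_fs_clips(frame_probs: Sequence[float], threshold: float = 0.5) -> List[Clip]:
    """Stage 1 (assumed form): group consecutive frames whose fingerspelling
    probability meets the threshold into clips."""
    clips: List[Clip] = []
    start = None
    for i, p in enumerate(frame_probs):
        if p >= threshold and start is None:
            start = i                      # a fingerspelling run begins
        elif p < threshold and start is not None:
            clips.append(Clip(start, i))   # the run ended at the previous frame
            start = None
    if start is not None:                  # run extends to the end of the video
        clips.append(Clip(start, len(frame_probs)))
    return clips


def candidate_phrases(sentence: str, max_words: int = 3) -> List[str]:
    """Enumerate contiguous word n-grams (n = 1..max_words) as candidates."""
    words = sentence.split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_words + 1)
        for i in range(len(words) - n + 1)
    ]


def align_clip(clip: Clip, sentence: str, score: Callable[[Clip, str], float]) -> str:
    """Stage 2 (assumed form): return the candidate phrase that the supplied
    scoring function rates highest for this clip."""
    return max(candidate_phrases(sentence), key=lambda phrase: score(clip, phrase))


if __name__ == "__main__":
    # Made-up per-frame probabilities from an imagined detector.
    probs = [0.1, 0.9, 0.8, 0.2, 0.7, 0.6]
    clips = detect_fs_clips(probs)
    # Toy scorer for demonstration only; a real system would use a trained model.
    toy_score = lambda clip, phrase: 1.0 if phrase == "ATP" else 0.0
    for clip in clips:
        print(clip, align_clip(clip, "cells make ATP", toy_score))
```

Following the abstract, the phrase returned in the second stage is what would later be used to query for an appropriate ASL sign to suggest to the interpreter; that lookup step is left out of the sketch.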