Lost in AI transcription: Adult words creep into YouTube children's videos
It happens when Google Speech-to-Text and Amazon Transcribe, two popular automatic speech recognition (ASR) systems, erroneously produce such age-inappropriate subtitles on YouTube videos for children. This is the key finding of a study titled 'Beach to Bitch: Inadvertent Unsafe Transcription of Kids' Content on YouTube', which covered 7,013 videos from 24 YouTube channels. Ten per cent of these videos contained at least one "highly inappropriate taboo word" for children, says US-based Ashique KhudaBukhsh, an assistant professor in the software engineering department at Rochester Institute of Technology.

KhudaBukhsh conducted the study with Sumeet Kumar, an assistant professor at the Indian School of Business in Hyderabad, and Krithika Ramesh of Manipal University. The researchers have termed the phenomenon "inappropriate content hallucination".

"We were mind-boggled because we knew that these channels were watched by millions of children. We understand this is an important problem because it is telling us that the inappropriate content may not be present in the source but it can be introduced by a downstream AI (Artificial Intelligence) application. So on the broader philosophical level, people generally have checks and balances for the source, but now we have to be more vigilant about having checks and balances if an AI application modifies the source. It can inadvertently introduce inappropriate content," KhudaBukhsh, who has a PhD in machine learning and is from Kalyani in West Bengal, told The Sunday Express.
Apr-3-2022, 05:50:09 GMT