In the fight against COVID-19, several artificial intelligence labs are turning to an unexpected piece of evidence that might help diagnose the illness: people's voices. A team of researchers from Harvard and MIT is using machine learning to comb through voice recordings from COVID-19 patients and healthy people in an attempt to identify specific vocal signatures that could indicate someone is carrying the virus. A similar project is underway at Carnegie Mellon University's CyLab. The research is still in its early stages, but the teams aim to develop AI tools that could tell people whether they have the coronavirus based on an audio recording of their voice. If the tools prove successful, they could let more people choose to self-isolate even without access to a COVID-19 test.
Lee, Sang-Woo; Jung, Hyunhoon; Ko, SukHyun; Kim, Sunyoung; Kim, Hyewon; Doh, Kyoungtae; Park, Hyunjung; Yeo, Joseph; Ok, Sang-Houn; Lee, Joonhaeng; Lim, Sungsoon; Jeong, Minyoung; Choi, Seongjae; Hwang, SeungTae; Park, Eun-Young; Ma, Gwang-Ja; Han, Seok-Joo; Cha, Kwang-Seung; Sung, Nako; Ha, Jung-Woo
Tracking suspected cases of COVID-19 is crucial to suppressing the spread of the pandemic. Active monitoring and proactive inspection are indispensable for mitigating that spread, though both carry considerable social and economic costs. To address this issue, we introduce CareCall, a call-based dialog agent that is deployed for active monitoring in Korea and Japan. We describe the system and present a case study with statistics showing how it works. Finally, we discuss a simple idea for using CareCall to support proactive inspection.
In a world where new technology emerges at exponential rates, and our daily lives are increasingly mediated by speakers and sound waves, text-to-speech technology is the latest force changing the way we communicate. Text-to-speech technology refers to a field of computer science that enables the conversion of written text into audible speech. Sometimes grouped under the broader label of voice computing, text-to-speech (TTS) often involves building a database of recorded human speech to train a computer to produce sound waves that resemble the natural sound of a human speaking. This process is called speech synthesis. The field is advancing quickly, and major breakthroughs occur regularly.
Charles Marmar has been a psychiatrist for 40 years, but when a combat veteran steps into his office for an evaluation, he still can't diagnose post-traumatic stress disorder with 100 percent accuracy. "You would think that if a war fighter came into my office I'd be able to decide if they have PTSD or not. But what if they're ashamed to tell me about their problems or they don't want to lose their high-security clearance, or I ask them about their disturbing dreams and they say they're sleeping well?" says Marmar. Marmar, who is chairman of the department of psychiatry at New York University's Langone Medical Center, is hoping to find answers in their speech. Voice samples are a rich source of information about a person's health, and researchers think subtle vocal cues may indicate underlying medical conditions or gauge disease risk.
Researchers at Duke Kunshan University, Wuhan University, Lenovo, and Sun Yat-sen University in Guangzhou claim to have developed an AI system that detects whether a person is wearing a mask from the sound of their muffled speech. They say that in experiments, it achieves 78.8% accuracy on one metric, demonstrating that sound could be a useful means of enforcing mask-wearing during the pandemic. The team's work is a submission to the 11th annual Computational Paralinguistics Challenge (ComParE) at the upcoming Interspeech 2020 conference, an open challenge dealing with the states and traits of speakers as manifested in their speech. This year saw the introduction of a "mask sub-challenge" in which the goal is to develop algorithms capable of determining whether a person is wearing a mask from the sound of their voice. For the sub-challenge, every competitor -- the coauthors of this study included -- must use the same corpus of 32 German speakers recorded for 10 hours in an audio studio wearing Lohmann & Rauscher face coverings.