
Parakeets teach a lesson in friendship

Popular Science

Making new friends (especially as an adult) can be challenging. When new birds are introduced to a group, monk parakeets will "test the waters" to avoid getting injured by defensive strangers. The parakeets gradually approach the new bird, taking some time to get familiar before ramping up to the riskier, more vulnerable interactions needed to form the bonds necessary for survival. "There can be a lot of benefits to being social, but these friendships have to start somewhere," Claire O'Connell, a study co-author and doctoral student at the University of Cincinnati, said in a statement.


Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People

Zhou, Haoshuai, Cao, Boxuan, Mo, Changgeng, Li, Linkai, Wang, Shan Xiang

arXiv.org Artificial Intelligence

Speech foundation models (SFMs) have demonstrated strong performance across a variety of downstream tasks, including speech intelligibility prediction for hearing-impaired people (SIP-HI). However, optimizing SFMs for SIP-HI has been insufficiently explored. In this paper, we conduct a comprehensive study to identify key design factors affecting SIP-HI performance with 5 SFMs, focusing on encoder layer selection, prediction head architecture, and ensemble configurations. Our findings show that, contrary to traditional use-all-layers methods, selecting a single encoder layer yields better results. Additionally, temporal modeling is crucial for effective prediction heads. We also demonstrate that ensembling multiple SFMs improves performance, with stronger individual models providing greater benefit. Finally, we explore the relationship between key SFM attributes and their impact on SIP-HI performance. Our study offers practical insights into effectively adapting SFMs for speech intelligibility prediction for hearing-impaired populations.
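
The abstract's two central design findings, tapping a single encoder layer rather than combining all layers, and giving the prediction head an explicit temporal model, can be illustrated with a minimal sketch. The code below assumes a generic SFM encoder that exposes per-layer hidden states; the layer index, BiLSTM head, and feature sizes are illustrative assumptions, not the paper's actual configuration. Ensembling, the third factor studied, would amount to averaging such scores across several SFM backbones.

# Minimal sketch (not the paper's code): an intelligibility-prediction
# head that reads ONE chosen encoder layer of a speech foundation model
# and models time explicitly before pooling to a scalar score.
import torch
import torch.nn as nn

class SingleLayerTemporalHead(nn.Module):
    """Predict an intelligibility score from one SFM encoder layer.

    Assumptions (illustrative only): feature dim 768, a BiLSTM as the
    temporal model, mean pooling over time.
    """
    def __init__(self, feat_dim: int = 768, hidden: int = 256,
                 layer_index: int = 9):
        super().__init__()
        self.layer_index = layer_index          # the single layer to tap
        self.temporal = nn.LSTM(feat_dim, hidden, batch_first=True,
                                bidirectional=True)
        self.proj = nn.Linear(2 * hidden, 1)    # scalar intelligibility score

    def forward(self, hidden_states: list[torch.Tensor]) -> torch.Tensor:
        # hidden_states: per-layer features from the (frozen) SFM encoder,
        # each of shape (batch, time, feat_dim).
        x = hidden_states[self.layer_index]     # one layer, not a weighted sum
        x, _ = self.temporal(x)                 # temporal modeling over frames
        return self.proj(x.mean(dim=1)).squeeze(-1)  # pool over time

# Toy usage with random features standing in for real SFM outputs.
if __name__ == "__main__":
    fake_states = [torch.randn(2, 100, 768) for _ in range(13)]
    head = SingleLayerTemporalHead()
    print(head(fake_states).shape)  # torch.Size([2])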


We finally know how parrots 'talk'

Popular Science

Parrots are so adept at mimicking people that the avian moniker has become synonymous with repetition. Yet for as long as we've known about the birds' incredible talent for impressions, how they manage such complex and flexible vocalizations has been a mystery. A new study offers a piece of the puzzle by peeking into the parakeet brain and finding remarkable similarities to the human neural region that controls speech. The research, published March 19 in the journal Nature, suggests parrots (and specifically parakeets) could be a model for studying human speech, helping scientists to better understand and treat speech disorders. It also adds to the growing stack of scientific findings that demonstrate "bird-brained" isn't much of an insult after all.


Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition

Cornell, Samuele, Darefsky, Jordan, Duan, Zhiyao, Watanabe, Shinji

arXiv.org Artificial Intelligence

Currently, a common approach in many speech processing tasks is to leverage large-scale pre-trained models by fine-tuning them on in-domain data for a particular application. Yet obtaining even a small amount of such data can be problematic, especially for sensitive domains and conversational speech scenarios, due to both privacy issues and annotation costs. To address this, synthetic data generation using single-speaker datasets has been employed. However, for multi-speaker cases, such an approach often requires extensive manual effort and is prone to domain mismatches. In this work, we propose a synthetic data generation pipeline for multi-speaker conversational ASR, leveraging a large language model (LLM) for content creation and a conversational multi-speaker text-to-speech (TTS) model for speech synthesis. We evaluate the pipeline by fine-tuning the Whisper ASR model for telephone and distant conversational speech settings, using both in-domain data and generated synthetic data. Our results show that the proposed method significantly outperforms classical multi-speaker generation approaches that use external, non-conversational speech datasets.
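
The overall pipeline shape the abstract describes (LLM writes a conversation, a multi-speaker TTS renders it, and the audio/transcript pairs feed ASR fine-tuning) can be sketched in a few functions. Everything below is a hypothetical stand-in: generate_dialogue, synthesize_conversation, and the prompt format are invented for illustration and are not the authors' actual tooling.

# Minimal sketch of an LLM -> conversational TTS -> ASR fine-tuning
# data pipeline. All names and interfaces are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str   # e.g. "A" or "B"
    text: str

def generate_dialogue(llm, topic: str, n_turns: int) -> list[Turn]:
    """HYPOTHETICAL: ask an LLM (any text->text callable) for a
    two-party conversation transcript."""
    prompt = (f"Write a natural {n_turns}-turn phone conversation about "
              f"{topic}. Prefix each line with 'A:' or 'B:'.")
    lines = llm(prompt).splitlines()
    return [Turn(l[0], l[2:].strip()) for l in lines
            if l[:2] in ("A:", "B:")]

def synthesize_conversation(tts, turns: list[Turn]):
    """HYPOTHETICAL: render the transcript with a multi-speaker TTS
    callable, returning an (audio, reference_text) training pair."""
    script = "\n".join(f"{t.speaker}: {t.text}" for t in turns)
    audio = tts(script)                       # one conversational waveform
    text = " ".join(t.text for t in turns)    # reference for ASR training
    return audio, text

def build_corpus(llm, tts, topics: list[str], n_turns: int = 12):
    """Pair each synthetic conversation with its transcript; the
    resulting (audio, text) list can feed any ASR fine-tuning recipe
    (e.g. for Whisper)."""
    return [synthesize_conversation(tts, generate_dialogue(llm, t, n_turns))
            for t in topics]

# usage: build_corpus(my_llm, my_tts, ["travel plans", "a repair call"])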


Intelligently Aiding Human-Guided Correction of Speech Recognition

Vertanen, Keith (University of Cambridge) | Kristensson, Per Ola (University of Cambridge)

AAAI Conferences

Correcting recognition errors is often necessary in a speech interface. These errors not only reduce users' overall entry rate, but can also lead to frustration. While making fewer recognition errors is undoubtedly helpful, facilities for supporting user-guided correction are also critical. We explore how to better support user corrections using Parakeet — a continuous speech recognition system for mobile touch-screen devices. Parakeet's interface is designed for easy error correction on a handheld device. Users correct errors by selecting alternative words from a word confusion network and by typing on a predictive software keyboard. Our interface design was guided by computational experiments and used a variety of information sources to aid the correction process. In user studies, participants were able to write text effectively despite sometimes high initial recognition error rates. Using Parakeet as an example, we discuss principles we found were important for building an effective speech correction interface.
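
Parakeet's central data structure, the word confusion network, is simple to sketch: the recognizer's lattice is collapsed into a sequence of slots, each holding competing word hypotheses with posterior probabilities, and correction becomes tapping a slot and picking a different word. The sketch below is illustrative only; the slot contents and the tiny API are invented for the example and are not Parakeet's actual code.

# Illustrative sketch of a word confusion network (WCN) as used in
# correction UIs like Parakeet's: each slot lists alternative words
# with posterior probabilities, and a correction swaps the selection.
# The data and API here are invented for the example.

# One slot per spoken word; alternatives sorted by posterior.
wcn = [
    [("the", 0.95), ("a", 0.04)],
    [("cat", 0.60), ("hat", 0.30), ("cut", 0.08)],
    [("sat", 0.88), ("sad", 0.10)],
]

# The 1-best hypothesis the user sees first (index 0 in every slot).
selection = [0, 0, 0]

def current_text(wcn, selection) -> str:
    return " ".join(wcn[i][j][0] for i, j in enumerate(selection))

def correct(wcn, selection, slot: int, word: str):
    """User taps a slot and picks an alternative word from it."""
    words = [w for w, _ in wcn[slot]]
    selection[slot] = words.index(word)   # ValueError if word not offered

print(current_text(wcn, selection))       # "the cat sat"
correct(wcn, selection, slot=1, word="hat")
print(current_text(wcn, selection))       # "the hat sat"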