audio
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Health & Medicine (0.67)
- Media (0.46)
- North America > United States > Illinois (0.04)
- Asia > India (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- North America > Canada > Quebec > Montreal (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
The overlooked driver of digital transformation
Clear, reliable audio is no longer optional, say Genevieve Juillard, CEO of IDC, and Chris Schyvinck, president and CEO at Shure. When business leaders talk about digital transformation, their focus often jumps straight to cloud platforms, AI tools, or collaboration software. Yet one of the most fundamental enablers of how organizations now work, and of how employees experience that work, is often overlooked: audio. As Genevieve Juillard, CEO of IDC, notes, the shift to hybrid collaboration made every space, from corporate boardrooms to kitchen tables, meeting-ready almost overnight. In the scramble, audio quality often lagged, creating what research now shows is more than a nuisance. Poor sound can alter how speakers are perceived, making them seem less credible or even less trustworthy. "Audio is the gatekeeper of meaning," stresses Juillard. "If people can't hear clearly, they can't understand you. And if they can't understand you, they can't trust you, and they can't act on what you said. And no amount of sharp video can fix that." For Shure, which has spent a century advancing sound technology, the implications extend far beyond convenience.
- North America > United States > Massachusetts (0.04)
- Europe > United Kingdom > England > East Sussex > Brighton (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Health & Medicine (0.68)
- Information Technology (0.48)
Self-Supervised Visual Acoustic Matching
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment. Existing methods assume access to paired training data, where the audio is observed in both source and target environments, but this limits the diversity of training data or requires the use of simulated data or heuristics to create paired samples. We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio, without acoustically mismatched source audio for reference. Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric that quantifies the level of residual acoustic information in the de-biased audio. Training with either in-the-wild web data or simulated data, we demonstrate that it outperforms the state of the art on multiple challenging datasets and a wide variety of real-world audio and environments.
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty of collecting large labeled task-specific datasets for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing advantages and could break this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach answering this need. We curate large-scale respiratory audio datasets (approximately 136K samples, over 400 hours), pretrain three pioneering foundation models, and build a benchmark consisting of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (against existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies using OPERA as an open resource to accelerate research on respiratory audio for health.