speech separation
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures
In reverberant conditions with multiple concurrent speakers, each microphone acquires a mixture signal of the multiple speakers at a different location. In over-determined conditions, where the microphones outnumber the speakers, we can narrow down the solutions to the speaker images and realize unsupervised speech separation by leveraging each mixture signal as a constraint (i.e., the estimated speaker images at a microphone should add up to that microphone's mixture).
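The mixture constraint described above can be sketched as a simple reconstruction loss. The following is a minimal numpy illustration (function name and shapes are hypothetical, not from the UNSSOR paper): the speaker images estimated at each microphone are summed and compared against that microphone's observed mixture.

```python
import numpy as np

def mixture_constraint_loss(est_images, mixtures):
    """Penalize deviation between each microphone's observed mixture and
    the sum of the speaker images estimated at that microphone.

    est_images: (n_mics, n_speakers, n_samples) estimated speaker images
    mixtures:   (n_mics, n_samples) observed mixture at each microphone
    """
    recon = est_images.sum(axis=1)           # sum of speaker images per mic
    return np.mean((recon - mixtures) ** 2)  # mean-squared reconstruction error

# Toy example: 3 mics (over-determined), 2 speakers
rng = np.random.default_rng(0)
images = rng.standard_normal((3, 2, 16000))
mix = images.sum(axis=1)                     # perfectly consistent estimates
print(mixture_constraint_loss(images, mix))  # → 0.0
```

In the unsupervised setting this constraint alone is what narrows the solution space: any set of estimated images whose per-microphone sums deviate from the observed mixtures incurs a positive loss.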
Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network
Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have provided significant improvements in speech separation performance. In this work, we propose to use a bio-inspired architecture called Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down, and lateral connections to fuse information processed at various time scales, represented by stages. In contrast to the traditional approach of updating all stages in parallel, we propose to first update the stages one by one in the bottom-up direction, then fuse information from adjacent stages simultaneously, and finally fuse information from all stages into the bottom stage. Experiments showed that this asynchronous updating scheme achieved significantly better results with far fewer parameters than the traditional synchronous updating scheme on speech separation. In addition, the proposed model achieved competitive or better results with high efficiency compared to other state-of-the-art approaches on two benchmark datasets.
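The three-phase asynchronous update order can be sketched in a few lines. This is a toy illustration under stated assumptions, not the FRCNN itself: stage states are plain vectors and the learned update blocks are replaced by a `tanh` stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stages, dim = 4, 8
stages = [rng.standard_normal(dim) for _ in range(n_stages)]

def f(x):
    # stand-in for a learned stage update (e.g., a convolutional block)
    return np.tanh(x)

# Phase 1: sequential bottom-up updates; each stage sees the stage below it
for i in range(n_stages):
    below = stages[i - 1] if i > 0 else 0.0
    stages[i] = f(stages[i] + below)

# Phase 2: simultaneous fusion among adjacent stages (lateral + top-down)
fused = []
for i in range(n_stages):
    neighbors = [stages[j] for j in (i - 1, i + 1) if 0 <= j < n_stages]
    fused.append(f(stages[i] + sum(neighbors)))
stages = fused

# Phase 3: fuse information from all stages into the bottom stage
output = f(sum(stages))
print(output.shape)  # (8,)
```

The point of the ordering is that bottom-up information fully propagates before lateral/top-down fusion, rather than all stages exchanging information in a single synchronous step.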
Speech Separation for Hearing-Impaired Children in the Classroom
Olalere, Feyisayo, van der Heijden, Kiki, Stronks, H. Christiaan, Briaire, Jeroen, Frijns, Johan H. M., Güçlütürk, Yagmur
[Fig. 1 caption: The process includes simulating room and listener acoustic properties (A), modeling talkers' movement trajectories (B), and synthesizing classroom speech mixtures (C). The numbers (1)-(5) correspond to the steps itemized in Section II-B.]
The separation model is trained to output time-domain waveforms for each speaker with no interference from the other speaker or background noise. This setup enables the model not only to separate overlapping speech, but also to preserve the spatial distinctions associated with each moving source.
B. Simulation of Overlapping Speech for Classroom Conditions
To capture the reverberant and spatial characteristics typical of classroom environments, we developed a spatialization pipeline for generating training and evaluation data (see Fig. 1). This pipeline consists of five main components, which are explained below in detail:
1) Simulation of room impulse responses (RIRs)
2) Application of head-related impulse responses (HRIRs)
3) Generation of binaural room impulse responses (BRIRs)
4) Modeling of talkers' movement trajectories
5) Synthesis of the classroom speech data
1) Room Impulse Responses: To simulate naturalistic reverberant classroom acoustics, we generated RIRs that capture direct sound, early reflections, and reverberation. These RIRs were used to spatialize source signals in simulated classroom environments with varying geometry, reverberation, and source-listener distances. We used the Pyroomacoustics Python package [35], which implements the image-source method to model sound propagation in rectangular (shoebox) rooms. A total of 30 classrooms were simulated, with dimensions randomly sampled from a range of 8.5 × 8.5 × 3 m to 10 × 10 × 3.5 m (length × width × height), reflecting typical U.S. classroom sizes [36], [37].
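The first and last steps of the pipeline above (sampling a room and spatializing talkers by convolution with an RIR) can be sketched as follows. This is a numpy-only illustration: a synthetic exponentially decaying noise burst stands in for the image-source RIRs that the paper generates with Pyroomacoustics, and the dry talker signals are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 16000

# Sample a room within the reported size range:
# length x width x height from 8.5 x 8.5 x 3 m to 10 x 10 x 3.5 m
room_dims = rng.uniform([8.5, 8.5, 3.0], [10.0, 10.0, 3.5])

def synthetic_rir(rt60, length=4000):
    """Exponentially decaying noise as a crude stand-in for an
    image-source RIR (illustration only; not the paper's method)."""
    t = np.arange(length) / fs
    return rng.standard_normal(length) * np.exp(-6.9 * t / rt60)

# Spatialize two talkers by convolution with an RIR, then mix them
talkers = [rng.standard_normal(fs) for _ in range(2)]
mixture = sum(np.convolve(s, synthetic_rir(rt60=0.6)) for s in talkers)
print(room_dims.round(1), mixture.shape)
```

In the actual pipeline, each convolution would use a BRIR (RIR combined with an HRIR) that also tracks the talker's movement trajectory, yielding a binaural rather than single-channel mixture.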
UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement
Yan, Haoyin, Liu, Chengwei, Xue, Shaofei, Liang, Xiaotao, Xue, Zheng
The development of neural audio codecs (NACs) has greatly promoted the application of language models (LMs) to speech processing and understanding. However, the effectiveness of autoregressive (AR) LM-based models in unifying different sub-tasks of speech enhancement (SE) has not been verified. In this work, we propose UniSE, a unified decoder-only LM-based framework that handles different SE tasks, including speech restoration, target speaker extraction, and speech separation. It takes input speech features as conditions and generates discrete tokens of the target speech using AR modeling, which facilitates compatibility between the distinct learning patterns of multiple tasks. Experiments on several benchmarks indicate that the proposed UniSE achieves competitive performance compared to discriminative and generative baselines, showing the capacity of LMs to unify SE tasks. The demo page is available here: https://github.com/hyyan2k/UniSE.
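The core loop described in the abstract, generating discrete target-speech tokens autoregressively while conditioning on input speech features, can be sketched in miniature. Everything here is hypothetical (random weights, mean-pooled conditioning, greedy decoding); it illustrates the decoder-only AR pattern, not the UniSE model.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab, cond_dim = 32, 16

# Toy stand-in for a decoder-only LM: next-token logits depend on the
# conditioning features and the previously generated token
W_cond = rng.standard_normal((cond_dim, vocab))
W_prev = rng.standard_normal((vocab, vocab))

def ar_generate(cond_feats, n_tokens):
    """Greedy autoregressive generation of discrete target-speech tokens,
    conditioned on input speech features (illustrative only)."""
    tokens, prev = [], 0
    summary = cond_feats.mean(axis=0)       # pool the condition sequence
    for _ in range(n_tokens):
        logits = summary @ W_cond + W_prev[prev]
        prev = int(np.argmax(logits))       # greedy decoding step
        tokens.append(prev)
    return tokens

feats = rng.standard_normal((50, cond_dim))  # e.g., codec or SSL features
print(ar_generate(feats, 8))
```

In a real NAC-based system, the generated token IDs would index a codec codebook and be decoded back to a waveform; the unification across SE tasks comes from swapping only the conditioning features while keeping the same AR decoder.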