AITopics | acoustic

Collaborating Authors

acoustic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

Yamauchi, Shogo, Tamori, Hideaki, Sakai, Makoto, Yamano, Yosuke, Nitta, Tohru

arXiv.org Machine LearningJun-19-2026

We propose a parameter-efficient speech enhancement framework, Quaternion Conformer GAN (QC-GAN), which combines a Quaternion Conformer generator with MetricGAN-based training. The Hamilton product encodes the magnitude and phase via structured weight sharing, reducing the number of layer parameters while preserving their interdependencies. A metric-learning discriminator was employed to maximize perceptual quality by optimizing the approximate perceptual evaluation scores. On the VoiceBank+DEMAND dataset, QC-GAN achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, delivering a performance comparable to state-of-the-art models at less than half their size. A 35K-parameter variant achieved a PESQ score of 3.23, surpassing conventional methods with significantly fewer parameters. Evaluation on the DNS-Challenge 3 dataset further confirmed generalization to real-world conditions.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2606.18611

Country: Asia (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction

Neural Information Processing SystemsJun-18-2026, 22:22:46 GMT

Unlike prior approaches constrained to fixed instrument classes, MGE-LDM learns a joint distribution over full mixtures, submixtures, and individual stems within a single compact latent diffusion model. At inference, MGE-LDM enables (1) complete mixture generation, (2) partial generation (i.e., source imputation), and (3) textconditioned extraction of arbitrary sources. By formulating both separation and imputation as conditional inpainting tasks in the latent space, our approach supports flexible, class-agnostic manipulation of arbitrary instrument sources. Notably, MGE-LDM can be trained jointly across heterogeneous multi-track datasets (e.g., Slakh2100, MUSDB18, MoisesDB) without relying on predefined instrument categories. Audio samples are available at our project page .

artificial intelligence, international conference, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Neural Information Processing SystemsApr-26-2026, 21:07:23 GMT

Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have provided significant improvements in speech separation performance. In this work, we propose to use a bio-inspired architecture called Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down and lateral connections to fuse information processed at various time-scales represented by stages. In contrast to the traditional approach updating stages in parallel, we propose to first update the stages one by one in the bottom-up direction, then fuse information from adjacent stages simultaneously and finally fuse information from all stages to the bottom stage together. Experiments showed that this asynchronous updating scheme achieved significantly better results with much fewer parameters than the traditional synchronous updating scheme. In addition, the proposed model achieved good balance between speech separation accuracy and computational efficiency as compared to other state-of-the-art models on three benchmark datasets.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.46)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

9d276b0a087efdd2404f3295b26c24c1-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 03:43:13 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.72)

Add feedback

Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation

Neural Information Processing SystemsFeb-14-2026, 17:47:47 GMT

In speech separation, time-domain approaches have successfully replaced the time-frequency domain with latent sequence feature from a learnable encoder.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: