AITopics | binaural audio

binaural audio

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Susan Liang, Chao Huang

Neural Information Processing Systems Feb-14-2026, 18:10:05 GMT

This consistency ensures perceptual realism and immersion, enriching the overall user experience.

artificial intelligence, camera pose, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

32cf311edd3cad32dc6672b4f973366e-Paper-Conference.pdf

Neural Information Processing Systems Feb-10-2026, 21:46:56 GMT

artificial intelligence, listener, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Surrey (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Israel (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.46)

95f03faf3763e1b1ce2c3de62da8f090-Paper-Conference.pdf

Neural Information Processing Systems Feb-10-2026, 21:32:53 GMT

binaural audio, diffusion model, information, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.04)
Asia > China (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.41)

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

Neural Information Processing Systems Dec-24-2025, 21:11:50 GMT

Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models that exploit visual cues as a condition for synthesizing binaural audio. However, in addition to the low efficiency that stems from heavy NeRF rendering, these methods all have a limited ability to characterize the entire scene environment, such as room geometry, material properties, and the spatial relation between the listener and the sound source. To address these issues, we propose a novel Audio-Visual Gaussian Splatting (AV-GS) model. To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with audio-guidance parameters on locally initialized Gaussian points, taking into account the spatial relation between the listener and the sound source. To make the visual scene model audio-adaptive, we propose a point densification and pruning strategy that optimally distributes the Gaussian points according to each point's contribution to sound propagation (e.g., more points are needed for textureless wall surfaces, as they affect sound path diversion).
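The abstract does not spell out the densification and pruning criterion. As a toy illustration of the general idea only (not AV-GS's actual rule), the sketch below assumes each Gaussian point already has a precomputed contribution score in [0, 1]: points below a prune threshold are dropped, and points above a grow threshold are cloned with a small random offset. The function name, thresholds, and jitter are hypothetical.

```python
import numpy as np


def densify_and_prune(points, contribution, grow_thresh=0.8, prune_thresh=0.05,
                      jitter=0.01, seed=0):
    """Toy contribution-driven redistribution of Gaussian centers.

    points:       (N, 3) array of point positions.
    contribution: (N,) score in [0, 1] per point; assumed to be precomputed
                  (for example, an estimate of the point's effect on sound paths).
    The thresholds and jitter are hypothetical values, not AV-GS settings.
    """
    rng = np.random.default_rng(seed)
    keep = contribution >= prune_thresh   # drop points that barely matter
    grow = contribution >= grow_thresh    # add points where they matter most
    # Clone each high-contribution point with a small random offset.
    clones = points[grow] + rng.normal(scale=jitter, size=points[grow].shape)
    return np.concatenate([points[keep], clones], axis=0)


pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
score = np.array([0.9, 0.5, 0.01, 0.85])
new_pts = densify_and_prune(pts, score)
print(new_pts.shape)  # (5, 3): one point pruned, two cloned
```

Because the grow threshold sits above the prune threshold, every cloned point's parent is also kept, so the point budget shifts toward high-contribution regions.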

artificial intelligence, learning material, synthesis, (7 more...)

Neural Information Processing Systems

Genre: Instructional Material > Course Syllabus & Notes (0.43)

Industry: Education (0.43)

Technology: Information Technology > Artificial Intelligence (0.39)

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

Zhang, Mengchen, Chen, Qi, Wu, Tong, Liu, Zihan, Lin, Dahua

arXiv.org Artificial Intelligence Dec-3-2025

Despite progress in video-to-audio generation, the field focuses predominantly on mono output, lacking spatial immersion. Existing binaural approaches remain constrained by a two-stage pipeline that first generates mono audio and then performs spatialization, often resulting in error accumulation and spatio-temporal inconsistencies. To address this limitation, we introduce the task of end-to-end binaural spatial audio generation directly from silent video. To support this task, we present the BiAudio dataset, comprising approximately 97K video-binaural audio pairs spanning diverse real-world scenes and camera rotation trajectories, constructed through a semi-automated pipeline. Furthermore, we propose ViSAudio, an end-to-end framework that employs conditional flow matching with a dual-branch audio generation architecture, where two dedicated branches model the audio latent flows. Integrated with a conditional spacetime module, it balances consistency between channels while preserving distinctive spatial characteristics, ensuring precise spatio-temporal alignment between audio and the input video. Comprehensive experiments demonstrate that ViSAudio outperforms existing state-of-the-art methods across both objective metrics and subjective evaluations, generating high-quality binaural audio with spatial immersion that adapts effectively to viewpoint changes, sound-source motion, and diverse acoustic environments. Project website: https://kszpxxzmc.github.io/ViSAudio-project.
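The abstract names conditional flow matching but not its exact formulation. The sketch below assumes the common straight-line path, x_t = (1 - t) x_0 + t x_1 with target velocity x_1 - x_0, and runs one branch per ear to mirror the dual-branch idea. Oracle velocity functions stand in for the trained networks; the video conditioning and ViSAudio's conditional spacetime module are not modeled, and all names here are hypothetical.

```python
import numpy as np


def cfm_target(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t, plus its velocity."""
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0


def cfm_loss(velocity_fn, x0, x1, t):
    """Flow matching objective: MSE between predicted and target velocity."""
    xt, v_target = cfm_target(x0, x1, t)
    return float(np.mean((velocity_fn(xt, t) - v_target) ** 2))


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# One latent per ear; the 8-dim latents are arbitrary toy values.
rng = np.random.default_rng(0)
left_data, right_data = rng.normal(size=8), rng.normal(size=8)
left_noise, right_noise = rng.normal(size=8), rng.normal(size=8)


# Oracle velocity fields standing in for the two trained branches.
def v_left(x, t):
    return left_data - left_noise


def v_right(x, t):
    return right_data - right_noise


left = euler_sample(v_left, left_noise)
right = euler_sample(v_right, right_noise)
print(np.allclose(left, left_data), np.allclose(right, right_data))  # True True
```

With a perfect velocity estimate the straight path is integrated exactly, which is why sampling lands on the data latents; a trained network only approximates this field.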

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.03036

Genre: Research Report > Promising Solution (0.34)

Industry:

Media (0.94)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

TTMBA: Towards Text To Multiple Sources Binaural Audio Generation

He, Yuxuan, Yang, Xiaoran, Pan, Ningning, Huang, Gongping

arXiv.org Artificial Intelligence Nov-6-2025

Most existing text-to-audio (TTA) generation methods produce mono outputs, neglecting the spatial information essential for immersive auditory experiences. To address this issue, we propose a cascaded method for text-to-multisource binaural audio generation (TTMBA) with both temporal and spatial control. First, a pretrained large language model (LLM) segments the text into a structured format with timing and spatial details for each sound event. Next, a pretrained mono audio generation network creates mono audio clips of varying durations for the individual events. These mono clips are transformed into binaural audio by a binaural rendering neural network, using the spatial data from the LLM. Finally, the binaural clips are arranged by their start times, producing multisource binaural audio. Experimental results demonstrate the superiority of the proposed method in terms of both audio generation quality and spatial perceptual accuracy.
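The last two stages of the cascade (binaural rendering, then arrangement by start time) can be sketched as follows. This is not TTMBA's renderer: the paper uses a neural network, whereas this sketch substitutes a classical ITD/ILD panner (the Woodworth ITD approximation plus constant-power panning). The `SoundEvent` fields, sample rate, head radius, and azimuth convention are all assumptions.

```python
from dataclasses import dataclass

import numpy as np

SR = 16_000             # sample rate in Hz (assumed)
HEAD_RADIUS = 0.0875    # metres, a typical adult head (assumed)
SPEED_OF_SOUND = 343.0  # metres per second in air


@dataclass
class SoundEvent:
    start: float        # onset in seconds, as the LLM stage would supply it
    azimuth_deg: float  # -90 = hard left, 0 = front, +90 = hard right (assumed)
    mono: np.ndarray    # mono clip from the text-to-audio stage


def spatialize(mono, azimuth_deg, sr=SR):
    """Classical ITD/ILD panner standing in for TTMBA's neural binaural renderer."""
    theta = np.deg2rad(np.clip(azimuth_deg, -90.0, 90.0))
    # Woodworth approximation of the interaural time difference (ITD).
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (abs(theta) + np.sin(abs(theta)))
    delay = int(round(itd * sr))
    # Constant-power panning gives an interaural level difference (ILD).
    pan = (theta + np.pi / 2) / np.pi  # 0 = left, 1 = right
    gain_l, gain_r = np.cos(pan * np.pi / 2), np.sin(pan * np.pi / 2)
    near = np.concatenate([mono, np.zeros(delay)])  # ear facing the source
    far = np.concatenate([np.zeros(delay), mono])   # opposite ear hears it later
    left, right = (far, near) if theta >= 0 else (near, far)
    return np.stack([gain_l * left, gain_r * right])  # shape (2, len(mono) + delay)


def arrange(events, sr=SR):
    """Overlay each event's binaural clip at its start time in one stereo buffer."""
    placed = [(int(round(e.start * sr)), spatialize(e.mono, e.azimuth_deg, sr))
              for e in events]
    length = max(offset + clip.shape[1] for offset, clip in placed)
    out = np.zeros((2, length))
    for offset, clip in placed:
        out[:, offset:offset + clip.shape[1]] += clip
    return out


# Two toy constant-valued clips: one hard left, one hard right 10 ms later.
events = [
    SoundEvent(start=0.0, azimuth_deg=-90.0, mono=np.ones(100)),
    SoundEvent(start=0.01, azimuth_deg=90.0, mono=np.ones(100)),
]
mix = arrange(events)
print(mix.shape)  # (2, 270)
```

Keeping rendering and arrangement as separate functions mirrors the cascade: any renderer that returns a (2, n) array can replace `spatialize` without touching the mixing step.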

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.21437/Interspeech.2025-1516

2507.16564

Country: Asia > China > Hubei Province (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

32cf311edd3cad32dc6672b4f973366e-Paper-Conference.pdf

Neural Information Processing Systems Oct-9-2025, 22:47:55 GMT

binaural audio, listener, representation, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Surrey (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Israel (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.46)

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Susan Liang, Chao Huang

Neural Information Processing Systems Oct-8-2025, 22:26:24 GMT

This consistency ensures perceptual realism and immersion, enriching the overall user experience.

artificial intelligence, camera pose, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

95f03faf3763e1b1ce2c3de62da8f090-Paper-Conference.pdf

Neural Information Processing Systems Aug-17-2025, 02:59:22 GMT

artificial intelligence, diffusion model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.04)
Asia > China (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

In-the-wild Audio Spatialization with Flexible Text-guided Localization

Pan, Tianrui, Liu, Jie, Huang, Zewen, Tang, Jie, Wu, Gangshan

arXiv.org Artificial Intelligence Jun-3-2025

Binaural audio enhances immersion by conveying the spatial location of sounding objects in AR, VR, and embodied AI applications. While existing audio spatialization methods can generally map any available monaural audio to binaural audio signals, they often lack the flexible and interactive control needed in complex, multi-object, user-interactive environments. To address this, we propose a Text-guided Audio Spatialization (TAS) framework that uses flexible text prompts, and we evaluate our model from unified generation and comprehension perspectives. Because high-quality, large-scale stereo data are scarce, we construct the SpatialTAS dataset, which contains 376,000 simulated binaural audio samples, to train our model. Our model learns binaural differences guided by 3D spatial location and relative position prompts, augmented with flipped-channel audio. It outperforms existing methods on both simulated and real-recorded datasets, demonstrating superior generalization and accuracy. In addition, we develop an assessment model based on Llama-3.1-8B that evaluates the spatial semantic coherence between our generated binaural audio and the text prompts through a spatial reasoning task. Results demonstrate that text prompts provide flexible and interactive control for generating binaural audio with excellent quality and semantic consistency in spatial locations. The dataset is available at https://github.com/Alice01010101/TASU.
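The abstract mentions augmentation with flipped-channel audio but gives no details. One plausible reading, sketched below purely as an assumption, is left-right mirroring: swap the two ear channels of a binaural clip and swap "left" and "right" in its text prompt so the audio and the description stay consistent. The word list and function names are hypothetical, and prompts that state numeric azimuths would also need their sign negated.

```python
import re

import numpy as np

# Hypothetical mirroring rule: swap the side words in the prompt.
_MIRROR = {"left": "right", "right": "left"}
_SIDE_WORD = re.compile(r"\b(left|right)\b", re.IGNORECASE)


def _swap_side(match):
    word = match.group(0)
    mirrored = _MIRROR[word.lower()]
    return mirrored.capitalize() if word[0].isupper() else mirrored


def flip_example(binaural, prompt):
    """Mirror one training pair left-to-right.

    binaural: array of shape (2, n) with row 0 = left ear, row 1 = right ear.
    prompt:   text description of where the source is.
    """
    flipped = np.ascontiguousarray(binaural[::-1])  # swap the ear channels
    return flipped, _SIDE_WORD.sub(_swap_side, prompt)


audio = np.array([[0.1, 0.2, 0.3], [0.0, 0.0, 0.1]])
flipped, text = flip_example(audio, "A dog barks on the left, a car passes on the Right.")
print(text)  # A dog barks on the right, a car passes on the Left.
```

The word-boundary pattern leaves words such as "leftover" or "bright" untouched, and applying the flip twice restores the original pair.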

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.00927

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
