AV-Cloud: Spatial Audio Rendering Through Audio-Visual Cloud Splatting

May-25-2025, 21:41:08 GMT–Neural Information Processing Systems

We propose a novel approach for rendering high-quality spatial audio for 3D scenes that is synchronized with the visual stream but neither relies on nor is explicitly conditioned on the visual rendering. We demonstrate that this approach enables immersive virtual tourism: real-time dynamic navigation within a scene with both audio and visual content. Current audio-visual rendering approaches typically rely on visual cues, such as images, so visual artifacts can cause inconsistency in audio quality. Furthermore, when such approaches are combined with visual rendering, audio generation at each viewpoint occurs after the image for that viewpoint has been rendered, which can introduce audio lag that disrupts the integration of the audio and visual streams. Our proposed approach, AV-Cloud, overcomes these challenges by learning a representation of the audio-visual scene based on a set of sparse AV anchor points, which constitute the Audio-Visual Cloud and are derived from the camera calibration.

artificial intelligence, machine learning, rendering

Country:
- North America > United States (0.46)

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.46)
    - Vision (1.00)
  - Graphics (0.93)