Visual Acoustic Fields
Li, Yuelei, Kim, Hyunjin, Zhan, Fangneng, Qiu, Ri-Zhao, Ji, Mazeyu, Shan, Xiaojun, Zou, Xueyan, Liang, Paul, Pfister, Hanspeter, Wang, Xiaolong
Objects produce different sounds when hit, and humans can intuitively infer how an object might sound based on its appearance and material properties. Inspired by this intuition, we propose Visual Acoustic Fields, a framework that bridges hitting sounds and visual signals within a 3D space using 3D Gaussian Splatting (3DGS). Our approach features two key modules: sound generation and sound localization. The sound generation module leverages a conditional diffusion model, which takes multiscale features rendered from a feature-augmented 3DGS to generate realistic hitting sounds. Meanwhile, the sound localization module enables querying the 3D scene, represented by the feature-augmented 3DGS, to localize hitting positions based on the sound sources. To support this framework, we introduce a novel pipeline for collecting scene-level visual-sound sample pairs, achieving alignment between captured images, impact locations, and corresponding sounds. To the best of our knowledge, this is the first dataset to connect visual and acoustic signals in a 3D context. Extensive experiments on our dataset demonstrate the effectiveness of Visual Acoustic Fields in generating plausible impact sounds and accurately localizing impact sources. Our project page is at https://yuelei0428.github.io/projects/Visual-Acoustic-Fields/.
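To make the two-module design concrete, here is a minimal PyTorch sketch of how the interfaces could look. The class names, feature dimensions, and tensor shapes are illustrative assumptions, not the authors' implementation: a stand-in for the conditional diffusion generator maps features rendered from the feature-augmented 3DGS to a waveform, and the localizer embeds a query sound and scores it against per-Gaussian features to pick a hit position.

```python
# Minimal sketch (not the authors' code) of the two-module interface described above.
# All module names, dimensions, and shapes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoundGenerator(nn.Module):
    """Stand-in for the conditional diffusion model: maps multiscale features
    rendered from a feature-augmented 3DGS to an audio waveform."""
    def __init__(self, feat_dim=256, audio_len=16000):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                 nn.Linear(512, audio_len))

    def forward(self, rendered_feats):            # (B, feat_dim)
        return self.net(rendered_feats)           # (B, audio_len) waveform

class SoundLocalizer(nn.Module):
    """Stand-in for sound localization: embeds a query sound and scores it
    against per-Gaussian features to find the most likely hit position."""
    def __init__(self, feat_dim=256, audio_len=16000):
        super().__init__()
        self.audio_encoder = nn.Linear(audio_len, feat_dim)

    def forward(self, sound, gaussian_feats, gaussian_xyz):
        # sound: (B, audio_len); gaussian_feats: (N, feat_dim); gaussian_xyz: (N, 3)
        q = self.audio_encoder(sound)                                          # (B, feat_dim)
        sim = F.cosine_similarity(q[:, None], gaussian_feats[None], dim=-1)    # (B, N)
        best = sim.argmax(dim=-1)                  # index of best-matching Gaussian per query
        return gaussian_xyz[best]                  # (B, 3) predicted hit positions

# Toy usage with random tensors standing in for a real scene.
feats = torch.randn(4, 256)                        # features rendered at 4 query pixels
wave = SoundGenerator()(feats)                     # generated hitting sounds
pos = SoundLocalizer()(wave, torch.randn(1000, 256), torch.randn(1000, 3))
print(wave.shape, pos.shape)
```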
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Florida > Orange County > Orlando (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Meta's Movie Gen Makes Convincing AI Video Clips
Meta just announced its own media-focused AI model, called Movie Gen, that can be used to generate realistic video and audio clips. The company shared multiple 10-second clips generated with Movie Gen, including a Moo Deng-esque baby hippo swimming around, to demonstrate its capabilities. While the tool is not yet available for use, the Movie Gen announcement comes shortly after the company's Meta Connect event, which showcased new and refreshed hardware and the latest version of its large language model, Llama 3.2. Going beyond straightforward text-to-video generation, the Movie Gen model can make targeted edits to an existing clip, like adding an object into someone's hands or changing the appearance of a surface. In one of the example videos from Meta, a woman wearing a VR headset was transformed to look like she was wearing steampunk binoculars.
- Leisure & Entertainment (0.94)
- Media > Film (0.73)
- Information Technology > Services (0.51)
Noah Raford Can Help You Prepare for a Not-So-Nice Future
Lauren Goode: Alright, I'm gonna ask the question that everyone's wondering about: What is a futurist? Gideon Lichfield: Well, I mean, I think some people imagine it's just, you know, a guy who sits around making predictions about the future, and there are probably some people who do just that. But Noah calls himself an applied futurist, by which he means that he studies trends--technological, economic, demographic, political, you name it. And then he works within institutions like the government to help them take those trends into account in their decision-making and their policies. So how should they think about the impact of AI, for instance?
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Mobile (0.40)
Improving automated segmentation of radio shows with audio embeddings
Berlage, Oberon, Lux, Klaus-Michael, Graus, David
Audio features have proven useful for improving the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate the topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a setup including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1 measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance.
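As a rough illustration of how audio embeddings can be combined with text features for segmentation, the sketch below scores topic boundaries TextTiling-style from the similarity of adjacent windows. The embedding dimensions, threshold, and random inputs are assumptions for illustration, not the paper's setup.

```python
# Minimal sketch (assumptions, not the authors' pipeline): TextTiling-style
# boundary detection where each window is represented by the concatenation of
# a text embedding and an audio embedding from a sound-event classifier.
import numpy as np

def boundary_scores(text_emb, audio_emb):
    """text_emb: (T, d_text), audio_emb: (T, d_audio), one row per time window.
    Returns T-1 depth scores; low similarity between adjacent windows suggests
    a topic boundary."""
    x = np.concatenate([text_emb, audio_emb], axis=1)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sims = np.sum(x[:-1] * x[1:], axis=1)        # cosine similarity of adjacent windows
    return 1.0 - sims                            # higher score = more likely boundary

def segment(text_emb, audio_emb, threshold=0.5):
    scores = boundary_scores(text_emb, audio_emb)
    return [i + 1 for i, s in enumerate(scores) if s > threshold]

# Toy usage with random embeddings standing in for real transcript/audio features.
rng = np.random.default_rng(0)
print(segment(rng.normal(size=(20, 768)), rng.normal(size=(20, 128))))
```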
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
- (7 more...)