AITopics | Marinoni, Christian

Collaborating Authors

Marinoni, Christian

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls

Gramaccioni, Riccardo Fosco, Marinoni, Christian, Postolache, Emilian, Comunità, Marco, Cosmo, Luca, Reiss, Joshua D., Comminiello, Danilo

arXiv.org Artificial IntelligenceJan-2-2025

Sound designers and Foley artists usually sonorize a scene, such as from a movie or video game, by manually annotating and sonorizing each action of interest in the video. In our case, the intent is to leave full creative control to sound designers with a tool that allows them to bypass the more repetitive parts of their work, thus being able to focus on the creative aspects of sound production. We achieve this presenting Stable-V2A, a two-stage model consisting of: an RMS-Mapper that estimates an envelope representative of the audio characteristics associated with the input video; and Stable-Foley, a diffusion model based on Stable Audio Open that generates audio semantically and temporally aligned with the target video. Temporal alignment is guaranteed by the use of the envelope as a ControlNet input, while semantic alignment is achieved through the use of sound representations chosen by the designer as cross-attention conditioning of the diffusion process. We train and test our model on Greatest Hits, a dataset commonly used to evaluate V2A models. In addition, to test our model on a case study of interest, we introduce Walking The Maps, a dataset of videos extracted from video games depicting animated characters walking in different locations. Samples and code available on our demo page at https://ispamm.github.io/Stable-V2A.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.15023

Country: Europe > Italy (0.46)

Genre: Research Report (0.64)

Industry:

Media > Film (0.66)
Leisure & Entertainment > Games > Computer Games (0.54)
Media > Music (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality

Marinoni, Christian, Gramaccioni, Riccardo Fosco, Chen, Changan, Uncini, Aurelio, Comminiello, Danilo

arXiv.org Artificial IntelligenceFeb-14-2024

The primary goal of the L3DAS23 Signal Processing Grand Challenge at ICASSP 2023 is to promote and support collaborative research on machine learning for 3D audio signal processing, with a specific emphasis on 3D speech enhancement and 3D Sound Event Localization and Detection in Extended Reality applications. As part of our latest competition, we provide a brand-new dataset, which maintains the same general characteristics of the L3DAS21 and L3DAS22 datasets, but with first-order Ambisonics recordings from multiple reverberant simulated environments. Moreover, we start exploring an audio-visual scenario by providing images of these environments, as perceived by the different microphone positions and orientations. We also propose updated baseline models for both tasks that can now support audio-image couples as input and a supporting API to replicate our results. Finally, we present the results of the participants. Further details about the challenge are available at https://www.l3das.com/icassp2023.

artificial intelligence, machine learning, microphone, (17 more...)

arXiv.org Artificial Intelligence

2402.09245

Country: North America > United States (0.29)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

Add feedback

L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment

Guizzo, Eric, Marinoni, Christian, Pennese, Marco, Ren, Xinlei, Zheng, Xiguang, Zhang, Chen, Masiero, Bruno, Uncini, Aurelio, Comminiello, Danilo

arXiv.org Artificial IntelligenceFeb-21-2022

The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments. This challenge improves and extends the tasks of the L3DAS21 edition. We generated a new dataset, which maintains the same general characteristics of L3DAS21 datasets, but with an extended number of data points and adding constrains that improve the baseline model's efficiency and overcome the major difficulties encountered by the participants of the previous challenge. We updated the baseline model of Task 1, using the architecture that ranked first in the previous challenge edition. We wrote a new supporting API, improving its clarity and ease-of-use. In the end, we present and discuss the results submitted by all participants. L3DAS22 Challenge website: www.l3das.com/icassp2022.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP43922.2022.9746872

2202.10372

Country:

North America > United States (0.28)
South America > Brazil (0.28)
Europe > Italy (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

L3DAS21 Challenge: Machine Learning for 3D Audio Signal Processing

Guizzo, Eric, Gramaccioni, Riccardo F., Jamili, Saeid, Marinoni, Christian, Massaro, Edoardo, Medaglia, Claudia, Nachira, Giuseppe, Nucciarelli, Leonardo, Paglialunga, Ludovica, Pennese, Marco, Pepe, Sveva, Rocchi, Enrico, Uncini, Aurelio, Comminiello, Danilo

arXiv.org Artificial IntelligenceApr-29-2021

The L3DAS21 Challenge is aimed at encouraging and fostering collaborative research on machine learning for 3D audio signal processing, with particular focus on 3D speech enhancement (SE) and 3D sound localization and detection (SELD). Alongside with the challenge, we release the L3DAS21 dataset, a 65 hours 3D audio corpus, accompanied with a Python API that facilitates the data usage and results submission stage. Usually, machine learning approaches to 3D audio tasks are based on single-perspective Ambisonics recordings or on arrays of single-capsule microphones. We propose, instead, a novel multichannel audio configuration based multiple-source and multiple-perspective Ambisonics recordings, performed with an array of two first-order Ambisonics microphones. To the best of our knowledge, it is the first time that a dual-mic Ambisonics configuration is used for these tasks. We provide baseline models and results for both tasks, obtained with state-of-the-art architectures: FaSNet for SE and SELDNet for SELD. This report is aimed at providing all needed information to participate in the L3DAS21 Challenge, illustrating the details of the L3DAS21 dataset, the challenge tasks and the baseline models.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/MLSP52302.2021.9596248

2104.05499

Country:

North America > United States (0.47)
Europe > Italy (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback