Supplementary Material: Learning Representations from Audio-Visual Spatial Alignment

Neural Information Processing Systems

These are transformer networks with base dimension 512 and expansion ratio 4. In other words, the output dimensionality of the linear transformations with parameters W_key, W_qry, W_val, W_0, and W_2 is 512, and that of W_1 is 2048. Models are pre-trained to optimize loss (7) for the AVC task, or loss (9) for the AVTS and AVSA tasks. As originally proposed, lateral connections are implemented with a 1×1 convolution that maps all feature maps into a 128-dimensional space, followed by a 3×3 convolution for increased smoothing. Thus, all pixels for which the state-of-the-art model was less than 75% confident were kept unlabeled. These low-confidence regions were also ignored when computing evaluation metrics.
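
For concreteness, the dimensions above can be sketched in PyTorch. This is an illustrative reconstruction, not the authors' released code; in particular, the head count (8) and the lateral connection's input channel count (512) are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

D_MODEL = 512      # base dimension: output size of W_key, W_qry, W_val, W_0, W_2
EXPANSION = 4      # expansion ratio: W_1 maps 512 -> 2048

class TransformerBlock(nn.Module):
    """One transformer layer with the dimensions described above."""
    def __init__(self, d_model=D_MODEL, n_heads=8, expansion=EXPANSION):
        super().__init__()
        # W_key, W_qry, W_val and the output projection W_0 all produce d_model features
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # W_1: 512 -> 2048, W_2: 2048 -> 512
        self.ffn = nn.Sequential(
            nn.Linear(d_model, expansion * d_model),
            nn.ReLU(),
            nn.Linear(expansion * d_model, d_model),
        )

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        return self.norm2(x + self.ffn(x))

# Lateral connection as described: a 1x1 convolution into a 128-dimensional
# space, followed by a 3x3 convolution for smoothing (512 input channels assumed).
lateral = nn.Sequential(
    nn.Conv2d(in_channels=512, out_channels=128, kernel_size=1),
    nn.Conv2d(128, 128, kernel_size=3, padding=1),
)
```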


Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF

Neural Information Processing Systems

This paper tackles post-hoc interpretability for audio processing networks. Our goal is to interpret decisions of a trained network in terms of high-level audio objects that are also listenable for the end-user. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, a regularized interpreter module is trained to take hidden layer representations of the targeted network as input and produce time activations of pre-learnt NMF components as intermediate outputs. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance parts of the input signal most relevant for a network's decision. We demonstrate our method's applicability on popular benchmarks, including a real-world multi-label classification task.
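
The interpreter design can be sketched as follows: a minimal PyTorch illustration in which a fixed, pre-learnt NMF dictionary W is combined with an interpreter that maps hidden activations of the target network to non-negative time activations H. All names and shapes below (N_FREQ, K, hidden_dim) are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

# Pre-learnt NMF model of the magnitude spectrogram: V (freq x time) ~= W @ H,
# with W (freq x K) fixed and H (K x time) the time activations to be predicted.
N_FREQ, K = 513, 100

class NMFInterpreter(nn.Module):
    """Maps a hidden-layer representation of the classifier to NMF time activations H."""
    def __init__(self, hidden_dim, n_components=K):
        super().__init__()
        self.to_h = nn.Sequential(
            nn.Linear(hidden_dim, n_components),
            nn.ReLU(),  # keep activations non-negative, as NMF requires
        )

    def forward(self, hidden):          # hidden: (time, hidden_dim)
        return self.to_h(hidden).T      # H: (n_components, time)

# Interpretation: re-synthesize the parts of the input explained by the components.
W = torch.rand(N_FREQ, K)               # pre-learnt (fixed) NMF dictionary
interpreter = NMFInterpreter(hidden_dim=256)
hidden = torch.rand(50, 256)            # 50 time frames taken from the target network
H = interpreter(hidden)
V_interpretation = W @ H                # listenable magnitude spectrogram (freq x time)
```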




Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization

Berghi, Davide, Jackson, Philip J. B.

arXiv.org Artificial Intelligence

Conventional audio-visual approaches for active speaker detection (ASD) typically rely on visually pre-extracted face tracks and the corresponding single-channel audio to find the speaker in a video. Therefore, they tend to fail whenever the speaker's face is not visible. We demonstrate that a simple audio convolutional recurrent neural network (CRNN) trained with spatial input features extracted from multichannel audio can perform simultaneous horizontal active speaker detection and localization (ASDL), independently of the visual modality. To address the time and cost of generating ground truth labels to train such a system, we propose a new self-supervised training pipeline that embraces a "student-teacher" learning approach. A conventional pre-trained active speaker detector is adopted as a "teacher" network to provide the positions of the speakers as pseudo-labels. The multichannel audio "student" network is trained to generate the same results. At inference, the student network can generalize and also locate occluded speakers that the teacher network cannot detect visually, yielding considerable improvements in recall rate. Experiments on the TragicTalkers dataset show that an audio network trained with the proposed self-supervised learning approach can exceed the performance of typical audio-visual methods and produce results competitive with costly conventional supervised training. We demonstrate that improvements can be achieved when minimal manual supervision is introduced in the learning pipeline. Further gains may be sought with larger training sets and by integrating vision with the multichannel audio system.
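
The student-teacher pipeline can be sketched as a pseudo-label regression loop. Everything below, including the network shapes, the MSE objective, and the placeholder teacher, is an illustrative assumption rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class AudioCRNN(nn.Module):
    """Student: CRNN mapping multichannel spatial audio features to a horizontal
    (azimuth) position per frame. Layer sizes are illustrative."""
    def __init__(self, n_feats=64, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),               # pool feature axis, keep time axis
        )
        self.gru = nn.GRU(32 * n_feats // 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, 1, time, n_feats)
        z = self.conv(x)                        # (batch, 32, time, n_feats // 2)
        z = z.permute(0, 2, 1, 3).flatten(2)    # (batch, time, 32 * n_feats // 2)
        z, _ = self.gru(z)
        return self.head(z).squeeze(-1)         # (batch, time)

# One illustrative training step with dummy data (the real pipeline uses
# TragicTalkers and a pre-trained visual ASD model as the teacher).
def visual_teacher(_frames):                    # placeholder for the visual teacher
    return torch.rand(4, 50)                    # pseudo-label position per frame

spatial_feats = torch.rand(4, 1, 50, 64)        # (batch, channels, time, features)
student = AudioCRNN()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

with torch.no_grad():
    pseudo_pos = visual_teacher(None)           # teacher provides pseudo-labels
pred = student(spatial_feats)
loss = nn.functional.mse_loss(pred, pseudo_pos)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```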


Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

Lee, Jiyoung, Chung, Joon Son, Chung, Soo-Whan

arXiv.org Artificial Intelligence

The goal of this work is zero-shot text-to-speech synthesis, with speaking styles and voices learnt from facial characteristics. Inspired by the fact that people can imagine someone's voice when they look at their face, we introduce a face-styled diffusion text-to-speech (TTS) model within a unified framework learnt from visible attributes, called Face-TTS. This is the first time that face images are used as a condition to train a TTS model. We jointly train cross-modal biometrics and TTS models to preserve speaker identity between face images and generated speech segments. We also propose a speaker feature binding loss to enforce the similarity of the generated and the ground truth speech segments in speaker embedding space. Since the biometric information is extracted directly from the face image, our method does not require extra fine-tuning steps to generate speech from unseen and unheard speakers. We train and evaluate the model on the LRS3 dataset, an in-the-wild audio-visual corpus containing background noise and diverse speaking styles. The project page is https://facetts.github.io.
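
A speaker feature binding loss of this kind can be sketched as one minus the cosine similarity between speaker embeddings of the generated and ground truth speech. `spk_enc` and all shapes below are hypothetical stand-ins, not the Face-TTS code.

```python
import torch
import torch.nn.functional as F

def speaker_binding_loss(spk_enc, generated_mel, target_mel):
    """Pull the generated speech toward the ground truth in speaker-embedding
    space. `spk_enc` is assumed to map a mel spectrogram to a speaker embedding;
    the name and interface are ours, not from the released code."""
    e_gen = F.normalize(spk_enc(generated_mel), dim=-1)
    e_ref = F.normalize(spk_enc(target_mel), dim=-1)
    return 1.0 - (e_gen * e_ref).sum(dim=-1).mean()   # 1 - cosine similarity

# Usage with a stand-in speaker encoder and dummy 80-bin mel features:
spk_enc = torch.nn.Linear(80, 256)
loss = speaker_binding_loss(spk_enc, torch.rand(4, 80), torch.rand(4, 80))
```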


Is Artificial Intelligence about to transform the sync industry? - Music Business Worldwide

#artificialintelligence

There's been plenty of discussion and debate on MBW's pages regarding the impact that Artificial Intelligence might have on the music business in the future. Obviously, there's its potentially seismic effect on the way musicians make music – whether that's AI producing non-human music from scratch, or providing tools that artists and songwriters can use to compose and perform in the studio. But there's also AI's application to more practical B2B tools to consider. Just last week, for example, we heard from Canada-based LANDR, which has launched an AI tool that helpfully sifts through its huge catalog of samples for those looking for a specific sound. Today (September 4), a new twist on AI arrives via a fresh partnership between production music library Audio Network and Singapore-based machine learning company Musiio.

  Industry: Media > Music (1.00)

Audio Network Partners with Musiio to Harness the Power of Artificial Intelligence (AI)

#artificialintelligence

Audio Network Limited, one of the world's largest independent creators and publishers of original high-quality music for use in film, television, advertising and digital media, continues its focus on technology by partnering with Musiio to explore the power of AI to improve customer service and delivery. This industry first will equip the global music company with an added interface to their existing search platform, to make their catalogue of over 170,000 tracks even more discoverable, whilst keeping the human touch that Audio Network has always been known for. Singapore-based Musiio provides a new way of "listening" to music at scale, easily searching up to one million tracks in under two seconds and supercharging a team of music researchers to increase their efficiency in responding to music briefs. "AI has been on the fringes of the music industry for the last few years, with talk of labels signing algorithms. But recently, more commercial and practical uses of this powerful computing technology have begun to surface," explained Musiio CEO and co-founder Hazel Savage.

  Country: Asia > Singapore (0.27)
  Genre: Press Release (0.34)
  Industry: Media > Music (0.63)