Audio-to-Image Encoding for Improved Voice Characteristic Detection Using Deep Convolutional Neural Networks

Atif, Youness

arXiv.org Artificial Intelligence

This paper introduces a novel audio-to-image encoding framework that integrates multiple dimensions of voice characteristics into a single RGB image for speaker recognition. In this method, the green channel encodes raw audio data, the red channel embeds statistical descriptors of the voice signal (including key metrics such as median and mean values for fundamental frequency, spectral centroid, bandwidth, rolloff, zero-crossing rate, MFCCs, RMS energy, spectral flatness, spectral contrast, chroma, and harmonic-to-noise ratio), and the blue channel comprises subframes representing these features in a spatially organized format. A deep convolutional neural network trained on these composite images achieves 98% accuracy in speaker classification across two speakers, suggesting that this integrated multi-channel representation can provide a more discriminative input for voice recognition tasks.
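As a rough illustration of this kind of multi-channel encoding, the sketch below packs raw samples, summary statistics, and frame-level MFCCs into the green, red, and blue channels of a single image using librosa. The feature subset, the 224x224 image size, and the per-channel rescaling are illustrative assumptions, not the paper's exact layout.

```python
# Illustrative sketch of packing voice features into an RGB image.
# Feature choices, image size, and scaling are assumptions; the
# paper's exact channel layout may differ.
import numpy as np
import librosa

def audio_to_rgb(path, size=224):
    y, sr = librosa.load(path, sr=16000, mono=True)

    # Green: raw waveform, truncated/repeated to fill the image,
    # then rescaled to [0, 255].
    g = np.resize(y, size * size)
    g = np.interp(g, (g.min(), g.max()), (0, 255))

    # Red: per-feature summary statistics (mean and median), tiled.
    feats = [
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_bandwidth(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.rms(y=y),
        librosa.feature.spectral_flatness(y=y),
    ]
    stats = np.array([[f.mean(), np.median(f)] for f in feats]).ravel()
    r = np.resize(stats, size * size)
    r = np.interp(r, (r.min(), r.max()), (0, 255))

    # Blue: frame-level MFCCs laid out spatially.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    b = np.resize(mfcc.ravel(), size * size)
    b = np.interp(b, (b.min(), b.max()), (0, 255))

    return np.stack([r, g, b], axis=-1).reshape(size, size, 3).astype(np.uint8)
```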


ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds

Han, Jiho, Shaout, Adnan

arXiv.org Artificial Intelligence

This study explores the application of Vision Transformer (ViT) principles to audio analysis, focusing on heart sounds. The paper introduces ENACT-Heart, a novel ensemble approach that leverages the complementary strengths of Convolutional Neural Networks (CNN) and ViT through a Mixture of Experts (MoE) framework, achieving a classification accuracy of 97.52%. This outperforms the individual ViT (93.88%) and CNN (95.45%) models, demonstrating the potential of ensemble methods to enhance classification performance in cardiovascular health monitoring and diagnosis.
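A minimal sketch of what a CNN/ViT mixture-of-experts ensemble can look like in PyTorch is shown below. The gating network, its input (the flattened spectrogram), and all layer sizes are assumptions for illustration, not the ENACT-Heart architecture.

```python
# Minimal mixture-of-experts sketch over two pretrained experts (a CNN
# and a ViT), both assumed to map the same input to class logits.
import torch
import torch.nn as nn

class MoEEnsemble(nn.Module):
    def __init__(self, cnn: nn.Module, vit: nn.Module, gate_in_features: int):
        super().__init__()
        self.experts = nn.ModuleList([cnn, vit])
        # Gate maps the flattened input to one weight per expert.
        self.gate = nn.Sequential(
            nn.Linear(gate_in_features, 32),
            nn.ReLU(),
            nn.Linear(32, len(self.experts)),
        )

    def forward(self, x):
        # Expert logits: (batch, n_experts, n_classes).
        logits = torch.stack([expert(x) for expert in self.experts], dim=1)
        # Input-dependent gating weights: (batch, n_experts).
        weights = torch.softmax(self.gate(x.flatten(1)), dim=-1)
        # Weighted combination of expert predictions: (batch, n_classes).
        return (weights.unsqueeze(-1) * logits).sum(dim=1)
```

With a softmax over two experts, the gate reduces to a learned, input-dependent weighted average of the CNN and ViT logits, which is one simple way an MoE can exploit their complementary strengths.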


Does it Chug? Towards a Data-Driven Understanding of Guitar Tone Description

Sutar, Pratik, Naradowsky, Jason, Miyao, Yusuke

arXiv.org Artificial Intelligence

Natural language is commonly used to describe instrument timbre, such as a "warm" or "heavy" sound. As these descriptors are based on human perception, there can be disagreement over which acoustic features correspond to a given adjective. In this work, we pursue a data-driven approach to further our understanding of such adjectives in the context of guitar tone. Our main contribution is a dataset of timbre adjectives, constructed by processing single clips of instrument audio to produce varied timbres through adjustments in EQ and effects such as distortion. Adjective annotations are obtained for each clip by having crowdsourced experts complete a pairwise comparison task and a labeling task. We examine the dataset, reveal correlations between adjective ratings, and highlight instances where the data contradicts prevailing theories on spectral features and timbral adjectives, suggesting the need for a more nuanced, data-driven understanding of timbre.
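The core analysis the abstract describes, testing whether an adjective's ratings track a spectral feature, can be sketched as below; the clip paths and "bright" ratings are hypothetical placeholders, and Spearman correlation is one reasonable choice among several.

```python
# Sketch: does a "brightness" rating correlate with spectral centroid?
# Inputs are hypothetical: a list of audio paths and per-clip ratings.
import numpy as np
import librosa
from scipy.stats import spearmanr

def brightness_vs_centroid(paths, bright_ratings, sr=22050):
    centroids = []
    for p in paths:
        y, _ = librosa.load(p, sr=sr, mono=True)
        # Mean spectral centroid over frames, one value per clip.
        centroids.append(librosa.feature.spectral_centroid(y=y, sr=sr).mean())
    # Rank correlation between the feature and the crowdsourced ratings.
    rho, pval = spearmanr(centroids, bright_ratings)
    return rho, pval
```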


Clustering of Indonesian and Western Gamelan Orchestras through Machine Learning of Performance Parameters

Linke, Simon, Wendt, Gerrit, Bader, Rolf

arXiv.org Artificial Intelligence

Indonesian and Western gamelan ensembles are investigated with respect to performance differences. The often exoticist reception history of this music in the West might be reflected in contemporary differences in tonal system, articulation, or large-scale form. Analyzing recordings of four Western and five Indonesian orchestras with respect to tonal systems and timbre features, and using a self-organizing Kohonen map (SOM) as a machine learning algorithm, a clear clustering between Indonesian and Western ensembles appears for certain psychoacoustic features. These point to reduced articulation and large-scale form variability in Western ensembles compared to Indonesian ones. The SOM also clusters the ensembles with respect to their tonal systems, but no separation between Indonesian and Western ensembles is found in this respect. A clear analogy therefore appears between lower articulatory and large-scale form variability and a more exoticist, meditative, and calm performance expectation and reception of gamelan in the West.
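A sketch of SOM-based clustering on per-recording feature vectors, using the third-party MiniSom package, is given below. The grid size, training length, and the treatment of psychoacoustic descriptors as a generic feature matrix are assumptions, not the study's exact setup.

```python
# Sketch: cluster recordings on a small self-organizing map and inspect
# which map units Indonesian vs. Western ensembles land on.
import numpy as np
from minisom import MiniSom

def cluster_ensembles(features, labels, grid=6):
    # features: (n_recordings, n_features) psychoacoustic descriptors,
    # standardized per dimension so no single feature dominates.
    X = (features - features.mean(axis=0)) / features.std(axis=0)
    som = MiniSom(grid, grid, X.shape[1], sigma=1.0, learning_rate=0.5)
    som.train_random(X, 5000)
    # Map each recording to its best-matching unit; clustering shows up
    # as same-group labels sharing neighboring units.
    for x, label in zip(X, labels):
        print(label, som.winner(x))
```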


Learning Sparse Analytic Filters for Piano Transcription

Cwitkowitz, Frank, Heydari, Mojtaba, Duan, Zhiyao

arXiv.org Artificial Intelligence

In recent years, filterbank learning has become an increasingly popular strategy for various audio-related machine learning tasks. This is partly due to its ability to discover task-specific audio characteristics which can be leveraged in downstream processing. It is also a natural extension of the nearly ubiquitous deep learning methods employed to tackle a diverse array of audio applications. In this work, several variations of a frontend filterbank learning module are investigated for piano transcription, a challenging low-level music information retrieval task. We build upon a standard piano transcription model, modifying only the feature extraction stage. The filterbank module is designed such that its complex filters are unconstrained 1D convolutional kernels with long receptive fields. Additional variations employ the Hilbert transform to render the filters intrinsically analytic and apply variational dropout to promote filterbank sparsity. Transcription results are compared across all experiments, and we offer visualization and analysis of the filterbanks.
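A hedged sketch of such a frontend is given below: learnable 1D convolutional kernels with long receptive fields, made analytic with an FFT-based Hilbert transform so that magnitude outputs behave like envelopes. Kernel length, stride, and filter count are assumptions, and the variational-dropout sparsity mechanism is omitted for brevity.

```python
# Sketch of a learnable analytic filterbank frontend in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnalyticFilterbank(nn.Module):
    def __init__(self, n_filters=256, kernel_size=2048, stride=512):
        super().__init__()
        # Unconstrained real kernels with a long receptive field.
        self.weight = nn.Parameter(torch.randn(n_filters, 1, kernel_size) * 0.01)
        self.stride = stride

    def analytic(self, w):
        # FFT-based Hilbert transform: keep DC (and Nyquist), double
        # positive frequencies, zero negative ones.
        n = w.shape[-1]
        W = torch.fft.fft(w, dim=-1)
        h = torch.zeros(n, device=w.device)
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
        if n % 2 == 0:
            h[n // 2] = 1.0
        return torch.fft.ifft(W * h, dim=-1)

    def forward(self, x):
        # x: (batch, 1, samples). Convolve with the real and imaginary
        # parts and take the magnitude, yielding an envelope-like map.
        wa = self.analytic(self.weight)
        real = F.conv1d(x, wa.real, stride=self.stride)
        imag = F.conv1d(x, wa.imag, stride=self.stride)
        return torch.sqrt(real ** 2 + imag ** 2 + 1e-8)
```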


Identify The Beehive Sound Using Deep Learning

Quaderi, Shah Jafor Sadeek, Labonno, Sadia Afrin, Mostafa, Sadia, Akhter, Shamim

arXiv.org Artificial Intelligence

Flowering plants play an essential role in brightening the environment; their life cycle involves pollination, fertilization, flowering, seed formation, dispersal, and germination. Honeybees pollinate approximately 75% of all flowering plants. Environmental pollution, climate change, and the destruction of natural landscapes threaten these natural habitats and are steadily reducing honeybee populations. As a result, several researchers are attempting to address this issue. Applying acoustic classification to recordings of beehive sounds may be a way of detecting changes within hives. In this research, we use deep learning techniques, namely a Sequential Neural Network, a Convolutional Neural Network, and a Recurrent Neural Network, on the recorded sounds to distinguish beehive sounds from non-beehive noises. In addition, we perform a comparative study between the deep learning techniques and some popular non-deep-learning techniques, namely Support Vector Machine, Decision Tree, Random Forest, and Naïve Bayes. The techniques are also verified on the combined recorded sounds (25-75% noise).
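The non-deep-learning side of such a comparison can be sketched with scikit-learn as below; MFCC summary statistics stand in for whatever features the study used, and the file lists and labels are hypothetical placeholders.

```python
# Sketch: compare classical classifiers on beehive vs. noise clips
# using per-clip MFCC summary statistics as features.
import numpy as np
import librosa
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

def mfcc_features(path, sr=22050):
    y, _ = librosa.load(path, sr=sr, mono=True)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Mean and std of each coefficient over time: 26 values per clip.
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

def compare(paths, labels):
    X = np.array([mfcc_features(p) for p in paths])
    y = np.array(labels)  # 1 = beehive sound, 0 = non-beehive noise
    for clf in [SVC(), DecisionTreeClassifier(),
                RandomForestClassifier(), GaussianNB()]:
        score = cross_val_score(clf, X, y, cv=5).mean()
        print(type(clf).__name__, round(score, 3))
```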


Audio Data Analysis Using Deep Learning with Python (Part 1) - KDnuggets

#artificialintelligence

While much of the literature and buzz on deep learning concerns computer vision and natural language processing (NLP), audio analysis -- a field that includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation -- is a growing subdomain of deep learning applications. Some of the most popular and widespread machine learning systems, the virtual assistants Alexa, Siri, and Google Home, are largely built atop models that extract information from audio signals. Audio data analysis is about analyzing and understanding audio signals captured by digital devices, with numerous applications in the enterprise, healthcare, productivity, and smart cities. Applications include customer satisfaction analysis from customer support calls, media content analysis and retrieval, medical diagnostic aids and patient monitoring, assistive technologies for people with hearing impairments, and audio analysis for public safety. In this first part of the article series, we cover what you need to know before getting started with audio data analysis and how to extract the necessary features from a sound/audio file. We will also build an Artificial Neural Network (ANN) for music genre classification.
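The first steps such a tutorial typically covers, loading a file and extracting frame-level features with librosa, look like this; the file name is a placeholder.

```python
# Basic audio loading and feature extraction with librosa.
import numpy as np
import librosa

y, sr = librosa.load("example.wav")          # waveform and sample rate
print(librosa.get_duration(y=y, sr=sr))      # clip length in seconds

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # (13, frames)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # (1, frames)
zcr = librosa.feature.zero_crossing_rate(y)               # (1, frames)

# Frame-level features are often summarized per clip before being fed
# to a classifier such as an ANN.
features = np.concatenate(
    [mfcc.mean(axis=1), centroid.mean(axis=1), zcr.mean(axis=1)]
)
print(features.shape)
```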


Music Genre Classification with Python – Towards Data Science

#artificialintelligence

Companies nowadays use music classification either to make recommendations to their customers (as Spotify and SoundCloud do) or simply as a product (for example, Shazam). Determining music genres is the first step in that direction. Machine learning techniques have proved quite successful at extracting trends and patterns from large pools of data, and the same principles apply to music analysis. In this article, we shall study how to analyse an audio/music signal in Python, and then use those skills to classify music clips into different genres.
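A minimal version of the pipeline such articles build, per-clip MFCC statistics plus an off-the-shelf classifier, might look like this; the one-folder-per-genre directory layout (as in the GTZAN dataset) and the choice of k-NN are assumptions.

```python
# Sketch: genre classification from MFCC summary statistics.
import os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def load_dataset(root):
    X, y = [], []
    for genre in os.listdir(root):              # one folder per genre
        for fname in os.listdir(os.path.join(root, genre)):
            wav, sr = librosa.load(os.path.join(root, genre, fname),
                                   duration=30)
            m = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=20)
            X.append(np.concatenate([m.mean(axis=1), m.std(axis=1)]))
            y.append(genre)
    return np.array(X), np.array(y)

X, y = load_dataset("genres/")  # hypothetical GTZAN-style folder
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```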