Cross-Representation Transferability of Adversarial Perturbations: From Spectrograms to Audio Waveforms Machine Learning

This paper shows the susceptibility of spectrogram-based audio classifiers to adversarial attacks and the transferability of such attacks to audio waveforms. Common adversarial attacks originally designed for images are applied to Mel-frequency and short-time Fourier transform spectrograms, and the perturbed spectrograms fool a 2D convolutional neural network (CNN) for music genre classification with a high fooling rate and high confidence. The attacks produce perturbed spectrograms that are visually imperceptible to humans. Experimental results on a dataset of Western music show that the 2D CNN achieves up to 81.87% mean accuracy on legitimate examples, while performance drops to 12.09% on adversarial examples. Furthermore, the audio signals reconstructed from the adversarial spectrograms yield waveforms that perceptually resemble the legitimate audio.
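The abstract does not name the specific attacks, but the fast gradient sign method (FGSM) is among the most common image attacks transferred to spectrograms. A minimal numpy sketch, using a toy linear "genre classifier" and random data as hypothetical stand-ins for the paper's CNN and spectrograms:

```python
import numpy as np

# FGSM sketch on a flattened toy spectrogram. The model, weights, and
# data are illustrative stand-ins, not the paper's actual setup.
rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

W = rng.normal(size=(10, 128 * 64))   # 10 genres, 128x64 "spectrogram"
x = rng.normal(size=128 * 64)         # legitimate (synthetic) input
y = 3                                 # true genre index

# Gradient of cross-entropy w.r.t. the input for a linear softmax model:
# dL/dx = W^T (softmax(Wx) - onehot(y))
p = softmax(W @ x)
onehot = np.zeros(10)
onehot[y] = 1.0
grad_x = W.T @ (p - onehot)

eps = 0.01                            # perturbation budget
x_adv = x + eps * np.sign(grad_x)     # FGSM step: small signed perturbation
```

Because each element of the perturbation is only ±eps, the adversarial spectrogram stays visually close to the original, matching the imperceptibility claim above.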

Data augmentation approaches for improving animal audio classification Machine Learning

In this paper we present ensembles of classifiers for automated animal audio classification, exploiting different data augmentation techniques for training convolutional neural networks (CNNs). The specific classification problems are i) bird and ii) cat sounds, whose datasets are freely available. We train five different CNNs on the original datasets and on versions augmented by four augmentation protocols that operate on the raw audio signals or on their spectrogram representations. We compare our best approaches with the state of the art, showing that we obtain the best recognition rate on the same datasets without ad hoc parameter optimization. Our study shows that different CNNs can be trained for animal audio classification and that their fusion outperforms the stand-alone classifiers. To the best of our knowledge this is the largest study on data augmentation for CNNs on animal audio datasets using the same set of classifiers and parameters. Our MATLAB code is available at .
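The abstract does not detail the four protocols, but typical raw-audio augmentations include noise injection, time shifting, and gain changes. A hedged numpy sketch of such transforms (function names and parameters are illustrative, not the paper's):

```python
import numpy as np

# Three common raw-waveform augmentations, as an illustration of the
# kind of protocol the paper applies before training CNNs.
rng = np.random.default_rng(42)

def add_noise(signal, snr_db=20.0):
    """Mix in white noise at a target signal-to-noise ratio."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), signal.shape)

def time_shift(signal, max_frac=0.1):
    """Circularly shift the waveform by up to max_frac of its length."""
    limit = int(len(signal) * max_frac)
    return np.roll(signal, rng.integers(-limit, limit + 1))

def random_gain(signal, low_db=-6.0, high_db=6.0):
    """Apply a random volume change in decibels."""
    g_db = rng.uniform(low_db, high_db)
    return signal * (10 ** (g_db / 20))

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
augmented = [f(x) for f in (add_noise, time_shift, random_gain)]
```

Each transform preserves the waveform's length and label, so the augmented copies can be added directly to the training set.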

A Deep Bag-of-Features Model for Music Auto-Tagging Machine Learning

Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms. Such interest has grown in the area of music information retrieval (MIR) as well, particularly in music audio classification tasks such as auto-tagging. In this paper, we present a two-stage learning model to effectively predict multiple labels from music audio. The first stage learns to project local spectral patterns of an audio track onto a high-dimensional sparse space in an unsupervised manner and summarizes the audio track as a bag of features. The second stage successively performs unsupervised learning on the bag of features in a layer-by-layer manner to initialize a deep neural network, and finally fine-tunes it with the tag labels. Through experiments, we rigorously examine training choices and tuning parameters, and show that the model achieves high performance on Magnatagatune, a popular dataset in music auto-tagging.
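The first stage described above can be sketched in a few lines of numpy: encode each spectral frame against a codebook into a sparse activation vector, then pool over time to get one fixed-size vector per track. The random codebook and the threshold here are illustrative stand-ins; the paper learns the projection unsupervised.

```python
import numpy as np

# Bag-of-features sketch: sparse-code local spectral frames, then pool.
rng = np.random.default_rng(1)

n_mels, n_frames, n_codes = 40, 200, 512
spectrogram = np.abs(rng.normal(size=(n_mels, n_frames)))  # fake track
codebook = rng.normal(size=(n_codes, n_mels))              # stand-in dictionary

codes = codebook @ spectrogram          # (n_codes, n_frames) activations
codes = np.maximum(codes - 0.5, 0.0)    # soft-threshold -> sparse, nonnegative
bag_of_features = codes.max(axis=1)     # pool over time: one vector per track
```

The pooled vector has a fixed dimensionality regardless of track length, which is what lets the second-stage deep network consume whole tracks uniformly.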

Zero-Shot Audio Classification Based on Class Label Embeddings Machine Learning

This paper proposes a zero-shot learning approach for audio classification based on textual information about class labels, without any audio samples from the target classes. We propose an audio classification system built on a bilinear model, which takes audio feature embeddings and semantic class label embeddings as input and measures the compatibility between an audio feature embedding and a class label embedding. We use VGGish to extract audio feature embeddings from audio recordings. We treat textual labels as semantic side information about audio classes, and use Word2Vec to generate class label embeddings. Results on the ESC-50 dataset show that the proposed system can perform zero-shot audio classification with a small training dataset. It achieves an accuracy (26% on average) better than random guessing (10%) on each audio category. In particular, it reaches up to 39.7% for the category of natural audio classes.
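The bilinear compatibility score has a compact form: an audio embedding a and a label embedding b are scored as aᵀWb, and the unseen class with the highest score wins. A numpy sketch, where the embedding dimensions and the matrix W are illustrative stand-ins (the paper learns W; VGGish and Word2Vec would supply the real embeddings):

```python
import numpy as np

# Bilinear zero-shot scoring sketch: score(a, b) = a^T W b.
rng = np.random.default_rng(7)

d_audio, d_label, n_classes = 128, 300, 50
W = rng.normal(scale=0.01, size=(d_audio, d_label))  # learned compatibility
a = rng.normal(size=d_audio)                         # VGGish-style embedding
B = rng.normal(size=(n_classes, d_label))            # Word2Vec-style label vecs

scores = a @ W @ B.T                # compatibility with every class at once
predicted = int(np.argmax(scores))  # zero-shot prediction: best-matching label
```

Because only the label embeddings of the target classes are needed at test time, classes never heard during training can still be ranked and predicted.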

LG Explains V30 Premium Audio Features Ahead Of Note 8 Rival Launch

International Business Times

LG is gearing up for the launch of its next installment in the V series called the V30. Ahead of the new phablet's official unveiling this week, the South Korean tech giant decided to discuss in detail the premium audio features of the device. This Monday, LG took to its official online Newsroom to introduce in detail the audio functions that the V30 will be offering to consumers, especially audiophiles, when it arrives. Detailing the premium audio features of its new handset, LG touched on why the V30 is the perfect phone to provide a personalized sound experience. First off, LG confirmed that the V20's successor will indeed come equipped with Hi-Fi DAC (Digital to Analog Converter) for outstanding sound quality.