Fonseca, Eduardo
Dataset balancing can hurt model performance
Moore, R. Channing, Ellis, Daniel P. W., Fonseca, Eduardo, Hershey, Shawn, Jansen, Aren, Plakal, Manoj
Machine learning from training data with a skewed distribution of examples per class can lead to models that favor performance on common classes at the expense of performance on rare ones. AudioSet has a very wide range of priors over its 527 sound event classes. Classification performance on AudioSet is usually evaluated by a simple average over per-class metrics, meaning that performance on rare classes is equal in importance to performance on common ones. Several recent papers have used dataset balancing techniques to improve performance on AudioSet. We find, however, that while balancing improves performance on the public AudioSet evaluation data, it simultaneously hurts performance on an unpublished evaluation set collected under the same conditions. By varying the degree of balancing, we show that its benefits are fragile and depend on the evaluation set. We also find no evidence that balancing improves rare-class performance relative to common classes. We therefore caution against blind application of balancing, as well as against paying too much attention to small improvements on a public evaluation set.
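As an illustration of the kind of balancing the abstract refers to, the sketch below interpolates between sampling at the dataset's natural class priors and sampling classes uniformly, with an `alpha` knob standing in for the "degree of balancing". It uses a single-label toy setup for simplicity (AudioSet is multi-label, where balancing is typically applied per target class), and `alpha` is an illustrative device, not the paper's exact recipe.

```python
import numpy as np

def balanced_sampling_weights(labels, num_classes, alpha=1.0):
    """Per-example sampling weights interpolating between the natural
    class distribution (alpha=0) and a uniform one (alpha=1)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts = np.maximum(counts, 1.0)        # guard against empty classes
    natural = np.ones(num_classes)          # weight 1 -> natural priors
    balanced = counts.mean() / counts       # weight ~ 1/frequency -> uniform
    per_class = (1.0 - alpha) * natural + alpha * balanced
    w = per_class[np.asarray(labels)]
    return w / w.sum()                      # normalize to a distribution

# Toy skewed dataset: a common class (0) and a rare class (1).
labels = [0] * 90 + [1] * 10
w = balanced_sampling_weights(labels, num_classes=2, alpha=1.0)
batch = np.random.choice(len(labels), size=32, p=w)  # roughly class-balanced
```

With `alpha=0` the expected batch composition follows the 90/10 prior; with `alpha=1` each class contributes roughly half of each batch.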
FSD50K: an Open Dataset of Human-Labeled Sound Events
Fonseca, Eduardo, Favory, Xavier, Pons, Jordi, Font, Frederic, Serra, Xavier
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, which is based on a massive number of audio tracks from YouTube videos and encompasses over 500 classes of everyday sounds. However, AudioSet is not an open dataset: its release consists of pre-computed audio features (instead of waveforms), which limits the adoption of some SER methods. Downloading the original audio tracks is also problematic, as the constituent YouTube videos gradually disappear and usage rights are unclear, which casts doubt on the suitability of this resource for benchmarking systems. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including the challenges encountered and the solutions adopted. We include a comprehensive dataset characterization along with a discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight into the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset that is widely adopted by the community as a new open benchmark for SER research.
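The abstract notes that splitting Freesound audio for SER requires care; one well-known factor is that clips from the same uploader (or derived from the same original recording) can leak across train/test partitions. Below is a minimal sketch of a group-aware split using hypothetical per-clip uploader IDs; it illustrates the concern, not FSD50K's actual split procedure.

```python
from sklearn.model_selection import GroupShuffleSplit

clip_ids = ["c1", "c2", "c3", "c4", "c5", "c6"]
uploaders = ["u1", "u1", "u2", "u2", "u3", "u3"]  # hypothetical metadata

splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
train_idx, test_idx = next(splitter.split(clip_ids, groups=uploaders))
# Every clip from a given uploader lands entirely in train or entirely in test.
```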
Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers
Fonseca, Eduardo, Font, Frederic, Serra, Xavier
Label noise is emerging as a pressing issue in sound event classification. It arises as we move towards larger datasets that are difficult to annotate manually, and it is even more severe when datasets are collected automatically from online repositories, where labels are inferred through automated heuristics applied to the audio content or metadata. While learning from noisy labels has been an active area of research in computer vision, it has received little attention in sound event classification. Most recent computer vision approaches to label noise are relatively involved, requiring complex networks or extra data resources. In this work, we evaluate simple and efficient model-agnostic approaches to handling noisy labels when training sound event classifiers, namely label smoothing regularization, mixup, and noise-robust loss functions. The main advantage of these methods is that they can be easily incorporated into existing deep learning pipelines without the need for network modifications or extra resources. We report results from experiments conducted with the FSDnoisy18k dataset. We show that these simple methods can be effective in mitigating the effect of label noise, providing an accuracy boost of up to 2.5% when incorporated into two different CNNs, while requiring minimal intervention and computational overhead.
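Two of the named methods are easy to show concretely. The sketch below gives minimal NumPy versions of label smoothing regularization and mixup; the hyperparameters (`eps`, the mixup `alpha`) are illustrative defaults, not the settings evaluated in the paper.

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: soften one-hot targets so the model is not pushed
    to full confidence on possibly wrong labels."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / num_classes

def mixup(x, y, alpha=0.2, rng=np.random):
    """mixup: train on convex combinations of example pairs and their
    (soft) targets, which dilutes the influence of any single noisy label."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]
```

Both operate purely on inputs and targets, which is what makes them model-agnostic: they slot into a training loop without touching the network.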
A hybrid parametric-deep learning approach for sound event localization and detection
Perez-Lopez, Andres, Fonseca, Eduardo, Serra, Xavier
This work describes and discusses an algorithm submitted to the Sound Event Localization and Detection Task of the DCASE2019 Challenge. The proposed methodology relies on parametric spatial audio analysis for source localization and detection, combined with a deep learning-based monophonic event classifier. The evaluation of the proposed algorithm yields overall results comparable to the baseline system. The main highlight is a reduction of the localization error on the evaluation dataset by a factor of 2.6 compared with the baseline performance.
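For context, a common parametric technique of the kind the abstract describes estimates a per-frame direction of arrival (DOA) from first-order Ambisonics via the active acoustic intensity vector. The sketch below assumes (W, X, Y, Z) channel ordering and is an illustration of the general technique, not the submitted system; sign and normalization conventions vary between formats.

```python
import numpy as np

def intensity_doa(w, x, y, z):
    """Per-frame unit DOA vectors from FOA STFT channels.

    w, x, y, z: complex STFT matrices of shape (frames, bins).
    Returns an array of shape (frames, 3) of unit direction vectors.
    """
    # Active intensity: Re{conj(pressure) * particle-velocity components}.
    intensity = np.stack([
        np.real(np.conj(w) * x),
        np.real(np.conj(w) * y),
        np.real(np.conj(w) * z),
    ], axis=-1)                          # (frames, bins, 3)
    v = intensity.sum(axis=1)            # aggregate over frequency
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, 1e-12)   # (frames, 3), unit length
```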
Audio tagging with noisy labels and minimal supervision
Fonseca, Eduardo, Plakal, Manoj, Font, Frederic, Ellis, Daniel P. W., Serra, Xavier
This paper introduces Task 2 of the DCASE2019 Challenge, titled "Audio tagging with noisy labels and minimal supervision". This task was hosted on the Kaggle platform as "Freesound Audio Tagging 2019". The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data and a much smaller set of manually-labeled data, under a large-vocabulary setting of 80 everyday sound classes. In addition, the proposed dataset poses an acoustic mismatch problem between the noisy train set and the test set, because they come from different web audio sources. This corresponds to a realistic scenario, given the difficulty of gathering large amounts of manually labeled data. We present the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network. All these resources are freely available.
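A minimal sketch of what a multi-label tagging CNN of this general kind looks like is given below: spectrogram input, a small convolutional stack, and independent per-class logits (80 here) trained with a binary loss per class. The layer sizes are illustrative, not the baseline's actual architecture.

```python
import torch
import torch.nn as nn

class TaggingCNN(nn.Module):
    def __init__(self, num_classes=80):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),     # global pooling over time/frequency
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, mel):             # mel: (batch, 1, mel_bins, frames)
        h = self.features(mel).flatten(1)
        return self.classifier(h)       # per-class logits

# Multi-label tagging treats each class as an independent binary decision:
model = TaggingCNN()
loss_fn = nn.BCEWithLogitsLoss()       # sigmoid per class, not softmax
```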
Learning Sound Event Classifiers from Web Audio with Noisy Labels
Fonseca, Eduardo, Plakal, Manoj, Ellis, Daniel P. W., Font, Frederic, Favory, Xavier, Serra, Xavier
As sound event classification moves towards larger datasets, issues of label noise become inevitable. Websites can supply large volumes of user-contributed audio and metadata, but inferring labels from this metadata introduces errors due to unreliable inputs and limitations in the mapping. There is, however, little research into the impact of these errors. To foster the investigation of label noise in sound event classification, we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data. We characterize the label noise empirically and provide a CNN baseline system. Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data. We also show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
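A representative member of the noise-robust loss family is the generalized cross-entropy (Lq) loss of Zhang and Sabuncu (2018), which interpolates between standard cross-entropy and the noise-robust mean absolute error. The sketch below is illustrative of the family the abstract refers to; it is not assumed to be the exact loss evaluated in the paper.

```python
import numpy as np

def lq_loss(probs, one_hot, q=0.7):
    """Generalized cross-entropy: (1 - p_y**q) / q per example.
    q -> 0 recovers cross-entropy; q = 1 gives MAE; intermediate q
    trades convergence speed for robustness to corrupted labels."""
    p_y = (probs * one_hot).sum(axis=-1)  # predicted prob of the labeled class
    return (1.0 - np.power(p_y, q)) / q
```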
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
Fonseca, Eduardo, Plakal, Manoj, Font, Frederic, Ellis, Daniel P. W., Favory, Xavier, Pons, Jordi, Serra, Xavier
This paper describes Task 2 of the DCASE 2018 Challenge, titled "General-purpose audio tagging of Freesound content with AudioSet labels". This task was hosted on the Kaggle platform as "Freesound General-Purpose Audio Tagging Challenge". The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 heterogeneous categories drawn from the AudioSet Ontology. We present the task, the dataset prepared for the competition, and a baseline system.
A Simple Fusion of Deep and Shallow Learning for Acoustic Scene Classification
Fonseca, Eduardo, Gong, Rong, Serra, Xavier
In the past, Acoustic Scene Classification systems were based on hand-crafted audio features that are input to a classifier. Nowadays, the common trend is to adopt data-driven techniques, e.g., deep learning, where audio representations are learned from data. In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach, where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine. We first show that both methods provide complementary information to some extent. Then, we use a simple late fusion strategy to combine them. We report the classification accuracy of each method individually and of the combined system on the TUT Acoustic Scenes 2017 dataset. The proposed fused system outperforms each of the individual methods and attains a classification accuracy of 72.8% on the evaluation set, improving on the baseline system by 11.8%.
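The late-fusion step itself is simple enough to sketch: each system outputs per-class probabilities, and the two are combined by a weighted average before taking the argmax. The weight `w` below is an illustrative free parameter; the exact combination rule used in the paper is not assumed here.

```python
import numpy as np

def late_fusion(p_cnn, p_gbm, w=0.5):
    """Weighted average of two (examples, classes) class-probability
    matrices; the fused prediction is the per-example argmax."""
    p = w * p_cnn + (1.0 - w) * p_gbm
    return p.argmax(axis=-1)
```

Because fusion happens at the probability level, the two systems can use entirely different features and training pipelines, which is what makes their complementary information exploitable.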