Morocutti, Tobias
Exploring Performance-Complexity Trade-Offs in Sound Event Detection
Morocutti, Tobias, Schmid, Florian, Greif, Jonathan, Foscarin, Francesco, Widmer, Gerhard
We target the problem of developing new low-complexity networks for the sound event detection task. Our goal is to meticulously analyze the performance-complexity trade-off, aiming to be competitive with the large state-of-the-art models, at a fraction of the computational requirements. We find that low-complexity convolutional models previously proposed for audio tagging can be effectively adapted for event detection (which requires frame-wise prediction) by adjusting convolutional strides, removing the global pooling, and, importantly, adding a sequence model before the (now frame-wise) classification heads. Systematic experiments reveal that the best choice for the sequence model type depends on which complexity metric is most important for the given application. We also investigate the impact of enhanced training strategies such as knowledge distillation. In the end, we show that combined with an optimized training strategy, we can reach event detection performance comparable to state-of-the-art transformers while requiring only around 5% of the parameters. We release all our pre-trained models and the code for reproducing this work to support future research in low-complexity sound event detection at https://github.com/theMoro/EfficientSED.
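The architectural adaptation described in the abstract can be summarized in a short sketch. The snippet below is illustrative only; the backbone interface, feature dimension, and the choice of a bidirectional GRU as the sequence model are assumptions made for the example, not the exact configuration released in the repository.

```python
import torch
import torch.nn as nn

class FrameWiseSED(nn.Module):
    """Hypothetical sketch: adapt a tagging-style CNN backbone for sound event
    detection by keeping the time axis (no global pooling) and inserting a
    sequence model before a frame-wise classification head."""

    def __init__(self, cnn_backbone: nn.Module, feat_dim: int, n_classes: int, rnn_dim: int = 256):
        super().__init__()
        # Assumed to return (batch, feat_dim, time) feature maps, i.e. frequency
        # is pooled but temporal resolution is preserved via smaller time strides.
        self.backbone = cnn_backbone
        # One possible sequence model; the paper compares several alternatives.
        self.sequence_model = nn.GRU(feat_dim, rnn_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * rnn_dim, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, mel_bins, frames) log-mel spectrogram
        feats = self.backbone(spec)            # (batch, feat_dim, time)
        feats = feats.transpose(1, 2)          # (batch, time, feat_dim)
        seq, _ = self.sequence_model(feats)    # temporal context across frames
        return torch.sigmoid(self.head(seq))   # frame-wise multi-label event probabilities
```

The sketch mirrors the three changes named above: reduced convolutional strides along time, removal of the global pooling, and a sequence model feeding the now frame-wise classification head.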
Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification
Morocutti, Tobias, Schmid, Florian, Koutini, Khaled, Widmer, Gerhard
The DCASE23 challenge's [1] Low-Complexity Acoustic Scene Classification task focuses on utilizing the TAU Urban Acoustic Scenes 2022 Mobile development dataset (TAU22) [2]. This dataset comprises one-second audio snippets from ten distinct acoustic scenes. To make the models deployable on edge devices, a complexity limit is enforced: models are constrained to have no more than 128,000 parameters and 30 million multiply-accumulate operations (MMACs) for the inference of a 1-second audio snippet. Among other model compression techniques such as Quantization [3] and Pruning [4], Knowledge Distillation (KD) [5-7] proved to be a particularly well-suited technique for improving the performance of a low-complexity model in ASC. In a standard KD setting, a low-complexity student model learns to mimic the teacher by minimizing a weighted sum of hard label loss and distillation loss. The soft targets are usually obtained from one or multiple, possibly complex, teacher models, and the distillation loss matches the student predictions to these soft targets using the Kullback-Leibler divergence. Jung et al. [8] demonstrate that soft targets in a teacher-student setup benefit the learning process, since one-hot labels do not reflect the blurred decision boundaries between different acoustic scenes. Knowledge distillation has also been a very popular method in DCASE challenge submissions.
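As a concrete illustration of the standard KD objective described above, the following sketch combines a hard-label cross-entropy term with a temperature-softened KL-divergence distillation term. The weighting factor `alpha` and the temperature value are placeholder choices for the example, not the settings tuned in the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Generic knowledge-distillation objective: weighted sum of hard-label
    loss and a KL-divergence loss on temperature-softened predictions."""
    # Hard-label loss against the one-hot scene labels.
    hard = F.cross_entropy(student_logits, labels)
    # Distillation loss: KL divergence between softened teacher and student distributions.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```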
Device-Robust Acoustic Scene Classification via Impulse Response Augmentation
Morocutti, Tobias, Schmid, Florian, Koutini, Khaled, Widmer, Gerhard
The ability to generalize to a wide range of recording devices is a crucial performance factor for audio classification models. The characteristics of different types of microphones introduce distributional shifts in the digitized audio signals due to their varying frequency responses. If this domain shift is not taken into account during training, the model's performance could degrade severely when it is applied to signals recorded by unseen devices. In particular, training a model on audio signals recorded with a small number of different microphones can make generalization to unseen devices difficult. To tackle this problem, we convolve audio signals in the training set with pre-recorded device impulse responses (DIRs) to artificially increase the diversity of recording devices. We systematically study the effect of DIR augmentation on the task of Acoustic Scene Classification using CNNs and Audio Spectrogram Transformers. The results show that DIR augmentation in isolation performs similarly to the state-of-the-art method Freq-MixStyle. However, we also show that DIR augmentation and Freq-MixStyle are complementary, achieving a new state-of-the-art performance on signals recorded by devices unseen during training.
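A minimal sketch of the DIR augmentation idea is given below. The function name, the impulse-response bank, the application probability `p`, and the peak rescaling are illustrative assumptions; the core operation is the convolution of a training waveform with a randomly chosen pre-recorded device impulse response.

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_dir_augmentation(waveform: np.ndarray, dir_bank: list, p: float = 0.4) -> np.ndarray:
    """With probability p, convolve the training waveform with a randomly
    selected device impulse response (DIR) to simulate an unseen microphone."""
    if np.random.rand() > p:
        return waveform
    dir_ = dir_bank[np.random.randint(len(dir_bank))]
    augmented = fftconvolve(waveform, dir_, mode="full")[: len(waveform)]
    # Rescale so the augmented signal keeps roughly the original peak level.
    augmented *= np.max(np.abs(waveform)) / (np.max(np.abs(augmented)) + 1e-9)
    return augmented
```

In training, such a step would typically be combined with other device-robustness methods such as Freq-MixStyle, which the abstract reports to be complementary to DIR augmentation.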