AITopics | acoustic scene classification

Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation

Yuan, Kuang, Gao, Yang, Li, Xilin, Mei, Xinhao, Zadissa, Syavosh, Pruthi, Tarun, Sereshki, Saeed Bagheri

arXiv.org Artificial Intelligence, Oct-7-2025

Abstract: Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. Our evaluation shows that ContrastASC demonstrates improved few-shot adaptation to unseen categories while maintaining strong closed-set performance.

Index Terms: Acoustic Scene Classification, Contrastive Learning, Knowledge Distillation, Model Fine-tuning

1. Introduction: Acoustic scene classification (ASC) has attracted significant research attention as a crucial capability for context-aware AI systems on edge devices [1, 2].
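To make the idea of supervised contrastive fine-tuning more concrete, here is a minimal sketch of the generic supervised contrastive loss in plain Python. It is an illustration under assumptions, not the ContrastASC implementation: the function name `supcon_loss`, the `temperature` default, and the toy embeddings are all hypothetical, and the paper may use a different variant of the loss.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (hypothetical helper).

    Each anchor is pulled towards samples sharing its label and pushed
    away from all other samples. Anchors without a positive are skipped.
    """
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        # Log-sum-exp over all non-anchor samples, stabilised by the max.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy 2-D embeddings: two tight clusters.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(points, [0, 0, 1, 1])   # labels agree with the clusters
shuffled = supcon_loss(points, [0, 1, 0, 1])  # labels cut across the clusters
```

In this toy case the loss is lower when the labels agree with the geometric clusters, which is the sense in which the loss "structures" an embedding space by class.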

acoustic scene classification, artificial intelligence, machine learning, (14 more...)

2510.03728

Genre: Research Report (0.64)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

An Entropy-Guided Curriculum Learning Strategy for Data-Efficient Acoustic Scene Classification under Domain Shift

Zhang, Peihong, Liu, Yuxuan, Li, Zhixin, Sang, Rui, Cai, Yiqiang, Tan, Yizhou, Li, Shengchen

arXiv.org Artificial Intelligence, Sep-16-2025

Acoustic Scene Classification (ASC) faces challenges in generalizing across recording devices, particularly when labeled data is limited. The DCASE 2024 Challenge Task 1 highlights this issue by requiring models to learn from small labeled subsets recorded on a few devices. These models need to then generalize to recordings from previously unseen devices under strict complexity constraints. While techniques such as data augmentation and the use of pre-trained models are well-established for improving model generalization, optimizing the training strategy represents a complementary yet less-explored path that introduces no additional architectural complexity or inference overhead. Among various training strategies, curriculum learning offers a promising paradigm by structuring the learning process from easier to harder examples. In this work, we propose an entropy-guided curriculum learning strategy to address the domain shift problem in data-efficient ASC. Specifically, we quantify the uncertainty of device domain predictions for each training sample by computing the Shannon entropy of the device posterior probabilities estimated by an auxiliary domain classifier. Using entropy as a proxy for domain invariance, the curriculum begins with high-entropy samples and gradually incorporates low-entropy, domain-specific ones to facilitate the learning of generalizable representations. Experimental results on multiple DCASE 2024 ASC baselines demonstrate that our strategy effectively mitigates domain shift, particularly under limited labeled data conditions. Our strategy is architecture-agnostic and introduces no additional inference cost, making it easily integrable into existing ASC baselines and offering a practical solution to domain shift.
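The entropy-guided ordering described above can be sketched as follows. This is an illustration under assumptions: the device posteriors are taken as already produced by some auxiliary domain classifier, and the linear pacing in `curriculum_subset` (including its `start_fraction` parameter) is a hypothetical schedule, since the abstract does not state how quickly low-entropy samples are added.

```python
import math


def shannon_entropy(posterior):
    """Shannon entropy (in nats) of one device posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def curriculum_order(device_posteriors):
    """Sample indices sorted from highest to lowest device entropy.

    High entropy means the domain classifier cannot tell which device
    recorded the sample, which the abstract treats as a proxy for
    domain invariance.
    """
    entropies = [shannon_entropy(p) for p in device_posteriors]
    return sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)


def curriculum_subset(order, epoch, total_epochs, start_fraction=0.5):
    """Hypothetical linear pacing: grow the training pool each epoch."""
    progress = min(1.0, epoch / max(1, total_epochs - 1))
    fraction = start_fraction + (1.0 - start_fraction) * progress
    count = max(1, math.ceil(fraction * len(order)))
    return order[:count]


# Three samples with posteriors over three recording devices.
posteriors = [
    [1.0, 0.0, 0.0],        # clearly device-specific (entropy 0)
    [1 / 3, 1 / 3, 1 / 3],  # device cannot be identified (maximum entropy)
    [0.7, 0.2, 0.1],        # in between
]
order = curriculum_order(posteriors)
first_epoch = curriculum_subset(order, epoch=0, total_epochs=3)
last_epoch = curriculum_subset(order, epoch=2, total_epochs=3)
```

Because the ordering only changes which samples are seen when, it adds nothing to the model at inference time, consistent with the abstract's claim of no additional inference cost.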

artificial intelligence, generalization, machine learning, (12 more...)

2509.11168

Country: Europe > Spain (0.16)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Adaptive Knowledge Distillation using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification

Jeong, Seung Gyu, Kim, Seong Eun

arXiv.org Artificial Intelligence, Sep-12-2025

In this technical report, we describe our submission for Task 1, Low-Complexity Device-Robust Acoustic Scene Classification, of the DCASE 2025 Challenge. Our work tackles the dual challenges of strict complexity constraints and robust generalization to both seen and unseen devices, while also leveraging the new rule allowing the use of device labels at test time. Our proposed system is based on a knowledge distillation framework where an efficient CP-MobileNet student learns from a compact, specialized two-teacher ensemble. This ensemble combines a baseline PaSST teacher, trained with standard cross-entropy, and a 'generalization expert' teacher. This expert is trained using our novel Device-Aware Feature Alignment (DAFA) loss, adapted from prior work, which explicitly structures the feature space for device robustness. To capitalize on the availability of test-time device labels, the distilled student model then undergoes a final device-specific fine-tuning stage. Our proposed system achieves a final accuracy of 57.93% on the development set, demonstrating a significant improvement over the official baseline, particularly on unseen devices.

artificial intelligence, classification, machine learning, (11 more...)

2509.09262

Country: Asia > South Korea (0.14)

Genre: Research Report (0.41)

Industry: Education (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Quantum-Inspired Genetic Algorithm for Robust Source Separation in Smart City Acoustics

Quan, Minh K., Wijayasundara, Mayuri, Setunge, Sujeeva, Pathirana, Pubudu N.

arXiv.org Artificial Intelligence, Apr-11-2025

The cacophony of urban sounds presents a significant challenge for smart city applications that rely on accurate acoustic scene analysis. Effectively analyzing these complex soundscapes, often characterized by overlapping sound sources, diverse acoustic events, and unpredictable noise levels, requires precise source separation. This task becomes more complicated when only limited training data is available. This paper introduces a novel Quantum-Inspired Genetic Algorithm (p-QIGA) for source separation, drawing inspiration from quantum information theory to enhance acoustic scene analysis in smart cities. By leveraging quantum superposition for efficient solution space exploration and entanglement to handle correlated sources, p-QIGA achieves robust separation even with limited data. These quantum-inspired concepts are integrated into a genetic algorithm framework to optimize source separation parameters. The effectiveness of our approach is demonstrated on two datasets: the TAU Urban Acoustic Scenes 2020 Mobile dataset, representing typical urban soundscapes, and the Silent Cities dataset, capturing quieter urban environments during the COVID-19 pandemic. Experimental results show that the p-QIGA achieves accuracy comparable to state-of-the-art methods while exhibiting superior resilience to noise and limited training data, achieving up to 8.2 dB signal-to-distortion ratio (SDR) in noisy environments and outperforming baseline methods by up to 2 dB with only 10% of the training data. This research highlights the potential of p-QIGA to advance acoustic signal processing in smart cities, particularly for noise pollution monitoring and acoustic surveillance.
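For readers unfamiliar with the signal-to-distortion ratio quoted above, the following sketch computes the basic energy-ratio form of SDR in decibels. It is an assumption that this simple definition matches the one used in the paper; evaluation toolkits often use more elaborate variants, and the helper name `sdr_db` is hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB (hypothetical helper).

    SDR = 10 * log10(sum(s^2) / sum((s - s_hat)^2)), where s is the
    reference source and s_hat the separated estimate.
    """
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# An estimate whose amplitude is 10% too small everywhere.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9, 0.0, -0.9, 0.0]
value = sdr_db(reference, estimate)  # roughly 20 dB
```

On this scale a 2 dB gain, as reported against the baselines, corresponds to cutting the residual error energy to about 63% of its previous value.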

artificial intelligence, evolutionary algorithm, machine learning, (14 more...)

2504.07345

Country: Oceania > Australia (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification

Morocutti, Tobias, Schmid, Florian, Koutini, Khaled, Widmer, Gerhard

arXiv.org Artificial Intelligence, Mar-14-2025

The DCASE23 challenge's [1] Low-Complexity Acoustic Scene Classification task focuses on utilizing the TAU Urban Acoustic Scenes 2022 Mobile development dataset (TAU22) [2]. This dataset comprises one-second audio snippets from ten distinct acoustic scenes. In an attempt to make the models deployable on edge devices, a complexity limit on the models is enforced: models are constrained to have no more than 128,000 parameters and 30 million multiply-accumulate operations (MMACs) for the inference of a 1-second audio snippet. Among other model compression techniques such as Quantization [3] and Pruning [4], Knowledge Distillation (KD) [5-7] proved to be a particularly well-suited technique to improve the performance of a low-complexity model in ASC. In a standard KD setting, a low-complexity model learns to mimic the teacher by minimizing a weighted sum of hard label loss and distillation loss. While the soft targets are usually obtained by one or multiple possibly complex teacher models, the distillation loss tries to match the student predictions with the computed soft targets based on the Kullback-Leibler divergence. Jung et al. [8] demonstrate that soft targets in a teacher-student setup benefit the learning process since one-hot labels do not reflect the blurred decision boundaries between different acoustic scenes. Knowledge distillation has also been a very popular method in the DCASE challenge submissions.
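The standard KD objective described above, a weighted sum of a hard-label loss and a Kullback-Leibler distillation loss, can be sketched for a single example as follows. The names `hard_weight` and `temperature` and their defaults are hypothetical, and multiplying the distillation term by the squared temperature is a common convention that is assumed here rather than taken from the text.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, stabilised by subtracting the max."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, hard_weight=0.5, temperature=2.0):
    """Weighted sum of hard-label cross-entropy and KL distillation loss."""
    # Hard-label loss: cross-entropy against the one-hot ground truth.
    hard = -math.log(softmax(student_logits)[label])

    # Soft targets from the teacher and softened student predictions.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)

    # KL(teacher || student) over the softened distributions.
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(soft_teacher, soft_student)
        if t > 0.0
    )
    # Assumed convention: scale by T^2 so both terms have comparable gradients.
    return hard_weight * hard + (1.0 - hard_weight) * temperature ** 2 * kl


teacher = [2.0, 0.5, -1.0]
agreeing = kd_loss([2.0, 0.5, -1.0], teacher, label=0, hard_weight=0.0)
disagreeing = kd_loss([-1.0, 0.5, 2.0], teacher, label=0, hard_weight=0.0)
```

With `hard_weight=0.0` only the distillation term remains, so a student whose logits equal the teacher's incurs zero loss while a student that ranks the classes in reverse is penalised.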

artificial intelligence, machine learning, student, (14 more...)

2503.11363

Country:

Europe > Finland > Pirkanmaa > Tampere (0.05)
Europe > Austria > Upper Austria > Linz (0.04)

Genre: Research Report (0.64)

Industry: Education (0.97)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Quantum-Enhanced Transformers for Robust Acoustic Scene Classification in IoT Environments

Quan, Minh K., Wijayasundara, Mayuri, Setunge, Sujeeva, Pathirana, Pubudu N.

arXiv.org Artificial Intelligence, Jan-16-2025

The proliferation of Internet of Things (IoT) devices equipped with acoustic sensors necessitates robust acoustic scene classification (ASC) capabilities, even in noisy and data-limited environments. Traditional machine learning methods often struggle to generalize effectively under such conditions. To address this, we introduce Q-ASC, a novel Quantum-Inspired Acoustic Scene Classifier that leverages the power of quantum-inspired transformers. By integrating quantum concepts like superposition and entanglement, Q-ASC achieves superior feature learning and enhanced noise resilience compared to classical models. Furthermore, we introduce a Quantum Variational Autoencoder (QVAE) based data augmentation technique to mitigate the challenge of limited labeled data in IoT deployments. Extensive evaluations on the Tampere University of Technology (TUT) Acoustic Scenes 2016 benchmark dataset demonstrate that Q-ASC achieves remarkable accuracy between 68.3% and 88.5% under challenging conditions, outperforming state-of-the-art methods by over 5% in the best case. This research paves the way for deploying intelligent acoustic sensing in IoT networks, with potential applications in smart homes, industrial monitoring, and environmental surveillance, even in adverse acoustic environments.

q-asc, qubit, scene classification, (11 more...)

2501.09394

Country:

Europe > Finland > Pirkanmaa > Tampere (0.24)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Smart Houses & Appliances (0.54)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction

Yeo, Jin Jie Sean, Tan, Ee-Leng, Bai, Jisheng, Peksi, Santi, Gan, Woon-Seng

arXiv.org Artificial Intelligence, Sep-18-2024

In this technical report, we describe the SNTL-NTU team's submission for Task 1, Data-Efficient Low-Complexity Acoustic Scene Classification, of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge. Three systems are introduced to tackle training splits of different sizes. For small training splits, we explored reducing the complexity of the provided baseline model by reducing the number of base channels. We introduce data augmentation in the form of mixup to increase the diversity of training samples. For the larger training splits, we use FocusNet to provide confusing class information to an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) models and baseline models trained on the original sampling rate of 44.1 kHz. We use Knowledge Distillation to distill the ensemble model to the baseline student model. Training the systems on the TAU Urban Acoustic Scenes 2022 Mobile development dataset yielded highest average test accuracies of 62.21%, 59.82%, 56.81%, 53.03%, and 47.97% on the 100%, 50%, 25%, 10%, and 5% splits, respectively, across the three systems.
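The mixup augmentation mentioned above blends two training examples and their labels with a shared mixing coefficient. The sketch below shows the generic technique on plain lists; the helper names, the `alpha` default, and the toy feature vectors are hypothetical, and the report's actual settings are not given in the abstract.

```python
import random


def sample_lambda(alpha=0.3, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha), as is common for mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(features_a, label_a, features_b, label_b, lam):
    """Convex combination of two examples and their one-hot labels."""
    mixed_features = [lam * a + (1.0 - lam) * b for a, b in zip(features_a, features_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_features, mixed_label


# Two toy examples with one-hot labels for two scene classes.
features, label = mixup(
    features_a=[1.0, 0.0], label_a=[1.0, 0.0],
    features_b=[0.0, 1.0], label_b=[0.0, 1.0],
    lam=0.25,
)
# features == [0.25, 0.75] and label == [0.25, 0.75]

lam = sample_lambda()  # random value in [0, 1]
```

The mixed label stays a valid probability distribution, so it can be used directly with a cross-entropy loss that accepts soft targets.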

classification, teacher model, training split, (13 more...)

2409.11964

Country:

Asia > Singapore (0.05)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.42)

Industry: Education (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Deep Space Separable Distillation for Lightweight Acoustic Scene Classification

Ye, ShuQi, Tian, Yuan

arXiv.org Artificial Intelligence, May-6-2024

Acoustic scene classification (ASC) is highly important in the real world. Recently, deep learning-based methods have been widely employed for acoustic scene classification. However, these methods are currently not lightweight enough, and their performance is not satisfactory. To solve these problems, we propose a deep space separable distillation network. Firstly, the network performs high-low frequency decomposition on the log-mel spectrogram, significantly reducing computational complexity while maintaining model performance. Secondly, we design three lightweight operators specifically for ASC: Separable Convolution (SC), Orthonormal Separable Convolution (OSC), and Separable Partial Convolution (SPC). These operators exhibit highly efficient feature extraction capabilities in acoustic scene classification tasks. The experimental results demonstrate that the proposed method achieves a performance gain of 9.8% compared to the currently popular deep learning methods, while also having a smaller parameter count and lower computational complexity.

artificial intelligence, convolution, machine learning, (17 more...)

2405.03567

Country:

Europe (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

Bai, Jisheng, Wang, Mou, Liu, Haohe, Yin, Han, Jia, Yafei, Huang, Siwei, Du, Yutong, Zhang, Dongzhe, Plumbley, Mark D., Shi, Dongyuan, Gan, Woon-Seng, Rahardja, Susanto, Xiang, Bin, Chen, Jianfeng

arXiv.org Artificial Intelligence, Feb-4-2024

Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is domain shift caused by a distribution gap between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task has achieved substantial progress in device generalization in recent years, the challenge of domain shift between different regions, involving characteristics such as time, space, culture, and language, remains insufficiently explored. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study possible ways to utilize this unlabeled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.

classification, dataset, scene classification, (13 more...)

2402.02694

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
Europe > United Kingdom > England > Surrey (0.04)
Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)

Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification

Milling, Manuel, Triantafyllopoulos, Andreas, Tsangko, Iosif, Rampp, Simon David Noel, Schuller, Björn Wolfgang

arXiv.org Artificial Intelligence, Jan-15-2024

[...] in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the acoustic scene classification task of the DCASE2020 challenge data. Our analysis is based on two-dimensional filter-normalised visualisations and a derived sharpness measure. Our exploratory analysis shows that sharper minima tend to show better generalisation than flat minima (even more so for out-of-domain data, recorded from previously unseen devices), thus [...]

Intuitively, these terms are related to the Hessian matrix, which contains all second-order derivatives at a given point of a function, for all directions, and can thus represent the local curvature behaviour of the function. Yet, an undisputed definition of flatness and sharpness in the high-dimensional parameter space of ANNs is still lacking. Nevertheless, several approaches to quantify flatness and sharpness have been developed over the years, but they have failed to paint a complete picture of the generalisation capabilities based on geometry, as a universal correlation between flatness and generalisation has been disputed [6, 7].

generalisation, minima, sharpness, (14 more...)

2309.16369

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Slovakia > Bratislava > Bratislava (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
