Effective Audio Classification Network Based on Paired Inverse Pyramid Structure and Dense MLP Block

Chen, Yunhao; Zhu, Yunjie; Yan, Zihui; Huang, Yifan; Ren, Zhen; Shen, Jianlu; Chen, Lifang

May 30, 2023, arXiv.org Artificial Intelligence

Recently, massive architectures based on Convolutional Neural Networks (CNNs) and self-attention mechanisms have become necessary for audio classification. Although these techniques are state-of-the-art, their effectiveness can be guaranteed only with huge computational costs and parameter counts, large amounts of data augmentation, transfer learning from large datasets, and other tricks. Exploiting the lightweight nature of audio, we propose an efficient network structure called the Paired Inverse Pyramid Structure (PIP) and a network called the Paired Inverse Pyramid Structure MLP Network (PIPMN) to overcome these problems. The PIPMN reaches 95.5% accuracy for Environmental Sound Classification (ESC) on the UrbanSound8K dataset and 93.2% accuracy for Music Genre Classification (MGC) on the GTZAN dataset, with only 1 million parameters. Both results are achieved without data augmentation or transfer learning. Under this setting, the PIPMN matches or even exceeds other state-of-the-art models with far fewer parameters. The code is available at https://github.com/JNAIC/PIPMN

Keywords: audio classification, multi-stage structure, skip connection, multi-layer perceptron (MLP)


