AAT: Adapting Audio Transformer for Various Acoustics Recognition Tasks

Liang, Yun, Lin, Hai, Qiu, Shaojian, Zhang, Yihang

Jan-19-2024–arXiv.org Artificial Intelligence

Recently, Transformers have been introduced into the field of acoustics recognition. They are pre-trained on large-scale datasets using methods such as supervised learning and semi-supervised learning, demonstrating robust generality--It fine-tunes easily to downstream tasks and shows more robust performance. However, the predominant fine-tuning method currently used is still full fine-tuning, which involves updating all parameters during training. This not only incurs significant memory usage and time costs but also compromises the model's generality. Other fine-tuning methods either struggle to address this issue or fail to achieve matching performance. Therefore, we conducted a comprehensive analysis of existing fine-tuning methods and proposed an efficient fine-tuning approach based on Adapter tuning, namely AAT. The core idea is to freeze the audio Transformer model and insert extra learnable Adapters, efficiently acquiring downstream task knowledge without compromising the model's original generality. Extensive experiments have shown that our method achieves performance comparable to or even superior to full fine-tuning while optimizing only 7.118% of the parameters. It also demonstrates superiority over other fine-tuning methods.

adapter, fine-tuning, transformer, (16 more...)

arXiv.org Artificial Intelligence

Jan-19-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
- Europe > Romania
  - Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > China
  - Guangdong Province > Guangzhou (0.05)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)